WorldWideScience

Sample records for large text document

  1. ParaText : scalable solutions for processing and searching very large document collections : final LDRD report.

    Energy Technology Data Exchange (ETDEWEB)

    Crossno, Patricia Joyce; Dunlavy, Daniel M.; Stanton, Eric T.; Shead, Timothy M.

    2010-09-01

This report is a summary of the accomplishments of the 'Scalable Solutions for Processing and Searching Very Large Document Collections' LDRD, which ran from FY08 through FY10. Our goal was to investigate scalable text analysis; specifically, methods for information retrieval and visualization that could scale to extremely large document collections. Towards that end, we designed, implemented, and demonstrated a scalable framework for text analysis - ParaText - as a major project deliverable. Further, we demonstrated the benefits of using visual analysis in text analysis algorithm development, improved the performance of heterogeneous ensemble models in data classification problems, and showed the advantages of information-theoretic methods for user analysis and interpretation in cross-language information retrieval. The project involved 5 members of the technical staff and 3 summer interns (including one who worked two summers). It resulted in a total of 14 publications, 3 new software libraries (2 open source and 1 internal to Sandia), several new end-user software applications, and over 20 presentations. Several follow-on projects have already begun or will start in FY11, with additional projects currently at the proposal stage.

  2. Documents and legal texts

    International Nuclear Information System (INIS)

    2017-01-01

This section treats of the following documents and legal texts: 1 - Belgium: Act of 29 June 2014 amending the Act of 22 July 1985 on Third-Party Liability in the Field of Nuclear Energy; 2 - Belgium: Act of 7 December 2016 amending the Act of 22 July 1985 on Third-Party Liability in the Field of Nuclear Energy

  3. Documents and legal texts

    International Nuclear Information System (INIS)

    2016-01-01

    This section treats of the following documents and legal texts: 1 - Brazil: Law No. 13,260 of 16 March 2016 (To regulate the provisions of item XLIII of Article 5 of the Federal Constitution on terrorism, dealing with investigative and procedural provisions and redefining the concept of a terrorist organisation; and amends Laws No. 7,960 of 21 December 1989 and No. 12,850 of 2 August 2013); 2 - India: The Atomic Energy (Amendment) Act, 2015; Department Of Atomic Energy Notification (Civil Liability for Nuclear Damage); 3 - Japan: Act on Subsidisation, etc. for Nuclear Damage Compensation Funds following the implementation of the Convention on Supplementary Compensation for Nuclear Damage

  4. Documents and legal texts

    International Nuclear Information System (INIS)

    2013-01-01

    This section reprints a selection of recently published legislative texts and documents: - Russian Federation: Federal Law No.170 of 21 November 1995 on the use of atomic energy, Adopted by the State Duma on 20 October 1995; - Uruguay: Law No.19.056 On the Radiological Protection and Safety of Persons, Property and the Environment (4 January 2013); - Japan: Third Supplement to Interim Guidelines on Determination of the Scope of Nuclear Damage resulting from the Accident at the Tokyo Electric Power Company Fukushima Daiichi and Daini Nuclear Power Plants (concerning Damages related to Rumour-Related Damage in the Agriculture, Forestry, Fishery and Food Industries), 30 January 2013; - France and the United States: Joint Statement on Liability for Nuclear Damage (Aug 2013); - Franco-Russian Nuclear Power Declaration (1 November 2013)

  5. Documents and legal texts

    International Nuclear Information System (INIS)

    2015-01-01

    This section treats of the following Documents and legal texts: 1 - Canada: Nuclear Liability and Compensation Act (An Act respecting civil liability and compensation for damage in case of a nuclear incident, repealing the Nuclear Liability Act and making consequential amendments to other acts); 2 - Japan: Act on Compensation for Nuclear Damage (The purpose of this act is to protect persons suffering from nuclear damage and to contribute to the sound development of the nuclear industry by establishing a basic system regarding compensation in case of nuclear damage caused by reactor operation etc.); Act on Indemnity Agreements for Compensation of Nuclear Damage; 3 - Slovak Republic: Act on Civil Liability for Nuclear Damage and on its Financial Coverage and on Changes and Amendments to Certain Laws (This Act regulates: a) The civil liability for nuclear damage incurred in the causation of a nuclear incident, b) The scope of powers of the Nuclear Regulatory Authority (hereinafter only as the 'Authority') in relation to the application of this Act, c) The competence of the National Bank of Slovakia in relation to the supervised financial market entities in the financial coverage of liability for nuclear damage; and d) The penalties for violation of this Act)

  6. Documents and legal texts

    International Nuclear Information System (INIS)

    2014-01-01

    This section of the Bulletin presents the recently published documents and legal texts sorted by country: - Brazil: Resolution No. 169 of 30 April 2014. - Japan: Act Concerning Exceptions to Interruption of Prescription Pertaining to Use of Settlement Mediation Procedures by the Dispute Reconciliation Committee for Nuclear Damage Compensation in relation to Nuclear Damage Compensation Disputes Pertaining to the Great East Japan Earthquake (Act No. 32 of 5 June 2013); Act Concerning Measures to Achieve Prompt and Assured Compensation for Nuclear Damage Arising from the Nuclear Plant Accident following the Great East Japan Earthquake and Exceptions to the Extinctive Prescription, etc. of the Right to Claim Compensation for Nuclear Damage (Act No. 97 of 11 December 2013); Fourth Supplement to Interim Guidelines on Determination of the Scope of Nuclear Damage Resulting from the Accident at the Tokyo Electric Power Company Fukushima Daiichi and Daini Nuclear Power Plants (Concerning Damages Associated with the Prolongation of Evacuation Orders, etc.); Outline of 'Fourth Supplement to Interim Guidelines (Concerning Damages Associated with the Prolongation of Evacuation Orders, etc.)'. - OECD Nuclear Energy Agency: Decision and Recommendation of the Steering Committee Concerning the Application of the Paris Convention to Nuclear Installations in the Process of Being Decommissioned; Joint Declaration on the Security of Supply of Medical Radioisotopes. - United Arab Emirates: Federal Decree No. (51) of 2014 Ratifying the Convention on Supplementary Compensation for Nuclear Damage; Ratification of the Federal Supreme Council of Federal Decree No. (51) of 2014 Ratifying the Convention on Supplementary Compensation for Nuclear Damage

  7. Text document classification

    Czech Academy of Sciences Publication Activity Database

    Novovičová, Jana

    č. 62 (2005), s. 53-54 ISSN 0926-4981 R&D Projects: GA AV ČR IAA2075302; GA AV ČR KSK1019101; GA MŠk 1M0572 Institutional research plan: CEZ:AV0Z10750506 Keywords : document representation * categorization * classification Subject RIV: BD - Theory of Information

  8. Text segmentation in degraded historical document images

    Directory of Open Access Journals (Sweden)

    A.S. Kavitha

    2016-07-01

Full Text Available Text segmentation from degraded historical Indus script images helps an Optical Character Recognizer (OCR) achieve good recognition rates for Indus scripts; however, it is challenging due to the complex backgrounds in such images. In this paper, we present a new method for segmenting text and non-text in Indus documents based on the fact that text components are less cursive than non-text ones. To achieve this, we propose a new combination of Sobel and Laplacian filters for enhancing degraded low-contrast pixels. The proposed method then generates skeletons for text components in the enhanced images to reduce the computational burden, which in turn helps in studying component structures efficiently. We propose to study the cursiveness of components based on branch information in order to remove false text components. The proposed method introduces a nearest-neighbor criterion for grouping components in the same line, which results in clusters. Furthermore, the proposed method classifies these clusters into text and non-text clusters based on characteristics of text components. We evaluate the proposed method on a large dataset containing a variety of images. The results are compared with existing methods to show that the proposed method is effective in terms of recall and precision.
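The enhancement step described above can be sketched in a few lines of Python. The authors' exact way of combining the two operators is not given in the abstract, so summing the Sobel gradient magnitude with the absolute Laplacian response below is an assumption, not their method:

```python
# Hypothetical sketch of a Sobel + Laplacian enhancement step on a
# grayscale image given as a list of rows; borders are left at zero.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]

def convolve3x3(img, kernel):
    """Valid 3x3 convolution; out-of-range borders stay 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    acc += kernel[ky][kx] * img[y + ky - 1][x + kx - 1]
            out[y][x] = acc
    return out

def enhance(img):
    """Combine Sobel gradient magnitude with the absolute Laplacian,
    boosting low-contrast edge pixels (assumed combination)."""
    gx = convolve3x3(img, SOBEL_X)
    gy = convolve3x3(img, SOBEL_Y)
    lap = convolve3x3(img, LAPLACIAN)
    h, w = len(img), len(img[0])
    return [[(gx[y][x] ** 2 + gy[y][x] ** 2) ** 0.5 + abs(lap[y][x])
             for x in range(w)] for y in range(h)]
```

On a synthetic vertical edge, the response peaks at the edge column and vanishes in flat regions, which is the behavior a binarization threshold would then exploit.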

  9. Text document classification based on mixture models

    Czech Academy of Sciences Publication Activity Database

    Novovičová, Jana; Malík, Antonín

    2004-01-01

    Roč. 40, č. 3 (2004), s. 293-304 ISSN 0023-5954 R&D Projects: GA AV ČR IAA2075302; GA ČR GA102/03/0049; GA AV ČR KSK1019101 Institutional research plan: CEZ:AV0Z1075907 Keywords : text classification * text categorization * multinomial mixture model Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.224, year: 2004

  10. Semantic Document Image Classification Based on Valuable Text Pattern

    Directory of Open Access Journals (Sweden)

    Hossein Pourghassem

    2011-01-01

Full Text Available Knowledge extraction from detected document images is a complex problem in the field of information technology. The problem becomes more intricate given that only a negligible percentage of the detected document images are valuable. In this paper, a segmentation-based classification algorithm is used to analyze the document image. In this algorithm, a two-stage segmentation approach detects regions of the image, which are then classified as document or non-document (pure) regions in a hierarchical classification. A novel definition of value is proposed to classify document images into valuable or invaluable categories. The proposed algorithm is evaluated on a database consisting of document and non-document images collected from the Internet. Experimental results show the efficiency of the proposed algorithm in semantic document image classification: it achieves an accuracy rate of 98.8% on the valuable versus invaluable document image classification problem.

  11. Text Mining in Biomedical Domain with Emphasis on Document Clustering.

    Science.gov (United States)

    Renganathan, Vinaitheerthan

    2017-07-01

    With the exponential increase in the number of articles published every year in the biomedical domain, there is a need to build automated systems to extract unknown information from the articles published. Text mining techniques enable the extraction of unknown knowledge from unstructured documents. This paper reviews text mining processes in detail and the software tools available to carry out text mining. It also reviews the roles and applications of text mining in the biomedical domain. Text mining processes, such as search and retrieval of documents, pre-processing of documents, natural language processing, methods for text clustering, and methods for text classification are described in detail. Text mining techniques can facilitate the mining of vast amounts of knowledge on a given topic from published biomedical research articles and draw meaningful conclusions that are not possible otherwise.

  12. Classification process in a text document recommender system

    Directory of Open Access Journals (Sweden)

    Dan MUNTEANU

    2005-12-01

Full Text Available This paper presents the classification process in a recommender system used for textual documents taken especially from the web. In the classification process, the system uses a combination of content filters, event filters, and collaborative filters, and it uses implicit and explicit feedback for evaluating documents.

  13. A Typed Text Retrieval Query Language for XML Documents.

    Science.gov (United States)

    Colazzo, Dario; Sartiani, Carlo; Albano, Antonio; Manghi, Paolo; Ghelli, Giorgio; Lini, Luca; Paoli, Michele

    2002-01-01

    Discussion of XML focuses on a description of Tequyla-TX, a typed text retrieval query language for XML documents that can search on both content and structures. Highlights include motivations; numerous examples; word-based and char-based searches; tag-dependent full-text searches; text normalization; query algebra; data models and term language;…

  14. Machine printed text and handwriting identification in noisy document images.

    Science.gov (United States)

    Zheng, Yefeng; Li, Huiping; Doermann, David

    2004-03-01

In this paper, we address the problem of the identification of text in noisy document images. We are especially focused on segmenting and distinguishing between handwriting and machine-printed text because: 1) handwriting in a document often indicates corrections, additions, or other supplemental information that should be treated differently from the main content, and 2) the segmentation and recognition techniques required for machine-printed and handwritten text are significantly different. A novel aspect of our approach is that we treat noise as a separate class and model noise based on selected features. Trained Fisher classifiers are used to identify machine-printed text and handwriting from noise, and we further exploit context to refine the classification. A Markov Random Field (MRF) based approach is used to model the geometrical structure of the printed text, handwriting, and noise to rectify misclassifications. Experimental results show that our approach is robust and can significantly improve page segmentation in noisy document collections.

  15. Script-independent text line segmentation in freestyle handwritten documents.

    Science.gov (United States)

    Li, Yi; Zheng, Yefeng; Doermann, David; Jaeger, Stefan; Li, Yi

    2008-08-01

    Text line segmentation in freestyle handwritten documents remains an open document analysis problem. Curvilinear text lines and small gaps between neighboring text lines present a challenge to algorithms developed for machine printed or hand-printed documents. In this paper, we propose a novel approach based on density estimation and a state-of-the-art image segmentation technique, the level set method. From an input document image, we estimate a probability map, where each element represents the probability that the underlying pixel belongs to a text line. The level set method is then exploited to determine the boundary of neighboring text lines by evolving an initial estimate. Unlike connected component based methods ( [1], [2] for example), the proposed algorithm does not use any script-specific knowledge. Extensive quantitative experiments on freestyle handwritten documents with diverse scripts, such as Arabic, Chinese, Korean, and Hindi, demonstrate that our algorithm consistently outperforms previous methods [1]-[3]. Further experiments show the proposed algorithm is robust to scale change, rotation, and noise.

  16. Visualizing the semantic content of large text databases using text maps

    Science.gov (United States)

    Combs, Nathan

    1993-01-01

    A methodology for generating text map representations of the semantic content of text databases is presented. Text maps provide a graphical metaphor for conceptualizing and visualizing the contents and data interrelationships of large text databases. Described are a set of experiments conducted against the TIPSTER corpora of Wall Street Journal articles. These experiments provide an introduction to current work in the representation and visualization of documents by way of their semantic content.

  17. Text mining in the classification of digital documents

    Directory of Open Access Journals (Sweden)

    Marcial Contreras Barrera

    2016-11-01

Full Text Available Objective: To develop an automated classifier for bibliographic material by means of text mining. Methodology: Text mining is used to develop the classifier, based on a supervised method comprising two phases: learning and recognition. In the learning phase, the classifier learns patterns through the analysis of bibliographic records from class Z (library science, information sciences, and information resources) recovered from the LIBRUNAM database; this phase yields a classifier capable of recognizing the different subclasses (LC). In the recognition phase, the classifier is validated and evaluated through classification tests: bibliographic records from class Z are taken at random, classified by a cataloguer, and processed by the automated classifier in order to measure its precision. Results: The application of text mining achieved the development of the automated classifier through the supervised document classification method. The precision of the classifier, calculated by comparing the manually and automatically assigned topics, was 75.70%. Conclusions: The application of text mining facilitated the creation of the automated classifier, providing useful technology for classifying bibliographic material with the aim of improving and speeding up the organization of digital documents.
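The abstract does not name the supervised learning algorithm, so as a stand-in, the two-phase learning/recognition scheme and the precision measurement can be sketched with a minimal multinomial naive Bayes classifier (all names and the training data below are illustrative):

```python
import math
from collections import Counter, defaultdict

def train(records):
    """Learning phase: estimate per-class word counts from labelled
    records, where `records` is a list of (tokens, class_label) pairs."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for tokens, label in records:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, class_counts, vocab

def classify(tokens, word_counts, class_counts, vocab):
    """Recognition phase: pick the class with the highest log-probability
    (Laplace-smoothed multinomial naive Bayes)."""
    total = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for label, n in class_counts.items():
        score = math.log(n / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for t in tokens:
            score += math.log((word_counts[label][t] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

def precision(pred_labels, true_labels):
    """Fraction of automatically assigned classes that agree with the
    cataloguer's manual classification."""
    hits = sum(p == t for p, t in zip(pred_labels, true_labels))
    return hits / len(true_labels)
```

Comparing the predicted labels of a random test sample against the cataloguer's labels, as in the abstract, is then a single call to `precision`.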

  18. Finding Text Information in the Ocean of Electronic Documents

    Energy Technology Data Exchange (ETDEWEB)

    Medvick, Patricia A.; Calapristi, Augustin J.

    2003-02-05

Information management in natural resources has become an overwhelming task. A massive amount of electronic documents and data is now available for creating informed decisions. The problem is finding the relevant information to support the decision-making process. Determining gaps in knowledge in order to propose new studies or to determine which proposals to fund for maximum potential is a time-consuming and difficult task. Additionally, available data stores are increasing in complexity; they now may include not only text and numerical data, but also images, sounds, and video recordings. Information visualization specialists at Pacific Northwest National Laboratory (PNNL) have software tools for exploring electronic data stores and for discovering and exploiting relationships within data sets. These provide capabilities for unstructured text explorations, the use of data signatures (a compact format for the essence of a set of scientific data) for visualization (Wong et al 2000), visualizations for multiple query results (Havre et al. 2001), and others (http://www.pnl.gov/infoviz). We will focus on IN-SPIRE, an MS Windows version of PNNL's SPIRE (Spatial Paradigm for Information Retrieval and Exploration). IN-SPIRE was developed to assist information analysts find and discover information in huge masses of text documents.

  19. Leveraging Text Content for Management of Construction Project Documents

    Science.gov (United States)

    Alqady, Mohammed

    2012-01-01

    The construction industry is a knowledge intensive industry. Thousands of documents are generated by construction projects. Documents, as information carriers, must be managed effectively to ensure successful project management. The fact that a single project can produce thousands of documents and that a lot of the documents are generated in a…

  20. Classification of protein-protein interaction full-text documents using text and citation network features.

    Science.gov (United States)

    Kolchinsky, Artemy; Abi-Haidar, Alaa; Kaur, Jasleen; Hamed, Ahmed Abdeen; Rocha, Luis M

    2010-01-01

We participated (as Team 9) in the Article Classification Task of the BioCreative II.5 Challenge: binary classification of full-text documents relevant for protein-protein interaction. We used two distinct classifiers for the online and offline challenges: 1) the lightweight Variable Trigonometric Threshold (VTT) linear classifier we successfully introduced in BioCreative 2 for binary classification of abstracts and 2) a novel Naive Bayes classifier using features from the citation network of the relevant literature. We supplemented the supplied training data with full-text documents from the MIPS database. The lightweight VTT classifier was very competitive in this new full-text scenario: it was a top-performing submission in this task, taking into account the rank product of the Area Under the interpolated Precision-Recall Curve, Accuracy, Balanced F-Score, and Matthews Correlation Coefficient performance measures. The novel citation network classifier for the biomedical text mining domain, while not a top-performing classifier in the challenge, performed above the central tendency of all submissions and therefore indicates a promising new avenue for further investigation in bibliome informatics.

  1. A document preparation system in a large network environment

    Energy Technology Data Exchange (ETDEWEB)

    Vigil, M.; Bouchier, S.; Sanders, C.; Sydoriak, S.; Wheeler, K.

    1988-01-01

    At Los Alamos National Laboratory, we have developed an integrated document preparation system that produces publication-quality documents. This system combines text formatters and computer graphics capabilities that have been adapted to meet the needs of users in a large scientific research laboratory. This paper describes the integration of document processing technology to develop a system architecture, based on a page description language, to provide network-wide capabilities in a distributed computing environment. We describe the Laboratory requirements, the integration and implementation issues, and the challenges we faced developing this system.

  2. RECOVERY OF DOCUMENT TEXT FROM TORN FRAGMENTS USING IMAGE PROCESSING

    OpenAIRE

C. Prasad; Dr. Mahesh; Dr. S.A.K. Jilani

    2016-01-01

Recovery of a document from its torn or damaged fragments plays an important role in the fields of forensics and archival study. Reconstructing torn papers manually with the help of glue, tape, etc. is tedious, time consuming, and unsatisfactory. For reconstruction of torn images we use image mosaicing, where the image is reconstructed using features (corners) and RANSAC with a homography. But for torn fragments there is no such similarity portion between fragments. Hence we propose a ...

  3. Information Gain Based Dimensionality Selection for Classifying Text Documents

    Energy Technology Data Exchange (ETDEWEB)

    Dumidu Wijayasekara; Milos Manic; Miles McQueen

    2013-06-01

Selecting the optimal dimensions for various knowledge extraction applications is an essential component of data mining. Dimensionality selection techniques are utilized in classification applications to increase the classification accuracy and reduce the computational complexity. In text classification, where the dimensionality of the dataset is extremely high, dimensionality selection is even more important. This paper presents a novel genetic-algorithm-based methodology for dimensionality selection in text mining applications that utilizes information gain. The presented methodology uses the information gain of each dimension to change the mutation probability of chromosomes dynamically. Since the information gain is calculated a priori, the computational complexity is not affected. The presented method was tested on a specific text classification problem and compared with conventional genetic-algorithm-based dimensionality selection. The results show an improvement of 3% in true positives and 1.6% in true negatives over conventional dimensionality selection methods.
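The information gain that drives the mutation probabilities can be computed per term as the corpus class entropy minus the expected class entropy after splitting documents on the term's presence. A minimal sketch (how the gains are mapped onto mutation probabilities is the paper's contribution and is not reproduced here):

```python
import math

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def information_gain(docs, labels, term):
    """IG of a term: H(C) minus the expected class entropy after
    splitting the corpus on term presence/absence.
    `docs` is a list of token sets, `labels` the parallel class labels."""
    with_t = [l for d, l in zip(docs, labels) if term in d]
    without_t = [l for d, l in zip(docs, labels) if term not in d]
    n = len(labels)
    gain = entropy(labels)
    for part in (with_t, without_t):
        if part:
            gain -= len(part) / n * entropy(part)
    return gain
```

A term that perfectly separates the classes gets the full class entropy as its gain; a term with no discriminative power gets zero, and would accordingly be a weak dimension to keep.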

  4. Comparison of Document Index Graph Using TextRank and HITS Weighting Method in Automatic Text Summarization

    Science.gov (United States)

    Hadyan, Fadhlil; Shaufiah; Arif Bijaksana, Moch.

    2017-01-01

Automatic summarization is a system that can help someone grasp the core information of a long text instantly by summarizing the text automatically. Many summarization systems have already been developed, but many problems remain in those systems. This final project proposes a summarization method using a document index graph. The method adapts the PageRank and HITS formulas, originally used to score web pages, to score the words in the sentences of a text document. The expected outcome of this final project is a system that can summarize a single document by utilizing a document index graph with TextRank and HITS to improve the quality of the automatically produced summary.
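The PageRank half of that adaptation can be sketched on a word co-occurrence graph: each word is a node, edges link words that co-occur, and the iterative update below scores the words (this is the generic TextRank-style iteration, not the paper's exact graph construction):

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Iterative PageRank over an undirected graph given as
    {node: set_of_neighbours}; every node must have at least one
    neighbour. Returns a score per node summing to 1."""
    nodes = list(graph)
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {}
        for n in nodes:
            # Each neighbour m passes on its score split over its degree.
            rank = sum(score[m] / len(graph[m]) for m in graph[n])
            new[n] = (1 - damping) / len(nodes) + damping * rank
        score = new
    return score
```

A summarizer would then rank sentences by the scores of the words they contain and keep the top few as the summary.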

  5. "What is relevant in a text document?": An interpretable machine learning approach.

    Directory of Open Access Journals (Sweden)

    Leila Arras

Full Text Available Text documents can be described by a number of abstract concepts such as semantic category, writing style, or sentiment. Machine learning (ML) models have been trained to automatically map documents to these abstract concepts, making it possible to annotate very large text collections, more than could be processed by a human in a lifetime. Besides predicting the text's category very accurately, it is also highly desirable to understand how and why the categorization process takes place. In this paper, we demonstrate that such understanding can be achieved by tracing the classification decision back to individual words using layer-wise relevance propagation (LRP), a recently developed technique for explaining the predictions of complex non-linear classifiers. We train two word-based ML models, a convolutional neural network (CNN) and a bag-of-words SVM classifier, on a topic categorization task and adapt the LRP method to decompose the predictions of these models onto words. The resulting scores indicate how much individual words contribute to the overall classification decision. This enables one to distill relevant information from text documents without an explicit semantic information extraction step. We further use the word-wise relevance scores to generate novel vector-based document representations which capture semantic information. Based on these document vectors, we introduce a measure of model explanatory power and show that, although the SVM and CNN models perform similarly in terms of classification accuracy, the latter exhibits a higher level of explainability, which makes it more comprehensible for humans and potentially more useful for other applications.
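For the linear bag-of-words case, the relevance decomposition has a particularly simple closed form: a linear score f(x) = Σᵢ wᵢxᵢ + b decomposes exactly into per-word relevances Rᵢ = wᵢxᵢ (the CNN case needs the full layer-wise propagation rules and is not shown). A minimal sketch:

```python
def word_relevances(weights, bow):
    """Per-word relevance for a linear bag-of-words classifier:
    f(x) = sum_i w_i * x_i + b decomposes into R_i = w_i * x_i,
    so the relevances sum to the prediction minus the bias.
    `weights` maps word -> learned weight, `bow` maps word -> count."""
    return {w: weights.get(w, 0.0) * n for w, n in bow.items()}
```

Words absent from the weight vector get zero relevance, and the conservation property (relevances summing to the unbiased score) makes the scores directly comparable across documents.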

  6. Human Rights Texts: Converting Human Rights Primary Source Documents into Data.

    Science.gov (United States)

    Fariss, Christopher J; Linder, Fridolin J; Jones, Zachary M; Crabtree, Charles D; Biek, Megan A; Ross, Ana-Sophia M; Kaur, Taranamol; Tsai, Michael

    2015-01-01

    We introduce and make publicly available a large corpus of digitized primary source human rights documents which are published annually by monitoring agencies that include Amnesty International, Human Rights Watch, the Lawyers Committee for Human Rights, and the United States Department of State. In addition to the digitized text, we also make available and describe document-term matrices, which are datasets that systematically organize the word counts from each unique document by each unique term within the corpus of human rights documents. To contextualize the importance of this corpus, we describe the development of coding procedures in the human rights community and several existing categorical indicators that have been created by human coding of the human rights documents contained in the corpus. We then discuss how the new human rights corpus and the existing human rights datasets can be used with a variety of statistical analyses and machine learning algorithms to help scholars understand how human rights practices and reporting have evolved over time. We close with a discussion of our plans for dataset maintenance, updating, and availability.
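A document-term matrix of the kind distributed with this corpus can be built in a few lines: one row per document, one column per unique term, each cell holding the term's count in that document (the snippet below is a generic illustration, not the authors' release code):

```python
from collections import Counter

def document_term_matrix(docs):
    """Build a document-term matrix from raw document strings.
    Returns (terms, matrix): `terms` is the sorted column vocabulary,
    `matrix` has one row of term counts per input document."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    terms = sorted(set().union(*counts))
    matrix = [[c[t] for t in terms] for c in counts]
    return terms, matrix
```

Matrices like this are the standard input for the statistical analyses and machine learning algorithms the abstract mentions, since each document becomes a fixed-length count vector.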

  7. A methodology for semiautomatic taxonomy of concepts extraction from nuclear scientific documents using text mining techniques

    International Nuclear Information System (INIS)

    Braga, Fabiane dos Reis

    2013-01-01

This thesis presents a text mining method for the semi-automatic extraction of a taxonomy of concepts from a textual corpus composed of scientific papers related to the nuclear area. Text classification is a natural human practice and a crucial task for working with large repositories. Document clustering techniques provide a logical and understandable framework that facilitates organization, browsing, and searching. Most clustering algorithms use the bag-of-words model to represent the content of a document. This model generates high-dimensional data, ignores the fact that different words can have the same meaning, and does not consider the relationships between words, assuming that words are independent of one another. The methodology combines a concept-based model for document representation with a hierarchical document clustering method using concept co-occurrence frequencies, and a technique for labeling clusters with their most representative concepts, with the objective of producing a taxonomy of concepts that reflects the structure of the knowledge domain. It is hoped that this work will contribute to the conceptual mapping of the scientific production of the nuclear area and thus support the management of research activities in this area. (author)

  8. Spotting Separator Points at Line Terminals in Compressed Document Images for Text-line Segmentation

    OpenAIRE

    R, Amarnath; Nagabhushan, P.

    2017-01-01

    Line separators are used to segregate text-lines from one another in document image analysis. Finding the separator points at every line terminal in a document image would enable text-line segmentation. In particular, identifying the separators in handwritten text could be a thrilling exercise. Obviously it would be challenging to perform this in the compressed version of a document image and that is the proposed objective in this research. Such an effort would prevent the computational burde...

  9. Text extraction method for historical Tibetan document images based on block projections

    Science.gov (United States)

    Duan, Li-juan; Zhang, Xi-qun; Ma, Long-long; Wu, Jian

    2017-11-01

Text extraction is an important initial step in digitizing historical documents. In this paper, we present a text extraction method for historical Tibetan document images based on block projections. The task of text extraction is treated as a text area detection and location problem. The images are divided equally into blocks, and the blocks are filtered using the categories of their connected components and their corner-point density. By analyzing the projections of the filtered blocks, the approximate text areas can be located and the text regions extracted. Experiments on a dataset of historical Tibetan documents demonstrate the effectiveness of the proposed method.
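The projection analysis at the core of such methods reduces to counting ink pixels per row (or column) of a binary block and locating the runs where the count is high. A minimal sketch of that step (the block filtering by connected-component category and corner density is omitted):

```python
def horizontal_projection(binary):
    """Row-wise ink counts for a binary image (1 = ink, 0 = background)."""
    return [sum(row) for row in binary]

def text_bands(profile, threshold=1):
    """Locate (start, end) row ranges whose projection meets the
    threshold, i.e. candidate text areas separated by blank gaps."""
    bands, start = [], None
    for i, v in enumerate(profile):
        if v >= threshold and start is None:
            start = i
        elif v < threshold and start is not None:
            bands.append((start, i))
            start = None
    if start is not None:
        bands.append((start, len(profile)))
    return bands
```

Applying the same scan to column projections inside each band then yields the approximate bounding boxes of the text regions.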

  10. The Implementation of Cosine Similarity to Calculate Text Relevance between Two Documents

    Science.gov (United States)

    Gunawan, D.; Sembiring, C. A.; Budiman, M. A.

    2018-03-01

The rapidly increasing number of web pages and documents calls for topic-specific filtering in order to find web pages or documents efficiently. This preliminary research uses cosine similarity to compute text relevance in order to find topic-specific documents. The research is divided into three parts. The first part is text preprocessing: punctuation is removed from a document, the document is converted to lower case, stop words are removed, and root words are extracted using the Porter stemming algorithm. The second part is keyword weighting, whose output is used by the third part, the text relevance calculation. The text relevance calculation yields a value between 0 and 1; the closer the value is to 1, the more related the two documents are, and vice versa.
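The three-part pipeline above can be sketched end to end. The sketch assumes plain term-frequency weighting (the paper's keyword weighting scheme is not detailed here), uses a tiny illustrative stop-word list, and omits the Porter stemming step for brevity:

```python
import math
import string
from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is"}  # tiny sample

def preprocess(text):
    """Part 1: lowercase, strip punctuation, drop stop words
    (Porter stemming, used in the paper, is omitted here)."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [t for t in text.split() if t not in STOP_WORDS]

def cosine_similarity(doc_a, doc_b):
    """Parts 2 and 3: term-frequency weighting, then the cosine of the
    angle between the two vectors: 1.0 for identical term distributions,
    0.0 for no shared terms."""
    a, b = Counter(preprocess(doc_a)), Counter(preprocess(doc_b))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

Because both vectors are non-negative, the score always lands in [0, 1], matching the range described in the abstract.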

  11. Invariant practical tasks for work with text documents at the secondary school

    Directory of Open Access Journals (Sweden)

    Л И Карташова

    2013-12-01

    Full Text Available The article gives examples of practical tasks on the creation, editing, and formatting of text documents, aimed at pupils of the secondary school. The tasks are invariant in character and do not depend on specific software.

  12. Means of storage and automated monitoring of versions of text technical documentation

    Science.gov (United States)

    Leonovets, S. A.; Shukalov, A. V.; Zharinov, I. O.

    2018-03-01

    The paper considers automating the preparation, storage, and version control of text design and program documentation by means of specialized software. Automated preparation of documentation is based on processing the engineering data contained in the specifications and technical documentation. Data handling assumes strictly structured electronic documents, prepared in widespread formats according to templates based on industry standards, from which the program or design text document is generated automatically. The subsequent life cycle of the document, and of the engineering data it contains, is then controlled, with archival storage carried out at each stage of the life cycle. Performance studies of different widespread document formats under automated monitoring and storage are given. The newly developed software and the workbenches available to the developer of instrumentation equipment are described.

  13. SYSTEM «PlagiarismControl» AS THE TOOL FOR THE EXPERTISE OF THE TEXT DOCUMENTS

    Directory of Open Access Journals (Sweden)

    Yu. B. Krapivin

    2018-01-01

    Full Text Available The paper describes and analyzes the operability of the implemented instrumental software system «PlagiarismControl». The system automates the identification of adopted (borrowed) fragments in a given text document, drawing both on a local full-text user database and on the Internet. It handles explicit as well as implicit adoptions, with precision down to paradigms of lexical units and to relations of lexical and grammatical synonymy, according to the structural-functional schematic diagram of the system for automatic recognition of reproduced fragments of text documents. «PlagiarismControl» works in different modes, automating the work of the expert and significantly speeding up the analysis of documents for the purpose of recognizing adoptions (plagiarism) from other text documents.
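
The paper's system models lexical paradigms and synonymy, which is well beyond a few lines; the sketch below shows only the core idea such systems start from, flagging reproduced fragments as word n-grams shared between a suspect document and a source (example texts are made up).

```python
def ngrams(text, n=3):
    """All word n-grams of a text, as a set of tuples."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_fragments(suspect, source, n=3):
    """Word n-grams occurring in both documents: candidate adoptions."""
    return ngrams(suspect, n) & ngrams(source, n)

source = "the quick brown fox jumps over the lazy dog"
suspect = "we saw the quick brown fox jumps near a dog"
print(shared_fragments(suspect, source))
```

Implicit adoptions (paraphrase, synonym substitution) defeat exact n-gram matching, which is precisely why the paper adds synonymy relations on top of this kind of baseline.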

  14. LOG2MARKUP: State module to transform a Stata text log into a markup document

    DEFF Research Database (Denmark)

    2016-01-01

    log2markup extracts parts of the text version of the Stata log command's output and transforms the logfile into a markup-based document with the same name, but with the extension markup (or as otherwise specified in the option extension) instead of log. The author usually uses markdown for writing documents. However...
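
The transformation can be illustrated with a small sketch. This is not the actual Stata module: it assumes only that command lines in a Stata text log start with ". " and wraps runs of them in fenced code blocks, passing other lines through as markdown text.

```python
FENCE = "`" * 3  # a markdown code fence

def log_to_markdown(log_lines):
    """Wrap Stata command lines (leading '. ') in fenced code blocks."""
    out, in_code = [], False
    for line in log_lines:
        is_cmd = line.startswith(". ")
        if is_cmd and not in_code:
            out.append(FENCE + "stata")
            in_code = True
        elif not is_cmd and in_code:
            out.append(FENCE)
            in_code = False
        out.append(line[2:] if is_cmd else line)
    if in_code:
        out.append(FENCE)  # close a fence left open at end of file
    return out

log = [". display 2+2", "4", "Some narrative text."]
print("\n".join(log_to_markdown(log)))
```

The real module is richer (it keeps command output attached to its command, handles continuation lines, and supports several markup dialects); this only conveys the log-to-markup idea.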

  15. Fast words boundaries localization in text fields for low quality document images

    Science.gov (United States)

    Ilin, Dmitry; Novikov, Dmitriy; Polevoy, Dmitry; Nikolaev, Dmitry

    2018-04-01

    The paper examines the problem of precise localization of word boundaries in document text zones. Document processing on a mobile device consists of document localization, perspective correction, localization of individual fields, finding words in separate zones, segmentation, and recognition. While capturing an image with a mobile digital camera under uncontrolled conditions, digital noise, perspective distortions, or glares may occur. Further document processing is complicated by the specifics of documents: layout elements, complex backgrounds, static text, document security elements, and a variety of text fonts. The problem of word boundary localization must nevertheless be solved at runtime on a mobile CPU with limited computing capabilities under the specified restrictions. At the moment, there are several groups of methods optimized for different conditions. Methods for scanned printed text are quick but limited to images of high quality. Methods for text in the wild have an excessively high computational complexity and are thus hardly suitable for running on mobile devices as part of a mobile document recognition system. The method presented in this paper solves a more specialized problem than finding text in natural images. It uses local features, a sliding window, and a lightweight neural network in order to achieve an optimal speed-precision ratio. The algorithm takes 12 ms per field running on an ARM processor of a mobile device. The error rate for boundary localization on a test sample of 8000 fields is 0.3
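
A drastically simplified stand-in for the method: instead of local features scored by a neural network, the sketch below slides over a vertical ink-projection profile of a text-line image and treats sufficiently wide empty runs as inter-word gaps (the profile and gap width are made-up assumptions).

```python
def word_boundaries(profile, min_gap=2):
    """profile: ink count per pixel column of a text-line image.
    Returns (start, end) column spans of empty runs wide enough to
    separate words."""
    gaps, start = [], None
    for x, ink in enumerate(profile + [1]):  # sentinel closes a trailing gap
        if ink == 0 and start is None:
            start = x
        elif ink != 0 and start is not None:
            if x - start >= min_gap:
                gaps.append((start, x))
            start = None
    return gaps

# Two words: columns 0-3 and 7-9 carry ink, columns 4-6 are empty.
profile = [2, 3, 3, 1, 0, 0, 0, 2, 3, 2]
print(word_boundaries(profile))  # [(4, 7)]
```

On noisy camera images a hard zero threshold fails, which is what motivates the paper's learned per-window classifier in place of this fixed rule.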

  16. Ultrasound-guided nerve blocks--is documentation and education feasible using only text and pictures?

    Directory of Open Access Journals (Sweden)

    Bjarne Skjødt Worm

    Full Text Available PURPOSE: With the advancement of ultrasound guidance for peripheral nerve blocks, still pictures from representative ultrasonograms are increasingly used for clinical documentation of the procedure and for educational purposes in textbook materials. However, little is actually known about the clinical and educational usefulness of these still pictures, in particular how well nerve structures can be identified compared to real-time ultrasound examination. We aimed to quantify gross visibility or ultrastructure using still-picture sonograms compared to real-time ultrasound for trainees and experts, for large or small nerves, and to discuss the clinical and educational relevance of these findings. MATERIALS AND METHODS: We undertook a clinical study to quantify the maximal gross visibility or ultrastructure of seven peripheral nerves identified either by real-time ultrasound (clinical cohort, n = 635) or by still-picture ultrasonograms (clinical cohort, n = 112). In addition, we undertook a study on test subjects (n = 4) to quantify interobserver variation and potential bias among expert and trainee observers. RESULTS: When comparing real-time ultrasound and interpretation of still-picture sonograms, gross identification of large nerves was reduced by 15% and 40% for expert and trainee observers, respectively, while gross identification of small nerves was reduced by 29% and 66%. Identification of within-nerve ultrastructure was lower still. For all nerve sizes, trainees were unable to identify any anatomical structure in 24 to 34% of cases, while experts were unable to identify anything in 9 to 10%. CONCLUSION: Extensive ultrasonography experience and real-time ultrasound measurements appear to be the keystones of optimal nerve identification. In contrast, the use of still pictures appears to be insufficient for documentation as well as for educational purposes. Alternatives such as video clips or enhanced picture technology are encouraged.

  17. Polish Phoneme Statistics Obtained On Large Set Of Written Texts

    Directory of Open Access Journals (Sweden)

    Bartosz Ziółko

    2009-01-01

    Full Text Available The phonetic statistics were collected from several Polish corpora. The paper is a summary of the data, which are phoneme n-grams, and of some phenomena in the statistics. Triphone statistics apply to context-dependent speech units, which play an important role in speech recognition systems and had never been calculated for a large set of Polish written texts. The standard phonetic alphabet for Polish, SAMPA, and methods of providing phonetic transcriptions are described.

  18. Chyawanprash: A review of therapeutic benefits as in authoritative texts and documented clinical literature.

    Science.gov (United States)

    Narayana, D B Anantha; Durg, Sharanbasappa; Manohar, P Ram; Mahapatra, Anita; Aramya, A R

    2017-02-02

    Chyawanprash (CP), a traditional immune-booster recipe, has a long history of ethnic origin, development, household preparation, and usage. There are even mythological stories about the origin of this recipe, including its nomenclature. In the last six decades, CP, because of the entrepreneurial actions of some research Vaidyas (traditional doctors), has grown to industrial production and marketing in packaged forms to a large number of consumers/patients, like any food or health-care product. Currently, CP has acquired a large accepted user base in India and in a few countries outside India. Authoritative texts, recognized by the Drugs and Cosmetics Act of India, describe CP as an immunity enhancer and strength giver meant for improving lung functions in diseases with compromised immunity. This review focuses on published clinical efficacy and safety studies of CP, correlating them with the health benefits documented in the authoritative texts, and also briefly covers its recipes and processes. Authoritative texts were searched for recipes, processes, and other technical details of CP. Labels of marketed CP products (Indian) were studied for their health claims. An electronic search for studies of CP efficacy and safety data was performed in PubMed/MEDLINE and DHARA (Digital Helpline for Ayurveda Research Articles), and Ayurvedic books were also searched for clinical studies. The documented clinical studies from electronic databases and Ayurvedic books evidenced that individuals who consumed CP regularly for a definite period of time showed improvement in overall health status and immunity. However, most of the clinical studies in this review are of small sample size and short duration. Further, limited access to significant data on traditional products like CP in electronic databases was noted. Randomized controlled trials of high quality with larger sample sizes and longer follow-up are needed to provide significant evidence on the clinical use of CP as immunity

  19. Documentation is Documentation and Theory is Theory: A Reply to Daniel Avorgbedor's Commentary "Documenting Spoken and Sung Texts of the Dagaaba of West Africa"

    Directory of Open Access Journals (Sweden)

    Manolete Mora

    2007-11-01

    Full Text Available In a response to an article that appeared in Empirical Musicology Review (Bodomo and Mora 2007, Avorgbedor (2007 takes issue with aspects of the paper. In our reply to Avorgbedor’s response we will firstly clarify some issues raised therein and secondly address the issue about the relationship between theory, description and documentation within linguistics and musicology.

  20. MeSH: a window into full text for document summarization.

    Science.gov (United States)

    Bhattacharya, Sanmitra; Ha-Thuc, Viet; Srinivasan, Padmini

    2011-07-01

    Previous research in the biomedical text-mining domain has historically been limited to titles, abstracts and metadata available in MEDLINE records. Recent research initiatives such as TREC Genomics and BioCreAtIvE strongly point to the merits of moving beyond abstracts and into the realm of full texts. Full texts are, however, more expensive to process not only in terms of resources needed but also in terms of accuracy. Since full texts contain embellishments that elaborate, contextualize, contrast, supplement, etc., there is greater risk for false positives. Motivated by this, we explore an approach that offers a compromise between the extremes of abstracts and full texts. Specifically, we create reduced versions of full text documents that contain only important portions. In the long-term, our goal is to explore the use of such summaries for functions such as document retrieval and information extraction. Here, we focus on designing summarization strategies. In particular, we explore the use of MeSH terms, manually assigned to documents by trained annotators, as clues to select important text segments from the full text documents. Our experiments confirm the ability of our approach to pick the important text portions. Using the ROUGE measures for evaluation, we were able to achieve maximum ROUGE-1, ROUGE-2 and ROUGE-SU4 F-scores of 0.4150, 0.1435 and 0.1782, respectively, for our MeSH term-based method versus the maximum baseline scores of 0.3815, 0.1353 and 0.1428, respectively. Using a MeSH profile-based strategy, we were able to achieve maximum ROUGE F-scores of 0.4320, 0.1497 and 0.1887, respectively. Human evaluation of the baselines and our proposed strategies further corroborates the ability of our method to select important sentences from the full texts. sanmitra-bhattacharya@uiowa.edu; padmini-srinivasan@uiowa.edu.
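
The selection step of the paper's MeSH term-based strategy can be sketched simply: score each full-text sentence by its overlap with the document's manually assigned MeSH terms and keep the top-scoring sentences as the summary. The MeSH terms and sentences below are invented for illustration, and real MeSH terms are multi-word headings that need more careful matching than this word-level overlap.

```python
def mesh_summary(sentences, mesh_terms, k=1):
    """Extractive summary: the k sentences sharing most words with the
    document's MeSH terms."""
    terms = {t.lower() for t in mesh_terms}
    def score(sentence):
        words = set(sentence.lower().replace(".", "").split())
        return len(words & terms)
    ranked = sorted(sentences, key=score, reverse=True)
    return ranked[:k]

sentences = [
    "The weather was recorded daily.",
    "Aspirin reduced inflammation in the cohort.",
    "Participants signed consent forms.",
]
print(mesh_summary(sentences, ["Aspirin", "Inflammation"], k=1))
```

The resulting reduced documents are what the paper evaluates against abstracts with ROUGE; the MeSH-profile variant replaces the per-document term set with term weights learned over a collection.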

  1. Use of speech-to-text technology for documentation by healthcare providers.

    Science.gov (United States)

    Ajami, Sima

    2016-01-01

    Medical records are a critical component of a patient's treatment. However, documentation of patient-related information is considered a secondary activity in the provision of healthcare services, often leading to incomplete medical records and patient data of low quality. Advances in information technology (IT) in the health system and registration of information in electronic health records (EHR) using speech-to-text conversion software have facilitated service delivery. This narrative review is a literature search performed with the help of libraries, books, conference proceedings, the databases Science Direct, PubMed, ProQuest, Springer, and SID (Scientific Information Database), and search engines such as Yahoo and Google. I used the following keywords and their combinations: speech recognition, automatic report documentation, voice to text software, healthcare, information, and voice recognition. Due to lack of knowledge of other languages, I searched all texts in English or Persian with no time limits. Of a total of 70 articles, only 42 were selected. Speech-to-text conversion technology offers opportunities to improve the documentation process of medical records, reduce the cost and time of recording information, enhance the quality of documentation, improve the quality of services provided to patients, and support healthcare providers in legal matters. Healthcare providers should recognize the impact of this technology on service delivery.

  2. Software Manages Documentation in a Large Test Facility

    Science.gov (United States)

    Gurneck, Joseph M.

    2001-01-01

    The 3MCS computer program assists an instrumentation engineer in performing the three essential functions of design, documentation, and configuration management of measurement and control systems in a large test facility. Services provided by 3MCS are acceptance of input from multiple engineers and technicians working at multiple locations; standardization of drawings; automated cross-referencing; identification of errors; listing of components and resources; downloading of test settings; and provision of information to customers.

  3. The Analysis of Heterogeneous Text Documents with the Help of the Computer Program NUD*IST

    Directory of Open Access Journals (Sweden)

    Christine Plaß

    2000-12-01

    Full Text Available On the basis of a current research project we discuss the use of the computer program NUD*IST for the analysis and archiving of qualitative documents. Our project examines the social evaluation of spectacular criminal offenses and we identify, digitize and analyze documents from the entire 20th century. Since public and scientific discourses are examined, the data of the project are extraordinarily heterogeneous: scientific publications, court records, newspaper reports, and administrative documents. We want to show how to transfer general questions into a systematic categorization with the assistance of NUD*IST. Apart from the functions, possibilities and limitations of the application of NUD*IST, concrete work procedures and difficulties encountered are described. URN: urn:nbn:de:0114-fqs0003211

  4. Using machine learning to disentangle homonyms in large text corpora.

    Science.gov (United States)

    Roll, Uri; Correia, Ricardo A; Berger-Tal, Oded

    2018-06-01

    Systematic reviews are an increasingly popular decision-making tool that provides an unbiased summary of evidence to support conservation action. These reviews bridge the gap between researchers and managers by presenting a comprehensive overview of all studies relating to a particular topic and identify specifically where and under which conditions an effect is present. However, several technical challenges can severely hinder the feasibility and applicability of systematic reviews, for example, homonyms (terms that share spelling but differ in meaning). Homonyms add noise to search results and cannot be easily identified or removed. We developed a semiautomated approach that can aid in the classification of homonyms among narratives. We used a combination of automated content analysis and artificial neural networks to quickly and accurately sift through large corpora of academic texts and classify them to distinct topics. As an example, we explored the use of the word reintroduction in academic texts. Reintroduction is used within the conservation context to indicate the release of organisms to their former native habitat; however, a Web of Science search for this word returned thousands of publications in which the term has other meanings and contexts. Using our method, we automatically classified a sample of 3000 of these publications with over 99% accuracy, relative to a manual classification. Our approach can be used easily with other homonyms and can greatly facilitate systematic reviews or similar work in which homonyms hinder the harnessing of large text corpora. Beyond homonyms we see great promise in combining automated content analysis and machine-learning methods to handle and screen big data for relevant information in conservation science. © 2017 Society for Conservation Biology.
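
The paper combines automated content analysis with artificial neural networks; as a much-simplified stand-in, the sketch below uses a nearest-centroid classifier over word frequencies to separate two senses of "reintroduction" (conservation vs. other uses). The training snippets are invented, and a real system would use far larger labelled corpora and a proper network.

```python
from collections import Counter

def centroid(texts):
    """Normalized word-frequency vector of a set of training texts."""
    c = Counter()
    for t in texts:
        c.update(t.lower().split())
    total = sum(c.values())
    return {w: n / total for w, n in c.items()}

def classify(text, centroids):
    """Assign the label whose centroid best overlaps the text's words."""
    words = Counter(text.lower().split())
    def sim(cen):
        return sum(cen.get(w, 0.0) * n for w, n in words.items())
    return max(centroids, key=lambda label: sim(centroids[label]))

centroids = {
    "conservation": centroid(["reintroduction of wolves to native habitat",
                              "species release and habitat recovery"]),
    "other": centroid(["reintroduction of the tax bill in parliament",
                       "the bill was debated and amended"]),
}
print(classify("release of lynx to their former habitat", centroids))  # conservation
```

Run over a search-result corpus, such a classifier lets the off-topic homonym hits be discarded in bulk before the systematic-review screening begins.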

  5. Segmentation of Arabic Handwritten Documents into Text Lines using Watershed Transform

    Directory of Open Access Journals (Sweden)

    Abdelghani Souhar

    2017-12-01

    Full Text Available A crucial task in character recognition systems is the segmentation of the document into text lines, especially if it is handwritten. When dealing with a non-Latin document such as Arabic, the challenge becomes greater since, in addition to the variability of the writing, the presence of diacritical points and the high number of ascender and descender characters further complicates the segmentation process. To remedy this complexity, and even to turn it into an advantage given that Arabic is semi-cursive in nature, a method based on the Watershed Transform technique is proposed. Tested on the «Handwritten Arabic Proximity Datasets», a segmentation rate of 93% at a matching score of 95% is achieved.
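
The watershed intuition can be shown in one dimension (the paper applies the full 2-D Watershed Transform to the page image): treat the horizontal ink-projection profile as a relief and place text-line separators in the valleys between consecutive text-line peaks. The profile and threshold below are made-up assumptions.

```python
def line_separators(profile, ink_threshold=1):
    """profile: ink count per pixel row of a page. Returns the middle row
    of each low-ink valley lying between two text lines."""
    separators, start, seen_text = [], None, False
    for y, ink in enumerate(profile):
        if ink < ink_threshold:
            if start is None:
                start = y  # entering a valley
        else:
            if start is not None and seen_text:
                separators.append((start + y - 1) // 2)
            start, seen_text = None, True
    return separators

# Two text lines (rows 0-2 and 6-8) separated by empty rows 3-5.
profile = [4, 6, 5, 0, 0, 0, 5, 7, 4]
print(line_separators(profile))  # [4]
```

Projection profiles fail when diacritics and descenders make adjacent lines touch, which is exactly the situation the 2-D watershed flooding is designed to resolve.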

  6. Large Hospital 50% Energy Savings: Technical Support Document

    Energy Technology Data Exchange (ETDEWEB)

    Bonnema, E.; Studer, D.; Parker, A.; Pless, S.; Torcellini, P.

    2010-09-01

    This Technical Support Document documents the technical analysis and design guidance for large hospitals to achieve whole-building energy savings of at least 50% over ANSI/ASHRAE/IESNA Standard 90.1-2004 and represents a step toward determining how to provide design guidance for aggressive energy savings targets. This report documents the modeling methods used to demonstrate that the design recommendations meet or exceed the 50% goal. EnergyPlus was used to model the predicted energy performance of the baseline and low-energy buildings to verify that 50% energy savings are achievable. Percent energy savings are based on a nominal minimally code-compliant building and whole-building, net site energy use intensity. The report defines architectural-program characteristics for typical large hospitals, thereby defining a prototype model; creates baseline energy models for each climate zone that are elaborations of the prototype models and are minimally compliant with Standard 90.1-2004; creates a list of energy design measures that can be applied to the prototype model to create low-energy models; uses industry feedback to strengthen inputs for baseline energy models and energy design measures; and simulates low-energy models for each climate zone to show that when the energy design measures are applied to the prototype model, 50% energy savings (or more) are achieved.

  7. Ultrasound-guided nerve blocks - is documentation and education feasible using only text and pictures?

    DEFF Research Database (Denmark)

    Worm, Bjarne Skjødt; Krag, Mette; Jensen, Kenneth

    2014-01-01

    With the advancement of ultrasound-guidance for peripheral nerve blocks, still pictures from representative ultrasonograms are increasingly used for clinical procedure documentation of the procedure and for educational purposes in textbook materials. However, little is actually known about...... the clinical and educational usefulness of these still pictures, in particular how well nerve structures can be identified compared to real-time ultrasound examination. We aimed to quantify gross visibility or ultrastructure using still picture sonograms compared to real time ultrasound for trainees...... and experts, for large or small nerves, and discuss the clinical or educational relevance of these findings....

  8. Getting more out of biomedical documents with GATE's full lifecycle open source text analytics.

    Directory of Open Access Journals (Sweden)

    Hamish Cunningham

    Full Text Available This software article describes the GATE family of open source text analysis tools and processes. GATE is one of the most widely used systems of its type with yearly download rates of tens of thousands and many active users in both academic and industrial contexts. In this paper we report three examples of GATE-based systems operating in the life sciences and in medicine. First, in genome-wide association studies which have contributed to discovery of a head and neck cancer mutation association. Second, medical records analysis which has significantly increased the statistical power of treatment/outcome models in the UK's largest psychiatric patient cohort. Third, richer constructs in drug-related searching. We also explore the ways in which the GATE family supports the various stages of the lifecycle present in our examples. We conclude that the deployment of text mining for document abstraction or rich search and navigation is best thought of as a process, and that with the right computational tools and data collection strategies this process can be made defined and repeatable. The GATE research programme is now 20 years old and has grown from its roots as a specialist development tool for text processing to become a rather comprehensive ecosystem, bringing together software developers, language engineers and research staff from diverse fields. GATE now has a strong claim to cover a uniquely wide range of the lifecycle of text analysis systems. It forms a focal point for the integration and reuse of advances that have been made by many people (the majority outside of the authors' own group who work in text processing for biomedicine and other areas. GATE is available online under GNU open source licences and runs on all major operating systems. Support is available from an active user and developer community and also on a commercial basis.

  9. Getting more out of biomedical documents with GATE's full lifecycle open source text analytics.

    Science.gov (United States)

    Cunningham, Hamish; Tablan, Valentin; Roberts, Angus; Bontcheva, Kalina

    2013-01-01

    This software article describes the GATE family of open source text analysis tools and processes. GATE is one of the most widely used systems of its type with yearly download rates of tens of thousands and many active users in both academic and industrial contexts. In this paper we report three examples of GATE-based systems operating in the life sciences and in medicine. First, in genome-wide association studies which have contributed to discovery of a head and neck cancer mutation association. Second, medical records analysis which has significantly increased the statistical power of treatment/outcome models in the UK's largest psychiatric patient cohort. Third, richer constructs in drug-related searching. We also explore the ways in which the GATE family supports the various stages of the lifecycle present in our examples. We conclude that the deployment of text mining for document abstraction or rich search and navigation is best thought of as a process, and that with the right computational tools and data collection strategies this process can be made defined and repeatable. The GATE research programme is now 20 years old and has grown from its roots as a specialist development tool for text processing to become a rather comprehensive ecosystem, bringing together software developers, language engineers and research staff from diverse fields. GATE now has a strong claim to cover a uniquely wide range of the lifecycle of text analysis systems. It forms a focal point for the integration and reuse of advances that have been made by many people (the majority outside of the authors' own group) who work in text processing for biomedicine and other areas. GATE is available online under GNU open source licences and runs on all major operating systems. Support is available from an active user and developer community and also on a commercial basis.

  10. Literary Hermeneutic - A Large Vision upon the Text

    Directory of Open Access Journals (Sweden)

    Elena Vorotneac

    2011-12-01

    Full Text Available This article reviews the book “Literary Hermeneutic” by Victoria Fonari, Ph.D., State University of Moldova. Hermeneutics, as an object of research, draws on literary, critical, theological, juridical, linguistic, psychological, verbal, and sociological knowledge. Literary hermeneutics is one of its most favored disciplines. It has been venerated since the Homeric exegesis of antiquity and through the development of methodologies for interpreting canonical works, in which a key moment is the deciphering of texts: commenting on the monuments and authors of times immemorial, thus re-establishing a part of human values. Re-establishing the connections between the values of the past and their understanding from the present perspective is the task of literary interpretation. The paradigm of literary and artistic interpretation constitutes a basic element, important both for the writing of academic research and for the understanding of literary values. It directs students to scientific works and facilitates the professional activity of teachers, journalists, jurists, and translators.

  11. Content analysis to detect high stress in oral interviews and text documents

    Science.gov (United States)

    Thirumalainambi, Rajkumar (Inventor); Jorgensen, Charles C. (Inventor)

    2012-01-01

    A system of interrogation to estimate whether a subject of interrogation is likely experiencing high stress, emotional volatility and/or internal conflict in the subject's responses to an interviewer's questions. The system applies one or more of four procedures, a first statistical analysis, a second statistical analysis, a third analysis and a heat map analysis, to identify one or more documents containing the subject's responses for which further examination is recommended. Words in the documents are characterized in terms of dimensions representing different classes of emotions and states of mind, in which the subject's responses that manifest high stress, emotional volatility and/or internal conflict are identified. A heat map visually displays the dimensions manifested by the subject's responses in different colors, textures, geometric shapes or other visually distinguishable indicia.
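
The dimension scoring described above can be sketched as a lexicon lookup. The lexicons, the words in them, and the flagging rule below are all invented for illustration; the patented system's statistical analyses and heat-map display are far richer.

```python
# Tiny made-up lexicons for three emotion/state dimensions.
LEXICON = {
    "stress": {"pressure", "worried", "afraid", "tense", "panic"},
    "conflict": {"but", "however", "deny", "never", "lied"},
    "calm": {"fine", "relaxed", "steady", "sure"},
}

def dimension_scores(text):
    """Count how many response words fall in each dimension's lexicon."""
    words = text.lower().replace(",", "").replace(".", "").split()
    return {dim: sum(w in vocab for w in words)
            for dim, vocab in LEXICON.items()}

def flag_for_review(text, ratio=0.5):
    """Recommend further examination when the stress share is high."""
    scores = dimension_scores(text)
    total = sum(scores.values())
    return total > 0 and scores["stress"] / total >= ratio

answer = "I was worried, tense, and afraid of the pressure, but I am fine."
print(dimension_scores(answer))
print(flag_for_review(answer))  # True
```

A heat map would then render each document's per-dimension scores as colored cells so an interviewer can spot high-stress responses at a glance.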

  12. A Comparative Analysis of Information Hiding Techniques for Copyright Protection of Text Documents

    Directory of Open Access Journals (Sweden)

    Milad Taleby Ahvanooey

    2018-01-01

    Full Text Available With the ceaseless usage of the web and other online services, it has turned out that copying, sharing, and transmitting digital media over the Internet are amazingly simple. Since text is one of the main available data sources and the most widely used digital medium on the Internet, a significant part of websites, books, articles, daily papers, and so on is just plain text. Therefore, the copyright protection of plain texts remains an issue that must be improved in order to provide proof of ownership and obtain the desired accuracy. During the last decade, digital watermarking and steganography techniques have been used as alternatives to prevent tampering, distortion, and media forgery, and also to protect both copyright and authentication. This paper presents a comparative analysis of information hiding techniques, especially those focused on modifying the structure and content of digital texts. Herein, the characteristics of various text watermarking and text steganography techniques are highlighted along with their applications. In addition, various types of attacks are described and their effects analyzed in order to highlight the advantages and weaknesses of current techniques. Finally, some guidelines and directions are suggested for future work.
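
One family of techniques surveyed above hides information in the text itself. As a minimal sketch of that idea (one technique among many, not the paper's method), the code below embeds bits as zero-width Unicode characters (U+200B for 0, U+200C for 1) appended after words; its fragility under retyping or normalization is exactly the kind of attack such surveys analyze.

```python
ZW = {"0": "\u200b", "1": "\u200c"}  # invisible carriers for the two bit values
ZW_INV = {v: k for k, v in ZW.items()}

def embed(cover, bits):
    """Append one zero-width character per word to carry the hidden bits."""
    words = cover.split(" ")
    out = [w + ZW[b] for w, b in zip(words, bits)] + words[len(bits):]
    return " ".join(out)

def extract(stego):
    """Read the hidden bits back out of the zero-width characters."""
    return "".join(ZW_INV[ch] for ch in stego if ch in ZW_INV)

stego = embed("the quick brown fox", "101")
print(extract(stego))  # 101
```

The stego text renders identically to the cover text, which is the appeal; robust watermarking schemes instead tie the hidden payload to structural features that survive copy-paste and reformatting.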

  13. A Full-Text-Based Search Engine for Finding Highly Matched Documents Across Multiple Categories

    Science.gov (United States)

    Nguyen, Hung D.; Steele, Gynelle C.

    2016-01-01

    This report demonstrates a full-text-based search engine that works in any Web-based mobile application. The engine can search databases across multiple categories based on a user's queries and identify the most relevant or similar documents. The search results presented here were found using an Android (Google Co.) mobile device; however, the engine is also compatible with other mobile phones.

  14. Documenting Instructional Practices in Large Introductory STEM Lecture Courses

    Science.gov (United States)

    Vu, Viet Quoc

    STEM education reform in higher education is framed around the need to improve student learning outcomes, increase student retention, and increase the number of underrepresented minorities and female students in STEM fields, all of which would ultimately contribute to America's competitiveness and prosperity. To achieve these goals, education reformers call for an increase in the adoption of research-based "promising practices" in classrooms. Despite efforts to increase the adoption of more promising practices in classrooms, postsecondary instructors are still likely to lecture and use traditional teaching approaches. To shed light on this adoption dilemma, a mixed-methods study was conducted. First, instructional practices in large introductory STEM courses were identified, followed by an analysis of factors that inhibit or contribute to the use of promising practices. Data were obtained from classroom observations (N = 259) of large gateway courses across STEM departments and from instructor interviews (N = 67). Results show that instructors are already aware of promising practices and that change strategies could shift from developing and disseminating promising practices to improving adoption rates. Teaching-track instructors, such as lecturers with potential for security of employment (LPSOE) and lecturers with security of employment (LSOE), have adopted promising practices more than other instructors. Interview data show that LPSOEs are also effective at disseminating promising practices to their peers, but opinion leaders (influential faculty in a department) are necessary to promote adoption of promising practices by higher-ranking instructors. However, hiring more LPSOEs or opinion leaders will not be enough to shift instructional practices. Variations in the adoption of promising practices by instructors and across departments show that any reform strategy needs to be systematic and take into consideration how information is

  15. A Text Mining Approach for Extracting Lessons Learned from Project Documentation: An Illustrative Case Study

    Directory of Open Access Journals (Sweden)

    Benjamin Matthies

    2017-12-01

    Full Text Available Lessons learned are important building blocks for continuous learning in project-based organisations. Nonetheless, the practical reality is that lessons learned are often not consistently reused for organisational learning. Two problems are commonly described in this context: information overload and the lack of procedures and methods for the assessment and implementation of lessons learned. This paper addresses these problems, and appropriate solutions are combined in a systematic lessons learned process. Latent Dirichlet Allocation is presented to solve the first problem. Regarding the second problem, established risk management methods are adapted. The entire lessons learned process is demonstrated in a practical case study.
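
    The first-pass use of Latent Dirichlet Allocation against information overload can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the lessons-learned corpus and the topic count are invented, and scikit-learn's LatentDirichletAllocation is assumed as the LDA implementation.

```python
# Illustrative sketch only (not the paper's implementation): grouping
# invented lessons-learned entries by topic with LDA via scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

lessons = [
    "vendor delivery delays pushed the schedule back two weeks",
    "late vendor shipments caused schedule slippage",
    "requirements changed after design freeze causing rework",
    "scope changes after approval led to redesign effort",
]

vec = CountVectorizer(stop_words="english")
dtm = vec.fit_transform(lessons)            # document-term matrix

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(dtm)         # rows: per-document topic mixtures

dominant = doc_topics.argmax(axis=1)        # dominant topic per lesson
```

    With a real corpus, the top-weighted terms of each topic (from lda.components_) give a human-readable label for each cluster of related lessons, which is what makes the topics browsable despite the volume of entries.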

  16. Large Scale Document Inversion using a Multi-threaded Computing System

    Science.gov (United States)

    Jung, Sungbo; Chang, Dar-Jen; Park, Juw Won

    2018-01-01

    Current microprocessor architecture is moving toward multi-core/multi-threaded systems. This trend has led to a surge of interest in using multi-threaded computing devices, such as the Graphics Processing Unit (GPU), for general-purpose computing. Because the GPU consists of multiple cores, it can be used as a massively parallel coprocessor; it is also an affordable, attractive, and user-programmable commodity. Nowadays, vast amounts of information are flooding into the digital domain around the world. Huge volumes of data, such as digital libraries, social networking services, and e-commerce product data and reviews, are produced or collected every moment, with dramatic growth in size. Although the inverted index is a useful data structure for full-text search and document retrieval, creating the index for a large number of documents requires a tremendous amount of time. The performance of document inversion can be improved by multi-threading on a multi-core GPU. Our approach is to implement a linear-time, hash-based, single-program multiple-data (SPMD) document inversion algorithm on the NVIDIA GPU/CUDA programming platform, utilizing the huge computational power of the GPU to develop high-performance solutions for document indexing. Our proposed parallel document inversion system shows 2-3 times faster performance than a sequential system on two different test datasets, PubMed abstracts and e-commerce product reviews. CCS Concepts: •Information systems➝Information retrieval •Computing methodologies➝Massively parallel and high-performance simulations.
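
    A sequential, single-threaded analogue of the hash-based document inversion described above can be sketched as follows. This is illustrative only; the actual system runs as an SPMD kernel on CUDA, which this stdlib sketch does not attempt to reproduce.

```python
# Minimal hash-based inverted index: term -> sorted posting list of
# document ids. A sequential analogue of the SPMD/GPU version above.
from collections import defaultdict

def build_inverted_index(docs):
    """Build the index in a single linear pass over all tokens."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    # Sorted posting lists make intersection and merging cheap.
    return {term: sorted(ids) for term, ids in index.items()}

docs = [
    "GPU computing accelerates document indexing",
    "inverted index enables full text search",
    "document retrieval uses the inverted index",
]
index = build_inverted_index(docs)
```

    A conjunctive query then reduces to intersecting posting lists; the GPU formulation parallelizes the tokenize-and-insert pass across documents.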

  17. Large Scale Document Inversion using a Multi-threaded Computing System.

    Science.gov (United States)

    Jung, Sungbo; Chang, Dar-Jen; Park, Juw Won

    2017-06-01

    Current microprocessor architecture is moving toward multi-core/multi-threaded systems. This trend has led to a surge of interest in using multi-threaded computing devices, such as the Graphics Processing Unit (GPU), for general-purpose computing. Because the GPU consists of multiple cores, it can be used as a massively parallel coprocessor; it is also an affordable, attractive, and user-programmable commodity. Nowadays, vast amounts of information are flooding into the digital domain around the world. Huge volumes of data, such as digital libraries, social networking services, and e-commerce product data and reviews, are produced or collected every moment, with dramatic growth in size. Although the inverted index is a useful data structure for full-text search and document retrieval, creating the index for a large number of documents requires a tremendous amount of time. The performance of document inversion can be improved by multi-threading on a multi-core GPU. Our approach is to implement a linear-time, hash-based, single-program multiple-data (SPMD) document inversion algorithm on the NVIDIA GPU/CUDA programming platform, utilizing the huge computational power of the GPU to develop high-performance solutions for document indexing. Our proposed parallel document inversion system shows 2-3 times faster performance than a sequential system on two different test datasets, PubMed abstracts and e-commerce product reviews. •Information systems➝Information retrieval •Computing methodologies➝Massively parallel and high-performance simulations.

  18. On the Creation of Hypertext Links in Full-Text Documents: Measurement of Inter-Linker Consistency.

    Science.gov (United States)

    Ellis, David; And Others

    1994-01-01

    Describes a study in which several different sets of hypertext links are inserted by different people in full-text documents. The degree of similarity between the sets is measured using coefficients and topological indices. As in comparable studies of inter-indexer consistency, the sets of links used by different people showed little similarity.…

  19. Automating the generation of lexical patterns for processing free text in clinical documents.

    Science.gov (United States)

    Meng, Frank; Morioka, Craig

    2015-09-01

    Many tasks in natural language processing utilize lexical pattern-matching techniques, including information extraction (IE), negation identification, and syntactic parsing. However, it is generally difficult to derive patterns that achieve acceptable levels of recall while also remaining highly precise. We present a multiple sequence alignment (MSA)-based technique that automatically generates patterns, thereby leveraging language usage to determine the context of words that influence a given target. MSAs capture the commonalities among word sequences and are able to reveal areas of linguistic stability and variation. In this way, MSAs provide a systematic approach to generating lexical patterns that are generalizable, which both increases recall and maintains high levels of precision. The MSA-generated patterns exhibited consistent F1, F0.5, and F2 scores compared to two baseline techniques for IE across four different tasks. Both baseline techniques performed well for some tasks and less well for others, but MSA was found to consistently perform at a high level for all four tasks. The performance of MSA on the four extraction tasks indicates the method's versatility. The results show that the MSA-based patterns are able to handle the extraction of individual data elements as well as relations between two concepts without the need for large amounts of manual intervention. We presented an MSA-based framework for generating lexical patterns that showed consistently high levels of both performance and recall over four different extraction tasks when compared to baseline methods. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
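
    The intuition behind alignment-derived patterns, stable context plus variable slots, can be illustrated with a pairwise alignment. This is a simplification: the paper aligns multiple sequences, the clinical phrases below are invented, and Python's difflib.SequenceMatcher stands in for a proper MSA.

```python
# Illustrative sketch (not the authors' implementation): align two token
# sequences to expose stable context and a variable slot, the idea
# behind MSA-derived lexical patterns.
from difflib import SequenceMatcher

a = "no evidence of acute fracture".split()
b = "no evidence of pulmonary embolism".split()

m = SequenceMatcher(a=a, b=b)
pattern = []
for tag, i1, i2, j1, j2 in m.get_opcodes():
    if tag == "equal":            # stable region: keep literal tokens
        pattern.extend(a[i1:i2])
    else:                         # variable region: collapse to one slot
        pattern.append("<TARGET>")
```

    Here the shared context "no evidence of" survives verbatim while the differing findings collapse into a slot, yielding a pattern that generalizes to unseen targets.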

  20. Implementation of Winnowing Algorithm Based K-Gram to Identify Plagiarism on File Text-Based Document

    Directory of Open Access Journals (Sweden)

    Nurdiansyah Yanuar

    2018-01-01

    Full Text Available Plagiarism often occurs when students, pressed by deadlines, see copying as the fastest way to complete their assignments. This motivated the authors to build a plagiarism detection system using the Winnowing algorithm for document similarity search. The documents tested are Indonesian journal articles with the extensions .doc, .docx, and/or .txt. The similarity calculation proceeds in two stages: first, a document fingerprint is generated using the Winnowing algorithm; second, similarity is computed using the Jaccard coefficient. The system was developed using an iterative waterfall model. Its main objective is to determine the level of plagiarism and, by displaying the percentage of similarity between journal manuscripts, to help prevent plagiarism, whether intentional or unintentional, before publication.
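
    The two stages described above can be sketched roughly as follows. This is a minimal illustration, not the authors' system: the parameter values k and window are arbitrary, and Python's built-in hash stands in for the rolling hash usually paired with winnowing.

```python
# Minimal winnowing fingerprint + Jaccard similarity sketch.
def winnow(text, k=5, window=4):
    """Winnowing fingerprint: the set of minimum hashes taken over each
    sliding window of k-gram hashes."""
    text = "".join(text.lower().split())    # strip case and whitespace
    grams = [hash(text[i:i + k]) for i in range(len(text) - k + 1)]
    return {min(grams[i:i + window]) for i in range(len(grams) - window + 1)}

def jaccard(f1, f2):
    """Jaccard coefficient of two fingerprint sets (0.0 .. 1.0)."""
    return len(f1 & f2) / len(f1 | f2)

doc_a = "Plagiarism detection with the winnowing algorithm"
doc_b = "Detecting plagiarism using the winnowing algorithm"
similarity = jaccard(winnow(doc_a), winnow(doc_b))
```

    Winnowing guarantees that any shared substring of at least window + k - 1 characters contributes at least one common fingerprint, so documents with long copied passages always register nonzero similarity.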

  1. BoB, a best-of-breed automated text de-identification system for VHA clinical documents.

    Science.gov (United States)

    Ferrández, Oscar; South, Brett R; Shen, Shuying; Friedlin, F Jeffrey; Samore, Matthew H; Meystre, Stéphane M

    2013-01-01

    De-identification allows faster and more collaborative clinical research while protecting patient confidentiality. Clinical narrative de-identification is a tedious process that can be alleviated by automated natural language processing methods. The goal of this research is the development of an automated text de-identification system for Veterans Health Administration (VHA) clinical documents. We devised a novel stepwise hybrid approach designed to improve the current strategies used for text de-identification. The proposed system is based on a previous study on the best de-identification methods for VHA documents. This best-of-breed automated clinical text de-identification system (aka BoB) tackles the problem as two separate tasks: (1) maximize patient confidentiality by redacting as much protected health information (PHI) as possible; and (2) leave de-identified documents in a usable state preserving as much clinical information as possible. We evaluated BoB with a manually annotated corpus of a variety of VHA clinical notes, as well as with the 2006 i2b2 de-identification challenge corpus. We present evaluations at the instance- and token-level, with detailed results for BoB's main components. Moreover, an existing text de-identification system was also included in our evaluation. BoB's design efficiently takes advantage of the methods implemented in its pipeline, resulting in high sensitivity values (especially for sensitive PHI categories) and a limited number of false positives. Our system successfully addressed VHA clinical document de-identification, and its hybrid stepwise design demonstrates robustness and efficiency, prioritizing patient confidentiality while leaving most clinical information intact.

  2. Computing symmetrical strength of N-grams: a two pass filtering approach in automatic classification of text documents.

    Science.gov (United States)

    Agnihotri, Deepak; Verma, Kesari; Tripathi, Priyanka

    2016-01-01

    The contiguous sequences of terms (N-grams) in documents are symmetrically distributed among different classes. This symmetrical distribution raises uncertainty about which class an N-gram belongs to. In this paper, we focus on selecting the most discriminating N-grams by reducing the effects of symmetrical distribution. In this context, a new text feature selection method, the symmetrical strength of the N-grams (SSNG), is proposed using a two pass filtering based feature selection (TPF) approach. In the first pass of TPF, the SSNG method chooses various informative N-grams from the entire set of N-grams extracted from the corpus. In the second pass, the well-known Chi-square (χ²) method is used to select the few most informative N-grams. Further, to classify the documents, two standard classifiers, Multinomial Naive Bayes and Linear Support Vector Machine, were applied to ten standard text datasets. In most of the datasets, the experimental results show that the performance and success rate of the SSNG method with the TPF approach are superior to state-of-the-art methods, viz. Mutual Information, Information Gain, Odds Ratio, Discriminating Feature Selection, and χ².
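
    The second-pass χ² filter can be illustrated on a toy 2x2 term/class contingency table (invented counts; this is the standard χ² term-scoring formula, not the authors' code). Note how an N-gram spread symmetrically across classes scores zero, which is exactly the uncertainty the abstract describes.

```python
# Standard chi-square scoring of one N-gram against one target class.
def chi_square(n11, n10, n01, n00):
    """chi^2 from a 2x2 table: n11 = target-class docs containing the
    N-gram, n10 = target-class docs without it; n01/n00 likewise for
    the remaining classes."""
    n = n11 + n10 + n01 + n00
    num = n * (n11 * n00 - n10 * n01) ** 2
    den = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    return num / den if den else 0.0

# "free kick": in 4 of 5 sports docs and no others -> discriminative.
score_fk = chi_square(4, 1, 0, 5)
# "last week": spread evenly across classes -> symmetric, score 0.
score_lw = chi_square(3, 2, 3, 2)
```

    Ranking N-grams by this score and keeping the top few is the second pass; the first pass (SSNG) would already have pruned the symmetric candidates.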

  3. Combining position weight matrices and document-term matrix for efficient extraction of associations of methylated genes and diseases from free text.

    Directory of Open Access Journals (Sweden)

    Arwa Bin Raies

    Full Text Available BACKGROUND: In a number of diseases, certain genes are reported to be strongly methylated and thus can serve as diagnostic markers in many cases. Scientific literature in digital form is an important source of information about methylated genes implicated in particular diseases. The large volume of electronic text makes it difficult and impractical to search for this information manually. METHODOLOGY: We developed a novel text mining methodology based on a new concept of position weight matrices (PWMs) for text representation and feature generation. We applied PWMs in conjunction with the document-term matrix to extract with high accuracy associations between methylated genes and diseases from free text. The performance results are based on large manually-classified data. Additionally, we developed a web-tool, DEMGD, which automates extraction of these associations from free text. DEMGD presents the extracted associations in summary tables and full reports, in addition to evidence tagging of text with respect to genes, diseases, and methylation words. The methodology we developed in this study can be applied to similar association extraction problems from free text. CONCLUSION: The new methodology developed in this study allows for efficient identification of associations between concepts. Our method, applied to methylated genes in different diseases, is implemented as a web-tool, DEMGD, which is freely available at http://www.cbrc.kaust.edu.sa/demgd/. The data is available for online browsing and download.
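
    The core idea, a position weight matrix over fixed-width word windows, can be sketched as follows. This is a toy illustration with invented windows; DEMGD's actual feature generation is more elaborate and is combined with a document-term matrix.

```python
# Toy position weight matrix (PWM) over word windows: per-position word
# probabilities estimated from hypothetical 3-word contexts around a
# methylation mention.
from collections import Counter

windows = [
    ["BRCA1", "hypermethylated", "cancer"],
    ["MLH1", "hypermethylated", "carcinoma"],
    ["BRCA1", "methylated", "cancer"],
]

width = len(windows[0])
pwm = []
for pos in range(width):
    counts = Counter(w[pos] for w in windows)
    total = sum(counts.values())
    pwm.append({word: c / total for word, c in counts.items()})

def score(window, pwm):
    """Product of per-position probabilities (0 for unseen words)."""
    p = 1.0
    for pos, word in enumerate(window):
        p *= pwm[pos].get(word, 0.0)
    return p
```

    A candidate window from new text scores highly when its words match the positional distributions learned from known gene-disease methylation contexts.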

  4. Combining Position Weight Matrices and Document-Term Matrix for Efficient Extraction of Associations of Methylated Genes and Diseases from Free Text

    KAUST Repository

    Bin Raies, Arwa

    2013-10-16

    Background: In a number of diseases, certain genes are reported to be strongly methylated and thus can serve as diagnostic markers in many cases. Scientific literature in digital form is an important source of information about methylated genes implicated in particular diseases. The large volume of the electronic text makes it difficult and impractical to search for this information manually. Methodology: We developed a novel text mining methodology based on a new concept of position weight matrices (PWMs) for text representation and feature generation. We applied PWMs in conjunction with the document-term matrix to extract with high accuracy associations between methylated genes and diseases from free text. The performance results are based on large manually-classified data. Additionally, we developed a web-tool, DEMGD, which automates extraction of these associations from free text. DEMGD presents the extracted associations in summary tables and full reports in addition to evidence tagging of text with respect to genes, diseases and methylation words. The methodology we developed in this study can be applied to similar association extraction problems from free text. Conclusion: The new methodology developed in this study allows for efficient identification of associations between concepts. Our method applied to methylated genes in different diseases is implemented as a Web-tool, DEMGD, which is freely available at http://www.cbrc.kaust.edu.sa/demgd/. The data is available for online browsing and download. © 2013 Bin Raies et al.

  5. Large-Scale Multi-Dimensional Document Clustering on GPU Clusters

    Energy Technology Data Exchange (ETDEWEB)

    Cui, Xiaohui [ORNL; Mueller, Frank [North Carolina State University; Zhang, Yongpeng [ORNL; Potok, Thomas E [ORNL

    2010-01-01

    Document clustering plays an important role in data mining systems. Recently, a flocking-based document clustering algorithm has been proposed to solve the problem through simulation resembling the flocking behavior of birds in nature. This method is superior to other clustering algorithms, including k-means, in the sense that the outcome is not sensitive to the initial state. One limitation of this approach is that the algorithmic complexity is inherently quadratic in the number of documents. As a result, execution time becomes a bottleneck with a large number of documents. In this paper, we assess the benefits of exploiting the computational power of Beowulf-like clusters equipped with contemporary Graphics Processing Units (GPUs) as a means to significantly reduce the runtime of flocking-based document clustering. Our framework scales up to over one million documents processed simultaneously in a sixteen-node GPU cluster. Results are also compared to a four-node cluster with higher-end GPUs. On these clusters, we observe 30X-50X speedups, which demonstrates the potential of GPU clusters to efficiently solve massive data mining problems. Such speedups combined with the scalability potential and accelerator-based parallelization are unique in the domain of document-based data mining, to the best of our knowledge.

  6. Word-Length Correlations and Memory in Large Texts: A Visibility Network Analysis

    Directory of Open Access Journals (Sweden)

    Lev Guzmán-Vargas

    2015-11-01

    Full Text Available We study the correlation properties of word lengths in large texts from 30 ebooks in the English language from the Gutenberg Project (www.gutenberg.org) using the natural visibility graph (NVG) method. NVG converts a time series into a graph and then analyzes its graph properties. First, the original sequence of words is transformed into a sequence of values containing the length of each word, and then it is integrated. Next, we apply the NVG to the integrated word-length series and construct the network. We show that the degree distribution of that network follows a power law, P(k) ~ k^(-γ), with two regimes, characterized by the exponents γ_s ≈ 1.7 (at short degree scales) and γ_l ≈ 1.3 (at large degree scales). This suggests that word lengths are much more strongly correlated at large distances between words than at short distances. That finding is also supported by detrended fluctuation analysis (DFA) and the recurrence time distribution. These results provide new information about the universal characteristics of the structure of written texts beyond that given by word frequencies.
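
    A minimal construction of the natural visibility graph can be sketched as follows. This is illustrative only: the study first integrates the word-length series and uses 30 full ebooks, whereas this toy applies the visibility criterion directly to a five-point series.

```python
# Natural visibility graph (NVG) on a toy word-length series.
def visibility_edges(series):
    """Edge (a, b) exists iff every intermediate point lies strictly
    below the straight line joining (a, y_a) and (b, y_b)."""
    edges = []
    n = len(series)
    for a in range(n):
        for b in range(a + 1, n):
            if all(series[c] < series[a]
                   + (series[b] - series[a]) * (c - a) / (b - a)
                   for c in range(a + 1, b)):
                edges.append((a, b))
    return edges

lengths = [3, 1, 4, 1, 5]        # e.g. word lengths of a short sentence
edges = visibility_edges(lengths)
degree = [sum(1 for e in edges if v in e) for v in range(len(lengths))]
```

    On a long series, it is the distribution of these node degrees that the paper fits with the two power-law regimes.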

  7. FacetGist: Collective Extraction of Document Facets in Large Technical Corpora.

    Science.gov (United States)

    Siddiqui, Tarique; Ren, Xiang; Parameswaran, Aditya; Han, Jiawei

    2016-10-01

    Given the large volume of technical documents available, it is crucial to automatically organize and categorize these documents to be able to understand and extract value from them. Towards this end, we introduce a new research problem called Facet Extraction. Given a collection of technical documents, the goal of Facet Extraction is to automatically label each document with a set of concepts for the key facets (e.g., application, technique, evaluation metrics, and dataset) that people may be interested in. Facet Extraction has numerous applications, including document summarization, literature search, patent search and business intelligence. The major challenge in performing Facet Extraction arises from multiple sources: concept extraction, concept to facet matching, and facet disambiguation. To tackle these challenges, we develop FacetGist, a framework for facet extraction. FacetGist involves constructing a graph-based heterogeneous network to capture information available across multiple local sentence-level features, as well as global context features. We then formulate a joint optimization problem, and propose an efficient algorithm for graph-based label propagation to estimate the facet of each concept mention. Experimental results on technical corpora from two domains demonstrate that FacetGist can lead to an improvement of over 25% in both precision and recall over competing schemes.

  8. Experiences with Text Mining Large Collections of Unstructured Systems Development Artifacts at JPL

    Science.gov (United States)

    Port, Dan; Nikora, Allen; Hihn, Jairus; Huang, LiGuo

    2011-01-01

    Often repositories of systems engineering artifacts at NASA's Jet Propulsion Laboratory (JPL) are so large and poorly structured that they have outgrown our capability to effectively manually process their contents to extract useful information. Sophisticated text mining methods and tools seem a quick, low-effort approach to automating our limited manual efforts. Our experiences exploring such methods, mainly in three areas (historical risk analysis, defect identification based on requirements analysis, and over-time analysis of system anomalies at JPL), have shown that obtaining useful results requires substantial unanticipated effort, from preprocessing the data to transforming the output for practical applications. We have not observed any quick 'wins' or realized benefits from short-term effort avoidance through automation in this area. Surprisingly, we have realized a number of unexpected long-term benefits from the process of applying text mining to our repositories. This paper elaborates some of these benefits and the important lessons we learned while preparing and applying text mining to large unstructured system artifacts at JPL, in the hope of benefiting future text mining applications in similar problem domains and of extending these methods to broader areas of application.

  9. Large-scale extraction of gene interactions from full-text literature using DeepDive.

    Science.gov (United States)

    Mallory, Emily K; Zhang, Ce; Ré, Christopher; Altman, Russ B

    2016-01-01

    A complete repository of gene-gene interactions is key for understanding cellular processes, human disease and drug response. These gene-gene interactions include both protein-protein interactions and transcription factor interactions. The majority of known interactions are found in the biomedical literature. Interaction databases, such as BioGRID and ChEA, annotate these gene-gene interactions; however, curation becomes difficult as the literature grows exponentially. DeepDive is a trained system for extracting information from a variety of sources, including text. In this work, we used DeepDive to extract both protein-protein and transcription factor interactions from over 100,000 full-text PLOS articles. We built an extractor for gene-gene interactions that identified candidate gene-gene relations within an input sentence. For each candidate relation, DeepDive computed a probability that the relation was a correct interaction. We evaluated this system against the Database of Interacting Proteins and against randomly curated extractions. Our system achieved 76% precision and 49% recall in extracting direct and indirect interactions involving gene symbols co-occurring in a sentence. For randomly curated extractions, the system achieved between 62% and 83% precision based on direct or indirect interactions, as well as sentence-level and document-level precision. Overall, our system extracted 3356 unique gene pairs using 724 features from over 100,000 full-text articles. Application source code is publicly available at https://github.com/edoughty/deepdive_genegene_app. Contact: russ.altman@stanford.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  10. Automatic extraction of property norm-like data from large text corpora.

    Science.gov (United States)

    Kelly, Colin; Devereux, Barry; Korhonen, Anna

    2014-01-01

    Traditional methods for deriving property-based representations of concepts from text have focused on either extracting only a subset of possible relation types, such as hyponymy/hypernymy (e.g., car is-a vehicle) or meronymy/metonymy (e.g., car has wheels), or unspecified relations (e.g., car--petrol). We propose a system for the challenging task of automatic, large-scale acquisition of unconstrained, human-like property norms from large text corpora, and discuss the theoretical implications of such a system. We employ syntactic, semantic, and encyclopedic information to guide our extraction, yielding concept-relation-feature triples (e.g., car be fast, car require petrol, car cause pollution), which approximate property-based conceptual representations. Our novel method extracts candidate triples from parsed corpora (Wikipedia and the British National Corpus) using syntactically and grammatically motivated rules, then reweights triples with a linear combination of their frequency and four statistical metrics. We assess our system output in three ways: lexical comparison with norms derived from human-generated property norm data, direct evaluation by four human judges, and a semantic distance comparison with both WordNet similarity data and human-judged concept similarity ratings. Our system offers a viable and performant method of plausible triple extraction: Our lexical comparison shows comparable performance to the current state-of-the-art, while subsequent evaluations exhibit the human-like character of our generated properties.

  11. Technical Support Document: Development of the Advanced Energy Design Guide for Large Hospitals - 50% Energy Savings

    Energy Technology Data Exchange (ETDEWEB)

    Bonnema, E.; Leach, M.; Pless, S.

    2013-06-01

    This Technical Support Document describes the process and methodology for the development of the Advanced Energy Design Guide for Large Hospitals: Achieving 50% Energy Savings Toward a Net Zero Energy Building (AEDG-LH) (ASHRAE et al. 2011b). The AEDG-LH is intended to provide recommendations for achieving 50% whole-building energy savings in large hospitals over levels achieved by following Standard 90.1-2004. The AEDG-LH was created for a 'standard' mid- to large-size hospital, typically at least 100,000 ft², but the strategies apply to all sizes and classifications of new construction hospital buildings. Its primary focus is new construction, but recommendations may be applicable to facilities undergoing total renovation, and in part to many other hospital renovation, addition, remodeling, and modernization projects (including changes to one or more systems in existing buildings).

  12. Documentation of body mass index and control of associated risk factors in a large primary care network

    Directory of Open Access Journals (Sweden)

    Grant Richard W

    2009-12-01

    Full Text Available Abstract Background Body mass index (BMI) will be a reportable health measure in the United States (US) through implementation of Healthcare Effectiveness Data and Information Set (HEDIS) guidelines. We evaluated current documentation of BMI, and documentation and control of associated risk factors by BMI category, based on electronic health records from a 12-clinic primary care network. Methods We conducted a cross-sectional analysis of 79,947 active network patients greater than 18 years of age seen between 7/05 - 12/06. We defined BMI categories as normal weight (NW, 18-24.9 kg/m²), overweight (OW, 25-29.9), and obese (OB, ≥30). We measured documentation (yes/no) and control (above/below) of the following three risk factors: blood pressure (BP) ≤130/≤85 mmHg, low-density lipoprotein (LDL) ≤130 mg/dL (3.367 mmol/L), and fasting glucose Results BMI was documented in 48,376 patients (61%, range 34-94%), distributed as 30% OB, 34% OW, and 36% NW. Documentation of all three risk factors was higher in obesity (OB = 58%, OW = 54%, NW = 41%, p for trend Conclusions In a large primary care network, BMI documentation has been incomplete, and for patients with BMI measured, risk factor control has been poorer in obese patients compared with NW patients, even in those with obesity and CVD or diabetes. Better knowledge of BMI could provide an opportunity for improved quality in obesity care.

  13. Representation of Social History Factors Across Age Groups: A Topic Analysis of Free-Text Social Documentation.

    Science.gov (United States)

    Lindemann, Elizabeth A; Chen, Elizabeth S; Wang, Yan; Skube, Steven J; Melton, Genevieve B

    2017-01-01

    As individuals age, there is potential for dramatic changes in the social and behavioral determinants that affect health status and outcomes. The importance of these determinants has been increasingly recognized in clinical decision-making. We sought to characterize how social and behavioral health determinants vary in different demographic groups using a previously established schema of 28 social history types, through both manual analysis and automated topic analysis of social documentation in the electronic health record across the population of an entire integrated healthcare system. Our manual analysis generated 8,335 annotations over 1,400 documents, representing 24 (86%) social history types. In contrast, automated topic analysis generated 22 (79%) social history types. A comparative evaluation demonstrated both similarities and differences in coverage between the manual and topic analyses. Our findings validate the widespread nature of social and behavioral determinants that affect health status across populations of individuals over their lifespan.

  14. The BioLexicon: a large-scale terminological resource for biomedical text mining

    Directory of Open Access Journals (Sweden)

    Thompson Paul

    2011-10-01

    Full Text Available Abstract Background Due to the rapidly expanding body of biomedical literature, biologists require increasingly sophisticated and efficient systems to help them to search for relevant information. Such systems should account for the multiple written variants used to represent biomedical concepts, and allow the user to search for specific pieces of knowledge (or events) involving these concepts, e.g., protein-protein interactions. Such functionality requires access to detailed information about words used in the biomedical literature. Existing databases and ontologies often have a specific focus and are oriented towards human use. Consequently, biological knowledge is dispersed amongst many resources, which often do not attempt to account for the large and frequently changing set of variants that appear in the literature. Additionally, such resources typically do not provide information about how terms relate to each other in texts to describe events. Results This article provides an overview of the design, construction and evaluation of a large-scale lexical and conceptual resource for the biomedical domain, the BioLexicon. The resource can be exploited by text mining tools at several levels, e.g., part-of-speech tagging, recognition of biomedical entities, and the extraction of events in which they are involved. As such, the BioLexicon must account for real usage of words in biomedical texts. In particular, the BioLexicon gathers together different types of terms from several existing data resources into a single, unified repository, and augments them with new term variants automatically extracted from biomedical literature. Extraction of events is facilitated through the inclusion of biologically pertinent verbs (around which events are typically organized) together with information about typical patterns of grammatical and semantic behaviour, which are acquired from domain-specific texts. 
In order to foster interoperability, the BioLexicon is

  15. The BioLexicon: a large-scale terminological resource for biomedical text mining

    Science.gov (United States)

    2011-01-01

    Background: Due to the rapidly expanding body of biomedical literature, biologists require increasingly sophisticated and efficient systems to help them to search for relevant information. Such systems should account for the multiple written variants used to represent biomedical concepts, and allow the user to search for specific pieces of knowledge (or events) involving these concepts, e.g., protein-protein interactions. Such functionality requires access to detailed information about words used in the biomedical literature. Existing databases and ontologies often have a specific focus and are oriented towards human use. Consequently, biological knowledge is dispersed amongst many resources, which often do not attempt to account for the large and frequently changing set of variants that appear in the literature. Additionally, such resources typically do not provide information about how terms relate to each other in texts to describe events. Results: This article provides an overview of the design, construction and evaluation of a large-scale lexical and conceptual resource for the biomedical domain, the BioLexicon. The resource can be exploited by text mining tools at several levels, e.g., part-of-speech tagging, recognition of biomedical entities, and the extraction of events in which they are involved. As such, the BioLexicon must account for real usage of words in biomedical texts. In particular, the BioLexicon gathers together different types of terms from several existing data resources into a single, unified repository, and augments them with new term variants automatically extracted from biomedical literature. Extraction of events is facilitated through the inclusion of biologically pertinent verbs (around which events are typically organized) together with information about typical patterns of grammatical and semantic behaviour, which are acquired from domain-specific texts. In order to foster interoperability, the BioLexicon is modelled using the Lexical

  16. Is there still an unknown Freud? A note on the publications of Freud's texts and on unpublished documents.

    Science.gov (United States)

    Falzeder, Ernst

    2007-01-01

    This article presents an overview of the existing editions of what Freud wrote (works, letters, manuscripts and drafts, diaries and calendar notes, dedications and margin notes in books, case notes, and patient calendars) and what he is recorded as having said (minutes of meetings, interviews, memoirs of and interviews with patients, family members, and followers, and other quotes). There follows a short overview of biographies of Freud and other documentation on his life. It is concluded that a wealth of material is now available to Freud scholars, although more often than not this information is used in a biased and partisan way.

  17. Technical Support Document: Strategies for 50% Energy Savings in Large Office Buildings

    Energy Technology Data Exchange (ETDEWEB)

    Leach, M.; Lobato, C.; Hirsch, A.; Pless, S.; Torcellini, P.

    2010-09-01

    This Technical Support Document (TSD) documents technical analysis that informs design guidance for designing and constructing large office buildings that achieve 50% net site energy savings over baseline buildings defined by minimal compliance with respect to ANSI/ASHRAE/IESNA Standard 90.1-2004. This report also represents a step toward developing a methodology for using energy modeling in the design process to achieve aggressive energy savings targets. This report documents the modeling and analysis methods used to identify design recommendations for six climate zones that capture the range of U.S. climate variability; demonstrates how energy savings change between ASHRAE Standard 90.1-2007 and Standard 90.1-2004 to determine baseline energy use; uses a four-story 'low-rise' prototype to analyze the effect of building aspect ratio on energy use intensity; explores comparisons between baseline and low-energy building energy use for alternate energy metrics (net source energy, energy emissions, and energy cost); and examines the extent to which glass curtain construction limits achievable energy savings by using a 12-story 'high-rise' prototype.

  18. Menzerath-Altmann law for distinct word distribution analysis in a large text

    Science.gov (United States)

    Eroglu, Sertac

    2013-06-01

    The empirical law uncovered by Menzerath and formulated by Altmann, known as the Menzerath-Altmann law (henceforth the MA law), reveals the statistical distribution behavior of human language in various organizational levels. Building on previous studies relating organizational regularities in a language, we propose that the distribution of distinct (or different) words in a large text can effectively be described by the MA law. The validity of the proposition is demonstrated by examining two text corpora written in different languages not belonging to the same language family (English and Turkish). The results show not only that distinct word distribution behavior can accurately be predicted by the MA law, but that this result appears to be language-independent. This result is important not only for quantitative linguistic studies, but also may have significance for other naturally occurring organizations that display analogous organizational behavior. We also deliberately demonstrate that the MA law is a special case of the probability function of the generalized gamma distribution.
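    The reduction noted in the abstract can be checked numerically: with its third parameter set to 1, the generalized gamma density collapses to the Menzerath-Altmann form y(x) = a·x^b·e^(−cx). A minimal sketch (parameter names are ours, not the paper's):

```python
import math

def ma_law(x, a, b, c):
    """Menzerath-Altmann law in its full form: y(x) = a * x**b * exp(-c*x)."""
    return a * x ** b * math.exp(-c * x)

def gen_gamma_pdf(x, scale, d, p):
    """Generalized gamma density:
    f(x) = (p / scale**d) * x**(d-1) * exp(-(x/scale)**p) / Gamma(d/p)."""
    return (p / scale ** d) * x ** (d - 1) * math.exp(-(x / scale) ** p) / math.gamma(d / p)

# With p = 1 the density has exactly the MA form a * x**b * exp(-c*x),
# where b = d - 1, c = 1/scale, and a is the normalizing constant.
scale, d = 2.0, 3.0
a = 1.0 / (scale ** d * math.gamma(d))
for x in (0.5, 1.0, 2.0, 5.0):
    assert abs(gen_gamma_pdf(x, scale, d, 1.0) - ma_law(x, a, d - 1.0, 1.0 / scale)) < 1e-12
```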

  19. Drug allergies documented in electronic health records of a large healthcare system.

    Science.gov (United States)

    Zhou, L; Dhopeshwarkar, N; Blumenthal, K G; Goss, F; Topaz, M; Slight, S P; Bates, D W

    2016-09-01

    The prevalence of drug allergies documented in electronic health records (EHRs) of large patient populations is understudied. We aimed to describe the prevalence of common drug allergies and patient characteristics documented in EHRs of a large healthcare network over the last two decades. Drug allergy data were obtained from EHRs of patients who visited two large tertiary care hospitals in Boston from 1990 to 2013. The prevalence of each drug and drug class was calculated and compared by sex and race/ethnicity. The number of allergies per patient was calculated and the frequency of patients having 1, 2, 3…, or 10+ drug allergies was reported. We also conducted a trend analysis by comparing the proportion of each allergy to the total number of drug allergies over time. Among 1 766 328 patients, 35.5% of patients had at least one reported drug allergy with an average of 1.95 drug allergies per patient. The most commonly reported drug allergies in this population were to penicillins (12.8%), sulfonamide antibiotics (7.4%), opiates (6.8%), and nonsteroidal anti-inflammatory drugs (NSAIDs) (3.5%). The relative proportion of allergies to angiotensin-converting enzyme (ACE) inhibitors and HMG CoA reductase inhibitors (statins) has more than doubled since the early 2000s. Drug allergies were most prevalent among females and white patients except for NSAIDs, ACE inhibitors, and thiazide diuretics, which were more prevalent in black patients. Females and white patients may be more likely to experience a reaction from common medications. An increase in reported allergies to ACE inhibitors and statins is noteworthy. © 2016 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
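    Prevalence figures of this kind are simple aggregations over patient-level allergy lists. A minimal sketch on invented toy records (not the Boston EHR data; all names and counts here are illustrative):

```python
from collections import Counter, defaultdict

# Invented allergy entries: (patient_id, drug_class) — not the study's data
records = [(1, "penicillins"), (1, "opiates"),
           (2, "penicillins"),
           (3, "sulfonamides")]
all_patients = {1, 2, 3, 4}          # patient 4 has no documented allergy

by_patient = defaultdict(set)
for pid, drug in records:
    by_patient[pid].add(drug)

# Prevalence of each drug class = patients reporting it / all patients
class_counts = Counter(d for allergies in by_patient.values() for d in allergies)
prevalence = {d: n / len(all_patients) for d, n in class_counts.items()}

# Share of patients with at least one allergy, and mean allergies among them
share_with_allergy = len(by_patient) / len(all_patients)
mean_per_allergic = sum(map(len, by_patient.values())) / len(by_patient)
```

    On these toy rows, penicillins come out at 50% prevalence and three of four patients have at least one documented allergy.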

  20. Combining Position Weight Matrices and Document-Term Matrix for Efficient Extraction of Associations of Methylated Genes and Diseases from Free Text

    KAUST Repository

    Bin Raies, Arwa; Mansour, Hicham; Incitti, Roberto; Bajic, Vladimir B.

    2013-01-01

    implicated in particular diseases. The large volume of the electronic text makes it difficult and impractical to search for this information manually. Methodology: We developed a novel text mining methodology based on a new concept of position weight matrices

  1. Large mandibular central odontogenic fibroma documented over 20 years: A case report

    Directory of Open Access Journals (Sweden)

    Patrick Bandura

    Full Text Available Introduction: Central odontogenic fibroma (COF is a rare, benign, slow-growing intraosseous odontogenic tumor, and accounts for 0.1% of all odontogenic tumors. It is often confused with other entities, such as keratocysts, ameloblastomas, and odontogenic myxomas. Complete enucleation followed by curettage is the treatment of choice for COF to ensure the lowest possible chance of recurrence. Case presentation: We report the case of a young Caucasian woman with COF that went undiagnosed for several years despite repeated radiologic examinations. Finally, a massive tumor was surgically removed and the wound was curetted. The specimen was histologically confirmed to be a COF. The patient remains under regular follow-up, and thus far there have been no clinical or radiologic signs of recurrence. Discussion: This rare case of COF, which was documented over a period of 20 years, has helped us to describe the features of this tumor. It also confirms that adequate surgical treatment can lead to impressive bone regeneration in healthy individuals, as evident from the radiologic findings acquired before, during, and after enucleation of the COF in our patient. Our findings also confirm the view that COF has a favorable prognosis regardless of its final size. Conclusion: Early diagnosis is key to successful treatment of COF. The slow but steady increase in the size of a COF with no accompanying symptoms has not been reported previously. To our knowledge, this is the only documented case of a COF that has been under continuous radiologic observation for over 20 years. Keywords: Case report, Central odontogenic fibroma, Long-term, Bone deformation, Follow-up, Tumor enucleation

  2. Quantitative analysis of large amounts of journalistic texts using topic modelling

    NARCIS (Netherlands)

    Jacobi, C.; van Atteveldt, W.H.; Welbers, K.

    2016-01-01

    The huge collections of news content which have become available through digital technologies both enable and warrant scientific inquiry, challenging journalism scholars to analyse unprecedented amounts of texts. We propose Latent Dirichlet Allocation (LDA) topic modelling as a tool to face this
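    LDA is typically run with a library such as gensim or MALLET; for illustration, here is a compact collapsed Gibbs sampler over toy "news" documents — a sketch of the algorithm itself, not the authors' tooling, with hyperparameters chosen arbitrarily:

```python
import random
from collections import defaultdict

def lda_gibbs(docs, n_topics, n_iter=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents."""
    rng = random.Random(seed)
    vocab_size = len({w for d in docs for w in d})
    doc_topic = [[0] * n_topics for _ in docs]                 # n(d, k)
    topic_word = [defaultdict(int) for _ in range(n_topics)]   # n(k, w)
    topic_total = [0] * n_topics                               # n(k)
    assign = []
    for di, doc in enumerate(docs):                            # random initialization
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[di][k] += 1; topic_word[k][w] += 1; topic_total[k] += 1
        assign.append(zs)
    for _ in range(n_iter):
        for di, doc in enumerate(docs):
            for wi, w in enumerate(doc):
                k = assign[di][wi]                             # remove current assignment
                doc_topic[di][k] -= 1; topic_word[k][w] -= 1; topic_total[k] -= 1
                weights = [(doc_topic[di][t] + alpha)
                           * (topic_word[t][w] + beta)
                           / (topic_total[t] + vocab_size * beta)
                           for t in range(n_topics)]
                r = rng.random() * sum(weights)                # sample a new topic
                k = n_topics - 1
                for t, wt in enumerate(weights):
                    r -= wt
                    if r <= 0:
                        k = t
                        break
                assign[di][wi] = k
                doc_topic[di][k] += 1; topic_word[k][w] += 1; topic_total[k] += 1
    return doc_topic, topic_word

docs = [["economy", "market", "stocks", "market"],
        ["match", "goal", "team", "goal"],
        ["market", "economy", "stocks"],
        ["team", "match", "goal"]]
doc_topic, topic_word = lda_gibbs(docs, n_topics=2)
```

    The sampler conserves counts: every token keeps exactly one topic assignment, so per-document topic counts always sum to document length.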

  3. Fast and Effective Approximations for Summarization and Categorization of Very Large Text Corpora

    OpenAIRE

    Godbehere, Andrew B.

    2015-01-01

    Given the overwhelming quantities of data generated every day, there is a pressing need for tools that can extract valuable and timely information. Vast reams of text data are now published daily, containing information of interest to those in social science, marketing, finance, and public policy, to name a few. Consider the case of the micro-blogging website Twitter, which in May 2013 was estimated to contain 58 million messages per day: in a single day, Twitter generates a greater volume of...

  4. A scalable machine-learning approach to recognize chemical names within large text databases

    Directory of Open Access Journals (Sweden)

    Wren Jonathan D

    2006-09-01

    Full Text Available Abstract Motivation: The use or study of chemical compounds permeates almost every scientific field and in each of them, the amount of textual information is growing rapidly. There is a need to accurately identify chemical names within text for a number of informatics efforts such as database curation, report summarization, tagging of named entities and keywords, or the development/curation of reference databases. Results: A first-order Markov Model (MM) was evaluated for its ability to distinguish chemical names from words, yielding ~93% recall in recognizing chemical terms and ~99% precision in rejecting non-chemical terms on smaller test sets. However, because total false-positive events increase with the number of words analyzed, the scalability of name recognition was measured by processing 13.1 million MEDLINE records. The method yielded precision ranges from 54.7% to 100%, depending upon the cutoff score used, averaging 82.7% for approximately 1.05 million putative chemical terms extracted. Extracted chemical terms were analyzed to estimate the number of spelling variants per term, which correlated with the total number of times the chemical name appeared in MEDLINE. This variability in term construction was found to affect both information retrieval and term mapping when using PubMed and Ovid.
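    The idea of a first-order Markov model over characters can be sketched with two bigram models and a log-likelihood-ratio score. The training word lists below are tiny invented stand-ins, not the paper's lexicon, and the smoothing constant is arbitrary:

```python
import math
from collections import defaultdict

def train_bigrams(words):
    """First-order Markov model over characters, with start/end markers."""
    counts = defaultdict(lambda: defaultdict(int))
    for w in words:
        w = "^" + w.lower() + "$"
        for a, b in zip(w, w[1:]):
            counts[a][b] += 1
    return counts

def log_prob(model, word, alphabet=40):
    """Add-one-smoothed log-probability of a word under the bigram model."""
    w = "^" + word.lower() + "$"
    return sum(math.log((model[a][b] + 1) / (sum(model[a].values()) + alphabet))
               for a, b in zip(w, w[1:]))

chem = train_bigrams(["chloride", "ethanol", "methane", "sulfate", "benzene", "acetate"])
eng = train_bigrams(["house", "report", "window", "summary", "people", "science"])

def chemical_score(word):
    # Positive score: the chemical-name model explains the word better
    return log_prob(chem, word) - log_prob(eng, word)
```

    In practice a cutoff on this score trades recall against precision, which is the tuning the abstract describes.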

  5. Browsing, Discovery and Search in Large Distributed Databases of Complex and Scanned Documents

    National Research Council Canada - National Science Library

    Croft, W

    1998-01-01

    .... These techniques should be particularly relevant to the patent domain where it is important to find relationships between documents and where the patent or trademark may be based on a visual design...

  6. A methodology for semiautomatic taxonomy of concepts extraction from nuclear scientific documents using text mining techniques; Metodologia para extracao semiautomatica de uma taxonomia de conceitos a partir da producao cientifica da area nuclear utilizando tecnicas de mineracao de textos

    Energy Technology Data Exchange (ETDEWEB)

    Braga, Fabiane dos Reis

    2013-07-01

    This thesis presents a text mining method for the semi-automatic extraction of a taxonomy of concepts from a textual corpus of scientific papers related to the nuclear field. Text classification is a natural human practice and a crucial task for working with large repositories. Document clustering provides a logical and understandable framework that facilitates organization, browsing and searching. Most clustering algorithms use the bag-of-words model to represent the content of a document. This model produces high-dimensional data, ignores the fact that different words can have the same meaning, and does not consider the relationships between words, assuming they are independent of each other. The methodology combines a concept-based model of document representation with a hierarchical document clustering method based on the frequency of co-occurring concepts, and a technique for labeling clusters with their most representative concepts, with the objective of producing a taxonomy of concepts that reflects the structure of the knowledge domain. It is hoped that this work will contribute to the conceptual mapping of the scientific production of the nuclear field and thus support the management of research activities in this area. (author)

  7. The Feasibility of Using Large-Scale Text Mining to Detect Adverse Childhood Experiences in a VA-Treated Population.

    Science.gov (United States)

    Hammond, Kenric W; Ben-Ari, Alon Y; Laundry, Ryan J; Boyko, Edward J; Samore, Matthew H

    2015-12-01

    Free text in electronic health records resists large-scale analysis. Text records facts of interest not found in encoded data, and text mining enables their retrieval and quantification. The U.S. Department of Veterans Affairs (VA) clinical data repository affords an opportunity to apply text-mining methodology to study clinical questions in large populations. To assess the feasibility of text mining, investigation of the relationship between exposure to adverse childhood experiences (ACEs) and recorded diagnoses was conducted among all VA-treated Gulf War veterans, utilizing all progress notes recorded from 2000-2011. Text processing extracted ACE exposures recorded among 44.7 million clinical notes belonging to 243,973 veterans. The relationship of ACE exposure to adult illnesses was analyzed using logistic regression. Bias considerations were assessed. ACE score was strongly associated with suicide attempts and serious mental disorders (ORs = 1.84 to 1.97 per unit of ACE score), and less so with behaviorally mediated and somatic conditions (ORs = 1.02 to 1.36 per unit). Bias adjustments did not remove persistent associations between ACE score and most illnesses. Text mining to detect ACE exposure in a large population was feasible. Analysis of the relationship between ACE score and adult health conditions yielded patterns of association consistent with prior research. Copyright © 2015 International Society for Traumatic Stress Studies.
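    Associations of this kind are commonly summarized as odds ratios; for a single binary exposure, the logistic-regression coefficient equals the log odds ratio. A minimal sketch with invented counts (not the VA data):

```python
import math

def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a/b = cases/non-cases among the exposed,
    c/d = cases/non-cases among the unexposed."""
    return (a / b) / (c / d)

# Invented counts, purely for illustration: 30 cases / 70 non-cases among
# the exposed, 10 cases / 90 non-cases among the unexposed.
or_exposed = odds_ratio(30, 70, 10, 90)
log_or = math.log(or_exposed)   # the univariate logistic-regression coefficient
```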

  8. Identification document for large by-catch species in the Mauritanian Exclusive Economic Zone

    NARCIS (Netherlands)

    Hofstede, ter R.

    2003-01-01

    The coastal waters of Mauritania (Northwest Africa) contain large numbers of pelagic fish such as sardinella, sardine, mackerel and horse mackerel. Freezer-trawlers from the European Union (EU), mainly of Dutch origin, have exploited these resources since 1996. In 1998 the Dutch ship owners

  9. Large bedrock slope failures in a British Columbia, Canada fjord: first documented submarine sackungen

    Science.gov (United States)

    Conway, Kim W.; Vaughn Barrie, J.

    2018-01-01

    Very large (>60×10⁶ m³) sackungen or deep-seated gravitational slope deformations occur below sea level along a steep fjord wall in central Douglas Channel, British Columbia. The massive bedrock blocks were mobile between 13 and 11.5 thousand radiocarbon years BP (15,800 and 13,400 BP) immediately following deglaciation. Deformation of fjord sediments is apparent in sedimentary units overlying and adjacent to the blocks. Faults bound the edges of each block, cutting the glacial section but not the Holocene sediments. Retrogressive slides, small inset landslides as well as incipient and older slides are found on and around the large failure blocks. Lineations, fractures and faults parallel the coastline of Douglas Channel along the shoreline of the study area. Topographic data onshore indicate that faults and joints demarcate discrete rhomboid-shaped blocks which controlled the form, size and location of the sackungen. The described submarine sackungen share characteristic geomorphic features with many montane occurrences, such as uphill-facing scarps, foliated bedrock composition, largely vertical dislocation and a deglacial timing of development.

  10. Constructing Model of Relationship among Behaviors and Injuries to Products Based on Large Scale Text Data on Injuries

    Science.gov (United States)

    Nomori, Koji; Kitamura, Koji; Motomura, Yoichi; Nishida, Yoshifumi; Yamanaka, Tatsuhiro; Komatsubara, Akinori

    In Japan, childhood injury prevention is an urgent issue. Safety measures based on knowledge created from injury data are essential for preventing childhood injuries. The injury prevention approach of product modification is especially important. Risk assessment is one of the most fundamental methods for designing safe products. Conventional risk assessment has been carried out subjectively because product makers have little data on injuries. This paper deals with evidence-based risk assessment, in which artificial intelligence technologies are strongly needed. It describes a new method of foreseeing the usage of products, which is the first step of evidence-based risk assessment, and presents a retrieval system for injury data. The system enables a product designer to foresee how children use a product and which types of injuries occur due to the product in daily environments. The developed system consists of large-scale injury data, text mining technology and probabilistic modeling technology. Large-scale text data on childhood injuries was collected from medical institutions by an injury surveillance system. Types of behaviors toward a product were derived from the injury text data using text mining technology. The relationships among products, types of behaviors, types of injuries and characteristics of children were modeled with a Bayesian network. The fundamental functions of the developed system and examples of new findings obtained with it are reported in this paper.
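    The core lookup such a system supports — which injuries follow which behaviors with a given product — reduces to conditional probabilities estimated from co-occurrence counts. A sketch over invented records (not the surveillance data, and far simpler than a full Bayesian network):

```python
from collections import Counter

# Invented (product, behavior, injury) records — illustrative only
records = [("chair", "climbing", "fall"), ("chair", "climbing", "fall"),
           ("chair", "climbing", "bump"),
           ("chair", "sitting", "pinch"),
           ("table", "running", "bump")]

joint = Counter((p, b, i) for p, b, i in records)
pair = Counter((p, b) for p, b, _ in records)

def p_injury(injury, product, behavior):
    """Estimate P(injury | product, behavior) from co-occurrence counts."""
    return joint[(product, behavior, injury)] / pair[(product, behavior)]
```

    A Bayesian network generalizes this by factorizing the joint distribution over products, behaviors, injuries and child characteristics, which also smooths estimates for sparse combinations.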

  11. Application of Text Analytics to Extract and Analyze Material–Application Pairs from a Large Scientific Corpus

    Directory of Open Access Journals (Sweden)

    Nikhil Kalathil

    2018-01-01

    Full Text Available When assessing the importance of materials (or other components) to a given set of applications, machine analysis of a very large corpus of scientific abstracts can provide an analyst a base of insights to develop further. The use of text analytics reduces the time required to conduct an evaluation, while allowing analysts to experiment with a multitude of different hypotheses. Because the scope and quantity of metadata analyzed can, and should, be large, any divergence from what a human analyst determines and what the text analysis shows provides a prompt for the human analyst to reassess any preliminary findings. In this work, we have successfully extracted material–application pairs and ranked them on their importance. This method provides a novel way to map scientific advances in a particular material to the application for which it is used. Approximately 438,000 titles and abstracts of scientific papers published from 1992 to 2011 were used to examine 16 materials. This analysis used coclustering text analysis to associate individual materials with specific clean energy applications, evaluate the importance of materials to specific applications, and assess their importance to clean energy overall. Our analysis reproduced the judgments of experts in assigning material importance to applications. The validated methods were then used to map the replacement of one material with another material in a specific application (batteries).

  12. Large-scale automatic extraction of side effects associated with targeted anticancer drugs from full-text oncological articles.

    Science.gov (United States)

    Xu, Rong; Wang, QuanQiu

    2015-06-01

    Targeted anticancer drugs such as imatinib, trastuzumab and erlotinib dramatically improved treatment outcomes in cancer patients; however, these innovative agents are often associated with unexpected side effects. The pathophysiological mechanisms underlying these side effects are not well understood. The availability of a comprehensive knowledge base of side effects associated with targeted anticancer drugs has the potential to illuminate complex pathways underlying toxicities induced by these innovative drugs. While side effect association knowledge for targeted drugs exists in multiple heterogeneous data sources, published full-text oncological articles represent an important source of pivotal, investigational, and even failed trials in a variety of patient populations. In this study, we present an automatic process to extract targeted anticancer drug-associated side effects (drug-SE pairs) from a large number of high profile full-text oncological articles. We downloaded 13,855 full-text articles from the Journal of Oncology (JCO) published between 1983 and 2013. We developed text classification, relationship extraction, signal filtering, and signal prioritization algorithms to extract drug-SE pairs from downloaded articles. We extracted a total of 26,264 drug-SE pairs with an average precision of 0.405, a recall of 0.899, and an F1 score of 0.465. We show that side effect knowledge from JCO articles is largely complementary to that from the US Food and Drug Administration (FDA) drug labels. Through integrative correlation analysis, we show that targeted drug-associated side effects positively correlate with their gene targets and disease indications. In conclusion, this unique database that we built from a large number of high-profile oncological articles could facilitate the development of computational models to understand toxic effects associated with targeted anticancer drugs. Copyright © 2015 Elsevier Inc. All rights reserved.

  13. Comparing the hierarchy of author given tags and repository given tags in a large document archive

    Science.gov (United States)

    Tibély, Gergely; Pollner, Péter; Palla, Gergely

    2016-10-01

    Folksonomies - large databases arising from collaborative tagging of items by independent users - are becoming an increasingly important way of categorizing information. In these systems users can tag items with free words, resulting in a tripartite item-tag-user network. Although there are no prescribed relations between tags, the way users think about the different categories presumably has some built-in hierarchy, in which more specialized concepts are descendants of more general categories. Several applications would benefit from knowledge of this hierarchy. Here we apply a recent method to check the differences and similarities of hierarchies resulting from tags given by independent individuals and from tags given by a centrally managed repository system. The results from our method showed substantial differences between the lower parts of the hierarchies and, in contrast, a relatively high similarity at the top of the hierarchies.
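    A simple flavor of hierarchy extraction from tag co-occurrence: treat the most frequent co-occurring, strictly more general tag as a tag's parent. This heuristic is only illustrative — it is not the authors' method — and runs on invented tag sets:

```python
from collections import Counter
from itertools import combinations

# Invented tag sets, one per tagged item
items = [{"music", "rock", "guitar"},
         {"music", "rock"},
         {"music", "jazz"},
         {"music", "guitar"}]

freq = Counter(t for tags in items for t in tags)
cooc = Counter()
for tags in items:
    for a, b in combinations(tags, 2):
        cooc[(a, b)] += 1
        cooc[(b, a)] += 1

def parent(tag):
    """Most frequent co-occurring tag that is strictly more general
    (occurs in more items); None for a root tag."""
    cands = [(cooc[(tag, o)], freq[o], o) for o in freq
             if o != tag and freq[o] > freq[tag] and cooc[(tag, o)] > 0]
    return max(cands)[2] if cands else None
```

    On this toy data, "rock" and "jazz" both attach under "music", which itself has no parent — a two-level hierarchy recovered purely from co-occurrence.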

  14. Extraction of relations between genes and diseases from text and large-scale data analysis: implications for translational research.

    Science.gov (United States)

    Bravo, Àlex; Piñero, Janet; Queralt-Rosinach, Núria; Rautschka, Michael; Furlong, Laura I

    2015-02-21

    Current biomedical research needs to leverage and exploit the large amount of information reported in scientific publications. Automated text mining approaches, in particular those aimed at finding relationships between entities, are key for identification of actionable knowledge from free text repositories. We present the BeFree system aimed at identifying relationships between biomedical entities with a special focus on genes and their associated diseases. By exploiting morpho-syntactic information of the text, BeFree is able to identify gene-disease, drug-disease and drug-target associations with state-of-the-art performance. The application of BeFree to real-case scenarios shows its effectiveness in extracting information relevant for translational research. We show the value of the gene-disease associations extracted by BeFree through a number of analyses and integration with other data sources. BeFree succeeds in identifying genes associated to a major cause of morbidity worldwide, depression, which are not present in other public resources. Moreover, large-scale extraction and analysis of gene-disease associations, and integration with current biomedical knowledge, provided interesting insights on the kind of information that can be found in the literature, and raised challenges regarding data prioritization and curation. We found that only a small proportion of the gene-disease associations discovered by using BeFree is collected in expert-curated databases. Thus, there is a pressing need to find alternative strategies to manual curation, in order to review, prioritize and curate text-mining data and incorporate it into domain-specific databases. We present our strategy for data prioritization and discuss its implications for supporting biomedical research and applications. BeFree is a novel text mining system that performs competitively for the identification of gene-disease, drug-disease and drug-target associations. Our analyses show that mining only a

  15. The LawsAndFamilies questionnaire on legal family formats for same-sex and/or different-sex couples : Text of the questions and of the accompanying guidance document.

    NARCIS (Netherlands)

    Waaldijk, C.; Lorenzo, Villaverde J.M.; Nikolina, N.; Zago, G.

    2016-01-01

    This Working Paper of the research project FamiliesAndSocieties contains the text of the LawsAndFamilies questionnaire, plus the text of the guidance document provided to legal experts answering this questionnaire. These texts are preceded by a brief introduction to the background, aims and

  16. Digitization of Full-Text Documents Before Publishing on the Internet: A Case Study Reviewing the Latest Optical Character Recognition Technologies.

    Science.gov (United States)

    McClean, Clare M.

    1998-01-01

    Reviews strengths and weaknesses of five optical character recognition (OCR) software packages used to digitize paper documents before publishing on the Internet. Outlines options available and stages of the conversion process. Describes the learning experience of Eurotext, a United Kingdom-based electronic libraries project (eLib). (PEN)

  17. Pinpointing Needles in Giant Haystacks: Use of Text Mining to Reduce Impractical Screening Workload in Extremely Large Scoping Reviews

    Science.gov (United States)

    Shemilt, Ian; Simon, Antonia; Hollands, Gareth J.; Marteau, Theresa M.; Ogilvie, David; O'Mara-Eves, Alison; Kelly, Michael P.; Thomas, James

    2014-01-01

    In scoping reviews, boundaries of relevant evidence may be initially fuzzy, with refined conceptual understanding of interventions and their proposed mechanisms of action an intended output of the scoping process rather than its starting point. Electronic searches are therefore sensitive, often retrieving very large record sets that are…

  18. GATECloud.net: a platform for large-scale, open-source text processing on the cloud.

    Science.gov (United States)

    Tablan, Valentin; Roberts, Ian; Cunningham, Hamish; Bontcheva, Kalina

    2013-01-28

    Cloud computing is increasingly being regarded as a key enabler of the 'democratization of science', because on-demand, highly scalable cloud computing facilities enable researchers anywhere to carry out data-intensive experiments. In the context of natural language processing (NLP), algorithms tend to be complex, which makes their parallelization and deployment on cloud platforms a non-trivial task. This study presents a new, unique, cloud-based platform for large-scale NLP research--GATECloud.net. It enables researchers to carry out data-intensive NLP experiments by harnessing the vast, on-demand compute power of the Amazon cloud. Important infrastructural issues are dealt with by the platform, completely transparently for the researcher: load balancing, efficient data upload and storage, deployment on the virtual machines, security and fault tolerance. We also include a cost-benefit analysis and usage evaluation.

  19. Documents and legal texts: Australia, Germany, Sweden

    International Nuclear Information System (INIS)

    Anon.

    2012-01-01

    Australia: National Radioactive Waste Management Act 2012 No. 29, 2012 (An Act to make provision in relation to the selection of a site for, and the establishment and operation of, a radioactive waste management facility, and for related purposes). Germany: Act on the Peaceful Utilisation of Atomic Energy and the Protection against its Hazards (Atomic Energy Act) of 23 December 1959, as amended and promulgated on 15 July 1985, last amendment by the Act of 8 November 2011. Sweden: The Swedish Radiation Safety Authority's regulations concerning clearance of materials, rooms, buildings and land in practices involving ionising radiation (Swedish Radiation Safety Authority Regulatory Code issued on 20 October 2011, Published on 2 November 2011); The Swedish Radiation Safety Authority's general advice on the application of the regulations concerning clearance of materials, rooms, buildings and land in practices involving ionising radiation (issued on 20 October 2011)

  20. Using Text Documents from American Memory.

    Science.gov (United States)

    Singleton, Laurel R., Ed.

    2002-01-01

    This publication contains classroom-tested teaching ideas. For grades K-4, "'Blessed Ted-fred': Famous Fathers Write to Their Children" uses American Memory for primary source letters written by Theodore Roosevelt and Alexander Graham Bell to their children. For grades 5-8, "Found Poetry and the American Life Histories…

  1. Search for dark matter at [Formula: see text] in final states containing an energetic photon and large missing transverse momentum with the ATLAS detector.

    Science.gov (United States)

    Aaboud, M; Aad, G; Abbott, B; Abdallah, J; Abdinov, O; Abeloos, B; Abidi, S H; AbouZeid, O S; Abraham, N L; Abramowicz, H; Abreu, H; Abreu, R; Abulaiti, Y; Acharya, B S; Adachi, S; Adamczyk, L; Adelman, J; Adersberger, M; Adye, T; Affolder, A A; Agatonovic-Jovin, T; Agheorghiesei, C; Aguilar-Saavedra, J A; Ahlen, S P; Ahmadov, F; Aielli, G; Akatsuka, S; Akerstedt, H; Åkesson, T P A; Akimov, A V; Alberghi, G L; Albert, J; Albicocco, P; Alconada Verzini, M J; Aleksa, M; Aleksandrov, I N; Alexa, C; Alexander, G; Alexopoulos, T; Alhroob, M; Ali, B; Aliev, M; Alimonti, G; Alison, J; Alkire, S P; Allbrooke, B M M; Allen, B W; Allport, P P; Aloisio, A; Alonso, A; Alonso, F; Alpigiani, C; Alshehri, A A; Alstaty, M; Alvarez Gonzalez, B; Álvarez Piqueras, D; Alviggi, M G; Amadio, B T; Amaral Coutinho, Y; Amelung, C; Amidei, D; Amor Dos Santos, S P; Amorim, A; Amoroso, S; Amundsen, G; Anastopoulos, C; Ancu, L S; Andari, N; Andeen, T; Anders, C F; Anders, J K; Anderson, K J; Andreazza, A; Andrei, V; Angelidakis, S; Angelozzi, I; Angerami, A; Anisenkov, A V; Anjos, N; Annovi, A; Antel, C; Antonelli, M; Antonov, A; Antrim, D J; Anulli, F; Aoki, M; Aperio Bella, L; Arabidze, G; Arai, Y; Araque, J P; Araujo Ferraz, V; Arce, A T H; Ardell, R E; Arduh, F A; Arguin, J-F; Argyropoulos, S; Arik, M; Armbruster, A J; Armitage, L J; Arnaez, O; Arnold, H; Arratia, M; Arslan, O; Artamonov, A; Artoni, G; Artz, S; Asai, S; Asbah, N; Ashkenazi, A; Asquith, L; Assamagan, K; Astalos, R; Atkinson, M; Atlay, N B; Aubry, L; Augsten, K; Avolio, G; Axen, B; Ayoub, M K; Azuelos, G; Baas, A E; Baca, M J; Bachacou, H; Bachas, K; Backes, M; Backhaus, M; Bagnaia, P; Bahrasemani, H; Baines, J T; Bajic, M; Baker, O K; Baldin, E M; Balek, P; Balli, F; Balunas, W K; Banas, E; Banerjee, Sw; Bannoura, A A E; Barak, L; Barberio, E L; Barberis, D; Barbero, M; Barillari, T; Barisits, M-S; Barklow, T; Barlow, N; Barnes, S L; Barnett, B M; Barnett, R M; Barnovska-Blenessy, Z; Baroncelli, A; Barone, G; Barr, A J; 
Barranco Navarro, L; Barreiro, F; Barreiro Guimarães da Costa, J; Bartoldus, R; Barton, A E; Bartos, P; Basalaev, A; Bassalat, A; Bates, R L; Batista, S J; Batley, J R; Battaglia, M; Bauce, M; Bauer, F; Bawa, H S; Beacham, J B; Beattie, M D; Beau, T; Beauchemin, P H; Bechtle, P; Beck, H P; Becker, K; Becker, M; Beckingham, M; Becot, C; Beddall, A J; Beddall, A; Bednyakov, V A; Bedognetti, M; Bee, C P; Beermann, T A; Begalli, M; Begel, M; Behr, J K; Bell, A S; Bella, G; Bellagamba, L; Bellerive, A; Bellomo, M; Belotskiy, K; Beltramello, O; Belyaev, N L; Benary, O; Benchekroun, D; Bender, M; Bendtz, K; Benekos, N; Benhammou, Y; Benhar Noccioli, E; Benitez, J; Benjamin, D P; Benoit, M; Bensinger, J R; Bentvelsen, S; Beresford, L; Beretta, M; Berge, D; Bergeaas Kuutmann, E; Berger, N; Beringer, J; Berlendis, S; Bernard, N R; Bernardi, G; Bernius, C; Bernlochner, F U; Berry, T; Berta, P; Bertella, C; Bertoli, G; Bertolucci, F; Bertram, I A; Bertsche, C; Bertsche, D; Besjes, G J; Bessidskaia Bylund, O; Bessner, M; Besson, N; Betancourt, C; Bethani, A; Bethke, S; Bevan, A J; Beyer, J; Bianchi, R M; Biebel, O; Biedermann, D; Bielski, R; Biesuz, N V; Biglietti, M; Bilbao De Mendizabal, J; Billoud, T R V; Bilokon, H; Bindi, M; Bingul, A; Bini, C; Biondi, S; Bisanz, T; Bittrich, C; Bjergaard, D M; Black, C W; Black, J E; Black, K M; Blair, R E; Blazek, T; Bloch, I; Blocker, C; Blue, A; Blum, W; Blumenschein, U; Blunier, S; Bobbink, G J; Bobrovnikov, V S; Bocchetta, S S; Bocci, A; Bock, C; Boehler, M; Boerner, D; Bogavac, D; Bogdanchikov, A G; Bohm, C; Boisvert, V; Bokan, P; Bold, T; Boldyrev, A S; Bolz, A E; Bomben, M; Bona, M; Boonekamp, M; Borisov, A; Borissov, G; Bortfeldt, J; Bortoletto, D; Bortolotto, V; Boscherini, D; Bosman, M; Bossio Sola, J D; Boudreau, J; Bouffard, J; Bouhova-Thacker, E V; Boumediene, D; Bourdarios, C; Boutle, S K; Boveia, A; Boyd, J; Boyko, I R; Bracinik, J; Brandt, A; Brandt, G; Brandt, O; Bratzler, U; Brau, B; Brau, J E; Breaden Madden, W D; 
Brendlinger, K; Brennan, A J; Brenner, L; Brenner, R; Bressler, S; Briglin, D L; Bristow, T M; Britton, D; Britzger, D; Brochu, F M; Brock, I; Brock, R; Brooijmans, G; Brooks, T; Brooks, W K; Brosamer, J; Brost, E; Broughton, J H; Bruckman de Renstrom, P A; Bruncko, D; Bruni, A; Bruni, G; Bruni, L S; Brunt, B H; Bruschi, M; Bruscino, N; Bryant, P; Bryngemark, L; Buanes, T; Buat, Q; Buchholz, P; Buckley, A G; Budagov, I A; Buehrer, F; Bugge, M K; Bulekov, O; Bullock, D; Burch, T J; Burckhart, H; Burdin, S; Burgard, C D; Burger, A M; Burghgrave, B; Burka, K; Burke, S; Burmeister, I; Burr, J T P; Busato, E; Büscher, D; Büscher, V; Bussey, P; Butler, J M; Buttar, C M; Butterworth, J M; Butti, P; Buttinger, W; Buzatu, A; Buzykaev, A R; Cabrera Urbán, S; Caforio, D; Cairo, V M; Cakir, O; Calace, N; Calafiura, P; Calandri, A; Calderini, G; Calfayan, P; Callea, G; Caloba, L P; Calvente Lopez, S; Calvet, D; Calvet, S; Calvet, T P; Camacho Toro, R; Camarda, S; Camarri, P; Cameron, D; Caminal Armadans, R; Camincher, C; Campana, S; Campanelli, M; Camplani, A; Campoverde, A; Canale, V; Cano Bret, M; Cantero, J; Cao, T; Capeans Garrido, M D M; Caprini, I; Caprini, M; Capua, M; Carbone, R M; Cardarelli, R; Cardillo, F; Carli, I; Carli, T; Carlino, G; Carlson, B T; Carminati, L; Carney, R M D; Caron, S; Carquin, E; Carrá, S; Carrillo-Montoya, G D; Carvalho, J; Casadei, D; Casado, M P; Casolino, M; Casper, D W; Castelijn, R; Castillo Gimenez, V; Castro, N F; Catinaccio, A; Catmore, J R; Cattai, A; Caudron, J; Cavaliere, V; Cavallaro, E; Cavalli, D; Cavalli-Sforza, M; Cavasinni, V; Celebi, E; Ceradini, F; Cerda Alberich, L; Cerqueira, A S; Cerri, A; Cerrito, L; Cerutti, F; Cervelli, A; Cetin, S A; Chafaq, A; Chakraborty, D; Chan, S K; Chan, W S; Chan, Y L; Chang, P; Chapman, J D; Charlton, D G; Chau, C C; Chavez Barajas, C A; Che, S; Cheatham, S; Chegwidden, A; Chekanov, S; Chekulaev, S V; Chelkov, G A; Chelstowska, M A; Chen, C; Chen, H; Chen, S; Chen, S; Chen, X; Chen, Y; Cheng, H 
C; Cheng, H J; Cheplakov, A; Cheremushkina, E; Cherkaoui El Moursli, R; Chernyatin, V; Cheu, E; Chevalier, L; Chiarella, V; Chiarelli, G; Chiodini, G; Chisholm, A S; Chitan, A; Chiu, Y H; Chizhov, M V; Choi, K; Chomont, A R; Chouridou, S; Christodoulou, V; Chromek-Burckhart, D; Chu, M C; Chudoba, J; Chuinard, A J; Chwastowski, J J; Chytka, L; Ciftci, A K; Cinca, D; Cindro, V; Cioara, I A; Ciocca, C; Ciocio, A; Cirotto, F; Citron, Z H; Citterio, M; Ciubancan, M; Clark, A; Clark, B L; Clark, M R; Clark, P J; Clarke, R N; Clement, C; Coadou, Y; Cobal, M; Coccaro, A; Cochran, J; Colasurdo, L; Cole, B; Colijn, A P; Collot, J; Colombo, T; Conde Muiño, P; Coniavitis, E; Connell, S H; Connelly, I A; Constantinescu, S; Conti, G; Conventi, F; Cooke, M; Cooper-Sarkar, A M; Cormier, F; Cormier, K J R; Corradi, M; Corriveau, F; Cortes-Gonzalez, A; Cortiana, G; Costa, G; Costa, M J; Costanzo, D; Cottin, G; Cowan, G; Cox, B E; Cranmer, K; Crawley, S J; Creager, R A; Cree, G; Crépé-Renaudin, S; Crescioli, F; Cribbs, W A; Cristinziani, M; Croft, V; Crosetti, G; Cueto, A; Cuhadar Donszelmann, T; Cukierman, A R; Cummings, J; Curatolo, M; Cúth, J; Czirr, H; Czodrowski, P; D'amen, G; D'Auria, S; D'eramo, L; D'Onofrio, M; Da Cunha Sargedas De Sousa, M J; Da Via, C; Dabrowski, W; Dado, T; Dai, T; Dale, O; Dallaire, F; Dallapiccola, C; Dam, M; Dandoy, J R; Daneri, M F; Dang, N P; Daniells, A C; Dann, N S; Danninger, M; Dano Hoffmann, M; Dao, V; Darbo, G; Darmora, S; Dassoulas, J; Dattagupta, A; Daubney, T; Davey, W; David, C; Davidek, T; Davies, M; Davis, D R; Davison, P; Dawe, E; Dawson, I; De, K; de Asmundis, R; De Benedetti, A; De Castro, S; De Cecco, S; De Groot, N; de Jong, P; De la Torre, H; De Lorenzi, F; De Maria, A; De Pedis, D; De Salvo, A; De Sanctis, U; De Santo, A; De Vasconcelos Corga, K; De Vivie De Regie, J B; Dearnaley, W J; Debbe, R; Debenedetti, C; Dedovich, D V; Dehghanian, N; Deigaard, I; Del Gaudio, M; Del Peso, J; Del Prete, T; Delgove, D; Deliot, F; Delitzsch, C M; 
Dell'Acqua, A; Dell'Asta, L; Dell'Orso, M; Della Pietra, M; Della Volpe, D; Delmastro, M; Delporte, C; Delsart, P A; DeMarco, D A; Demers, S; Demichev, M; Demilly, A; Denisov, S P; Denysiuk, D; Derendarz, D; Derkaoui, J E; Derue, F; Dervan, P; Desch, K; Deterre, C; Dette, K; Devesa, M R; Deviveiros, P O; Dewhurst, A; Dhaliwal, S; Di Bello, F A; Di Ciaccio, A; Di Ciaccio, L; Di Clemente, W K; Di Donato, C; Di Girolamo, A; Di Girolamo, B; Di Micco, B; Di Nardo, R; Di Petrillo, K F; Di Simone, A; Di Sipio, R; Di Valentino, D; Diaconu, C; Diamond, M; Dias, F A; Diaz, M A; Diehl, E B; Dietrich, J; Díez Cornell, S; Dimitrievska, A; Dingfelder, J; Dita, P; Dita, S; Dittus, F; Djama, F; Djobava, T; Djuvsland, J I; do Vale, M A B; Dobos, D; Dobre, M; Doglioni, C; Dolejsi, J; Dolezal, Z; Donadelli, M; Donati, S; Dondero, P; Donini, J; Dopke, J; Doria, A; Dova, M T; Doyle, A T; Drechsler, E; Dris, M; Du, Y; Duarte-Campderros, J; Dubreuil, A; Duchovni, E; Duckeck, G; Ducourthial, A; Ducu, O A; Duda, D; Dudarev, A; Dudder, A Chr; Duffield, E M; Duflot, L; Dührssen, M; Dumancic, M; Dumitriu, A E; Duncan, A K; Dunford, M; Duran Yildiz, H; Düren, M; Durglishvili, A; Duschinger, D; Dutta, B; Dyndal, M; Eckardt, C; Ecker, K M; Edgar, R C; Eifert, T; Eigen, G; Einsweiler, K; Ekelof, T; El Kacimi, M; El Kosseifi, R; Ellajosyula, V; Ellert, M; Elles, S; Ellinghaus, F; Elliot, A A; Ellis, N; Elmsheuser, J; Elsing, M; Emeliyanov, D; Enari, Y; Endner, O C; Ennis, J S; Erdmann, J; Ereditato, A; Ernis, G; Ernst, M; Errede, S; Escalier, M; Escobar, C; Esposito, B; Estrada Pastor, O; Etienvre, A I; Etzion, E; Evans, H; Ezhilov, A; Ezzi, M; Fabbri, F; Fabbri, L; Facini, G; Fakhrutdinov, R M; Falciano, S; Falla, R J; Faltova, J; Fang, Y; Fanti, M; Farbin, A; Farilla, A; Farina, C; Farina, E M; Farooque, T; Farrell, S; Farrington, S M; Farthouat, P; Fassi, F; Fassnacht, P; Fassouliotis, D; Faucci Giannelli, M; Favareto, A; Fawcett, W J; Fayard, L; Fedin, O L; Fedorko, W; Feigl, S; Feligioni, L; 
Feng, C; Feng, E J; Feng, H; Fenton, M J; Fenyuk, A B; Feremenga, L; Fernandez Martinez, P; Fernandez Perez, S; Ferrando, J; Ferrari, A; Ferrari, P; Ferrari, R; Ferreira de Lima, D E; Ferrer, A; Ferrere, D; Ferretti, C; Fiedler, F; Filipčič, A; Filipuzzi, M; Filthaut, F; Fincke-Keeler, M; Finelli, K D; Fiolhais, M C N; Fiorini, L; Fischer, A; Fischer, C; Fischer, J; Fisher, W C; Flaschel, N; Fleck, I; Fleischmann, P; Fletcher, R R M; Flick, T; Flierl, B M; Flores Castillo, L R; Flowerdew, M J; Forcolin, G T; Formica, A; Förster, F A; Forti, A; Foster, A G; Fournier, D; Fox, H; Fracchia, S; Francavilla, P; Franchini, M; Franchino, S; Francis, D; Franconi, L; Franklin, M; Frate, M; Fraternali, M; Freeborn, D; Fressard-Batraneanu, S M; Freund, B; Froidevaux, D; Frost, J A; Fukunaga, C; Fusayasu, T; Fuster, J; Gabaldon, C; Gabizon, O; Gabrielli, A; Gabrielli, A; Gach, G P; Gadatsch, S; Gadomski, S; Gagliardi, G; Gagnon, L G; Galea, C; Galhardo, B; Gallas, E J; Gallop, B J; Gallus, P; Galster, G; Gan, K K; Ganguly, S; Gao, Y; Gao, Y S; Garay Walls, F M; García, C; García Navarro, J E; Garcia-Sciveres, M; Gardner, R W; Garelli, N; Garonne, V; Gascon Bravo, A; Gasnikova, K; Gatti, C; Gaudiello, A; Gaudio, G; Gavrilenko, I L; Gay, C; Gaycken, G; Gazis, E N; Gee, C N P; Geisen, J; Geisen, M; Geisler, M P; Gellerstedt, K; Gemme, C; Genest, M H; Geng, C; Gentile, S; Gentsos, C; George, S; Gerbaudo, D; Gershon, A; Geßner, G; Ghasemi, S; Ghneimat, M; Giacobbe, B; Giagu, S; Giannetti, P; Gibson, S M; Gignac, M; Gilchriese, M; Gillberg, D; Gilles, G; Gingrich, D M; Giokaris, N; Giordani, M P; Giorgi, F M; Giraud, P F; Giromini, P; Giugni, D; Giuli, F; Giuliani, C; Giulini, M; Gjelsten, B K; Gkaitatzis, S; Gkialas, I; Gkougkousis, E L; Gkountoumis, P; Gladilin, L K; Glasman, C; Glatzer, J; Glaysher, P C F; Glazov, A; Goblirsch-Kolb, M; Godlewski, J; Goldfarb, S; Golling, T; Golubkov, D; Gomes, A; Gonçalo, R; Goncalves Gama, R; Goncalves Pinto Firmino Da Costa, J; Gonella, G; 
Gonella, L; Gongadze, A; González de la Hoz, S; Gonzalez-Sevilla, S; Goossens, L; Gorbounov, P A; Gordon, H A; Gorelov, I; Gorini, B; Gorini, E; Gorišek, A; Goshaw, A T; Gössling, C; Gostkin, M I; Gottardo, C A; Goudet, C R; Goujdami, D; Goussiou, A G; Govender, N; Gozani, E; Graber, L; Grabowska-Bold, I; Gradin, P O J; Gramling, J; Gramstad, E; Grancagnolo, S; Gratchev, V; Gravila, P M; Gray, C; Gray, H M; Greenwood, Z D; Grefe, C; Gregersen, K; Gregor, I M; Grenier, P; Grevtsov, K; Griffiths, J; Grillo, A A; Grimm, K; Grinstein, S; Gris, Ph; Grivaz, J-F; Groh, S; Gross, E; Grosse-Knetter, J; Grossi, G C; Grout, Z J; Grummer, A; Guan, L; Guan, W; Guenther, J; Guescini, F; Guest, D; Gueta, O; Gui, B; Guido, E; Guillemin, T; Guindon, S; Gul, U; Gumpert, C; Guo, J; Guo, W; Guo, Y; Gupta, R; Gupta, S; Gustavino, G; Gutierrez, P; Gutierrez Ortiz, N G; Gutschow, C; Guyot, C; Guzik, M P; Gwenlan, C; Gwilliam, C B; Haas, A; Haber, C; Hadavand, H K; Haddad, N; Hadef, A; Hageböck, S; Hagihara, M; Hakobyan, H; Haleem, M; Haley, J; Halladjian, G; Hallewell, G D; Hamacher, K; Hamal, P; Hamano, K; Hamilton, A; Hamity, G N; Hamnett, P G; Han, L; Han, S; Hanagaki, K; Hanawa, K; Hance, M; Haney, B; Hanke, P; Hansen, J B; Hansen, J D; Hansen, M C; Hansen, P H; Hara, K; Hard, A S; Harenberg, T; Hariri, F; Harkusha, S; Harrington, R D; Harrison, P F; Hartmann, N M; Hasegawa, M; Hasegawa, Y; Hasib, A; Hassani, S; Haug, S; Hauser, R; Hauswald, L; Havener, L B; Havranek, M; Hawkes, C M; Hawkings, R J; Hayakawa, D; Hayden, D; Hays, C P; Hays, J M; Hayward, H S; Haywood, S J; Head, S J; Heck, T; Hedberg, V; Heelan, L; Heidegger, K K; Heim, S; Heim, T; Heinemann, B; Heinrich, J J; Heinrich, L; Heinz, C; Hejbal, J; Helary, L; Held, A; Hellman, S; Helsens, C; Henderson, R C W; Heng, Y; Henkelmann, S; Henriques Correia, A M; Henrot-Versille, S; Herbert, G H; Herde, H; Herget, V; Hernández Jiménez, Y; Herten, G; Hertenberger, R; Hervas, L; Herwig, T C; Hesketh, G G; Hessey, N P; Hetherly, J W; 
Higashino, S; Higón-Rodriguez, E; Hill, E; Hill, J C; Hiller, K H; Hillier, S J; Hils, M; Hinchliffe, I; Hirose, M; Hirschbuehl, D; Hiti, B; Hladik, O; Hoad, X; Hobbs, J; Hod, N; Hodgkinson, M C; Hodgson, P; Hoecker, A; Hoeferkamp, M R; Hoenig, F; Hohn, D; Holmes, T R; Homann, M; Honda, S; Honda, T; Hong, T M; Hooberman, B H; Hopkins, W H; Horii, Y; Horton, A J; Hostachy, J-Y; Hou, S; Hoummada, A; Howarth, J; Hoya, J; Hrabovsky, M; Hrdinka, J; Hristova, I; Hrivnac, J; Hryn'ova, T; Hrynevich, A; Hsu, P J; Hsu, S-C; Hu, Q; Hu, S; Huang, Y; Hubacek, Z; Hubaut, F; Huegging, F; Huffman, T B; Hughes, E W; Hughes, G; Huhtinen, M; Huo, P; Huseynov, N; Huston, J; Huth, J; Iacobucci, G; Iakovidis, G; Ibragimov, I; Iconomidou-Fayard, L; Idrissi, Z; Iengo, P; Igonkina, O; Iizawa, T; Ikegami, Y; Ikeno, M; Ilchenko, Y; Iliadis, D; Ilic, N; Introzzi, G; Ioannou, P; Iodice, M; Iordanidou, K; Ippolito, V; Isacson, M F; Ishijima, N; Ishino, M; Ishitsuka, M; Issever, C; Istin, S; Ito, F; Iturbe Ponce, J M; Iuppa, R; Iwasaki, H; Izen, J M; Izzo, V; Jabbar, S; Jackson, P; Jacobs, R M; Jain, V; Jakobi, K B; Jakobs, K; Jakobsen, S; Jakoubek, T; Jamin, D O; Jana, D K; Jansky, R; Janssen, J; Janus, M; Janus, P A; Jarlskog, G; Javadov, N; Javůrek, T; Javurkova, M; Jeanneau, F; Jeanty, L; Jejelava, J; Jelinskas, A; Jenni, P; Jeske, C; Jézéquel, S; Ji, H; Jia, J; Jiang, H; Jiang, Y; Jiang, Z; Jiggins, S; Jimenez Pena, J; Jin, S; Jinaru, A; Jinnouchi, O; Jivan, H; Johansson, P; Johns, K A; Johnson, C A; Johnson, W J; Jon-And, K; Jones, R W L; Jones, S D; Jones, S; Jones, T J; Jongmanns, J; Jorge, P M; Jovicevic, J; Ju, X; Juste Rozas, A; Köhler, M K; Kaczmarska, A; Kado, M; Kagan, H; Kagan, M; Kahn, S J; Kaji, T; Kajomovitz, E; Kalderon, C W; Kaluza, A; Kama, S; Kamenshchikov, A; Kanaya, N; Kanjir, L; Kantserov, V A; Kanzaki, J; Kaplan, B; Kaplan, L S; Kar, D; Karakostas, K; Karastathis, N; Kareem, M J; Karentzos, E; Karpov, S N; Karpova, Z M; Karthik, K; Kartvelishvili, V; Karyukhin, A N; 
Kasahara, K; Kashif, L; Kass, R D; Kastanas, A; Kataoka, Y; Kato, C; Katre, A; Katzy, J; Kawade, K; Kawagoe, K; Kawamoto, T; Kawamura, G; Kay, E F; Kazanin, V F; Keeler, R; Kehoe, R; Keller, J S; Kempster, J J; Kendrick, J; Keoshkerian, H; Kepka, O; Kerševan, B P; Kersten, S; Keyes, R A; Khader, M; Khalil-Zada, F; Khanov, A; Kharlamov, A G; Kharlamova, T; Khodinov, A; Khoo, T J; Khovanskiy, V; Khramov, E; Khubua, J; Kido, S; Kilby, C R; Kim, H Y; Kim, S H; Kim, Y K; Kimura, N; Kind, O M; King, B T; Kirchmeier, D; Kirk, J; Kiryunin, A E; Kishimoto, T; Kisielewska, D; Kiuchi, K; Kivernyk, O; Kladiva, E; Klapdor-Kleingrothaus, T; Klein, M H; Klein, M; Klein, U; Kleinknecht, K; Klimek, P; Klimentov, A; Klingenberg, R; Klingl, T; Klioutchnikova, T; Kluge, E-E; Kluit, P; Kluth, S; Kneringer, E; Knoops, E B F G; Knue, A; Kobayashi, A; Kobayashi, D; Kobayashi, T; Kobel, M; Kocian, M; Kodys, P; Koffas, T; Koffeman, E; Köhler, N M; Koi, T; Kolb, M; Koletsou, I; Komar, A A; Komori, Y; Kondo, T; Kondrashova, N; Köneke, K; König, A C; Kono, T; Konoplich, R; Konstantinidis, N; Kopeliansky, R; Koperny, S; Kopp, A K; Korcyl, K; Kordas, K; Korn, A; Korol, A A; Korolkov, I; Korolkova, E V; Kortner, O; Kortner, S; Kosek, T; Kostyukhin, V V; Kotwal, A; Koulouris, A; Kourkoumeli-Charalampidi, A; Kourkoumelis, C; Kourlitis, E; Kouskoura, V; Kowalewska, A B; Kowalewski, R; Kowalski, T Z; Kozakai, C; Kozanecki, W; Kozhin, A S; Kramarenko, V A; Kramberger, G; Krasnopevtsev, D; Krasny, M W; Krasznahorkay, A; Krauss, D; Kremer, J A; Kretzschmar, J; Kreutzfeldt, K; Krieger, P; Krizka, K; Kroeninger, K; Kroha, H; Kroll, J; Kroll, J; Kroseberg, J; Krstic, J; Kruchonak, U; Krüger, H; Krumnack, N; Kruse, M C; Kubota, T; Kucuk, H; Kuday, S; Kuechler, J T; Kuehn, S; Kugel, A; Kuger, F; Kuhl, T; Kukhtin, V; Kukla, R; Kulchitsky, Y; Kuleshov, S; Kulinich, Y P; Kuna, M; Kunigo, T; Kupco, A; Kupfer, T; Kuprash, O; Kurashige, H; Kurchaninov, L L; Kurochkin, Y A; Kurth, M G; Kus, V; Kuwertz, E S; Kuze, 
M; Kvita, J; Kwan, T; Kyriazopoulos, D; La Rosa, A; Navarro, J L La Rosa; La Rotonda, L; Lacasta, C; Lacava, F; Lacey, J; Lacker, H; Lacour, D; Ladygin, E; Lafaye, R; Laforge, B; Lagouri, T; Lai, S; Lammers, S; Lampl, W; Lançon, E; Landgraf, U; Landon, M P J; Lanfermann, M C; Lang, V S; Lange, J C; Langenberg, R J; Lankford, A J; Lanni, F; Lantzsch, K; Lanza, A; Lapertosa, A; Laplace, S; Laporte, J F; Lari, T; Lasagni Manghi, F; Lassnig, M; Laurelli, P; Lavrijsen, W; Law, A T; Laycock, P; Lazovich, T; Lazzaroni, M; Le, B; Le Dortz, O; Le Guirriec, E; Le Quilleuc, E P; LeBlanc, M; LeCompte, T; Ledroit-Guillon, F; Lee, C A; Lee, G R; Lee, S C; Lee, L; Lefebvre, B; Lefebvre, G; Lefebvre, M; Legger, F; Leggett, C; Lehan, A; Lehmann Miotto, G; Lei, X; Leight, W A; Leite, M A L; Leitner, R; Lellouch, D; Lemmer, B; Leney, K J C; Lenz, T; Lenzi, B; Leone, R; Leone, S; Leonidopoulos, C; Lerner, G; Leroy, C; Lesage, A A J; Lester, C G; Levchenko, M; Levêque, J; Levin, D; Levinson, L J; Levy, M; Lewis, D; Li, B; Li, C; Li, H; Li, L; Li, Q; Li, S; Li, X; Li, Y; Liang, Z; Liberti, B; Liblong, A; Lie, K; Liebal, J; Liebig, W; Limosani, A; Lin, S C; Lin, T H; Lindquist, B E; Lionti, A E; Lipeles, E; Lipniacka, A; Lisovyi, M; Liss, T M; Lister, A; Litke, A M; Liu, B; Liu, H; Liu, H; Liu, J K K; Liu, J; Liu, J B; Liu, K; Liu, L; Liu, M; Liu, Y L; Liu, Y; Livan, M; Lleres, A; Llorente Merino, J; Lloyd, S L; Lo, C Y; Sterzo, F Lo; Lobodzinska, E M; Loch, P; Loebinger, F K; Loesle, A; Loew, K M; Loginov, A; Lohse, T; Lohwasser, K; Lokajicek, M; Long, B A; Long, J D; Long, R E; Longo, L; Looper, K A; Lopez, J A; Lopez Mateos, D; Lopez Paz, I; Lopez Solis, A; Lorenz, J; Lorenzo Martinez, N; Losada, M; Lösel, P J; Lou, X; Lounis, A; Love, J; Love, P A; Lu, H; Lu, N; Lu, Y J; Lubatti, H J; Luci, C; Lucotte, A; Luedtke, C; Luehring, F; Lukas, W; Luminari, L; Lundberg, O; Lund-Jensen, B; Luzi, P M; Lynn, D; Lysak, R; Lytken, E; Lyubushkin, V; Ma, H; Ma, L L; Ma, Y; Maccarrone, G; Macchiolo, 
A; Macdonald, C M; Maček, B; Machado Miguens, J; Madaffari, D; Madar, R; Mader, W F; Madsen, A; Maeda, J; Maeland, S; Maeno, T; Maevskiy, A S; Magradze, E; Mahlstedt, J; Maiani, C; Maidantchik, C; Maier, A A; Maier, T; Maio, A; Majersky, O; Majewski, S; Makida, Y; Makovec, N; Malaescu, B; Malecki, Pa; Maleev, V P; Malek, F; Mallik, U; Malon, D; Malone, C; Maltezos, S; Malyukov, S; Mamuzic, J; Mancini, G; Mandelli, L; Mandić, I; Maneira, J; Manhaes de Andrade Filho, L; Manjarres Ramos, J; Mann, A; Manousos, A; Mansoulie, B; Mansour, J D; Mantifel, R; Mantoani, M; Manzoni, S; Mapelli, L; Marceca, G; March, L; Marchese, L; Marchiori, G; Marcisovsky, M; Marjanovic, M; Marley, D E; Marroquim, F; Marsden, S P; Marshall, Z; Martensson, M U F; Marti-Garcia, S; Martin, C B; Martin, T A; Martin, V J; Martin Dit Latour, B; Martinez, M; Martinez Outschoorn, V I; Martin-Haugh, S; Martoiu, V S; Martyniuk, A C; Marzin, A; Masetti, L; Mashimo, T; Mashinistov, R; Masik, J; Maslennikov, A L; Massa, L; Mastrandrea, P; Mastroberardino, A; Masubuchi, T; Mättig, P; Maurer, J; Maxfield, S J; Maximov, D A; Mazini, R; Maznas, I; Mazza, S M; Mc Fadden, N C; Mc Goldrick, G; Mc Kee, S P; McCarn, A; McCarthy, R L; McCarthy, T G; McClymont, L I; McDonald, E F; Mcfayden, J A; Mchedlidze, G; McMahon, S J; McNamara, P C; McPherson, R A; Meehan, S; Megy, T J; Mehlhase, S; Mehta, A; Meideck, T; Meier, K; Meirose, B; Melini, D; Mellado Garcia, B R; Mellenthin, J D; Melo, M; Meloni, F; Menary, S B; Meng, L; Meng, X T; Mengarelli, A; Menke, S; Meoni, E; Mergelmeyer, S; Mermod, P; Merola, L; Meroni, C; Merritt, F S; Messina, A; Metcalfe, J; Mete, A S; Meyer, C; Meyer, J-P; Meyer, J; Meyer Zu Theenhausen, H; Miano, F; Middleton, R P; Miglioranzi, S; Mijović, L; Mikenberg, G; Mikestikova, M; Mikuž, M; Milesi, M; Milic, A; Miller, D W; Mills, C; Milov, A; Milstead, D A; Minaenko, A A; Minami, Y; Minashvili, I A; Mincer, A I; Mindur, B; Mineev, M; Minegishi, Y; Ming, Y; Mir, L M; Mistry, K P; Mitani, T; 
Mitrevski, J; Mitsou, V A; Miucci, A; Miyagawa, P S; Mizukami, A; Mjörnmark, J U; Mkrtchyan, T; Mlynarikova, M; Moa, T; Mochizuki, K; Mogg, P; Mohapatra, S; Molander, S; Moles-Valls, R; Monden, R; Mondragon, M C; Mönig, K; Monk, J; Monnier, E; Montalbano, A; Montejo Berlingen, J; Monticelli, F; Monzani, S; Moore, R W; Morange, N; Moreno, D; Moreno Llácer, M; Morettini, P; Morgenstern, S; Mori, D; Mori, T; Morii, M; Morinaga, M; Morisbak, V; Morley, A K; Mornacchi, G; Morris, J D; Morvaj, L; Moschovakos, P; Mosidze, M; Moss, H J; Moss, J; Motohashi, K; Mount, R; Mountricha, E; Moyse, E J W; Muanza, S; Mudd, R D; Mueller, F; Mueller, J; Mueller, R S P; Muenstermann, D; Mullen, P; Mullier, G A; Munoz Sanchez, F J; Murray, W J; Musheghyan, H; Muškinja, M; Myagkov, A G; Myska, M; Nachman, B P; Nackenhorst, O; Nagai, K; Nagai, R; Nagano, K; Nagasaka, Y; Nagata, K; Nagel, M; Nagy, E; Nairz, A M; Nakahama, Y; Nakamura, K; Nakamura, T; Nakano, I; Naranjo Garcia, R F; Narayan, R; Narrias Villar, D I; Naryshkin, I; Naumann, T; Navarro, G; Nayyar, R; Neal, H A; Nechaeva, P Yu; Neep, T J; Negri, A; Negrini, M; Nektarijevic, S; Nellist, C; Nelson, A; Nelson, M E; Nemecek, S; Nemethy, P; Nessi, M; Neubauer, M S; Neumann, M; Newman, P R; Ng, T Y; Nguyen Manh, T; Nickerson, R B; Nicolaidou, R; Nielsen, J; Nikolaenko, V; Nikolic-Audit, I; Nikolopoulos, K; Nilsen, J K; Nilsson, P; Ninomiya, Y; Nisati, A; Nishu, N; Nisius, R; Nitsche, I; Nobe, T; Noguchi, Y; Nomachi, M; Nomidis, I; Nomura, M A; Nooney, T; Nordberg, M; Norjoharuddeen, N; Novgorodova, O; Nowak, S; Nozaki, M; Nozka, L; Ntekas, K; Nurse, E; Nuti, F; O'connor, K; O'Neil, D C; O'Rourke, A A; O'Shea, V; Oakham, F G; Oberlack, H; Obermann, T; Ocariz, J; Ochi, A; Ochoa, I; Ochoa-Ricoux, J P; Oda, S; Odaka, S; Ogren, H; Oh, A; Oh, S H; Ohm, C C; Ohman, H; Oide, H; Okawa, H; Okumura, Y; Okuyama, T; Olariu, A; Oleiro Seabra, L F; Olivares Pino, S A; Oliveira Damazio, D; Olszewski, A; Olszowska, J; Onofre, A; Onogi, K; Onyisi, P U 
E; Oreglia, M J; Oren, Y; Orestano, D; Orlando, N; Orr, R S; Osculati, B; Ospanov, R; Otero Y Garzon, G; Otono, H; Ouchrif, M; Ould-Saada, F; Ouraou, A; Oussoren, K P; Ouyang, Q; Owen, M; Owen, R E; Ozcan, V E; Ozturk, N; Pachal, K; Pacheco Pages, A; Pacheco Rodriguez, L; Padilla Aranda, C; Pagan Griso, S; Paganini, M; Paige, F; Palacino, G; Palazzo, S; Palestini, S; Palka, M; Pallin, D; St Panagiotopoulou, E; Panagoulias, I; Pandini, C E; Panduro Vazquez, J G; Pani, P; Panitkin, S; Pantea, D; Paolozzi, L; Papadopoulou, Th D; Papageorgiou, K; Paramonov, A; Paredes Hernandez, D; Parker, A J; Parker, M A; Parker, K A; Parodi, F; Parsons, J A; Parzefall, U; Pascuzzi, V R; Pasner, J M; Pasqualucci, E; Passaggio, S; Pastore, Fr; Pataraia, S; Pater, J R; Pauly, T; Pearson, B; Pedraza Lopez, S; Pedro, R; Peleganchuk, S V; Penc, O; Peng, C; Peng, H; Penwell, J; Peralva, B S; Perego, M M; Perepelitsa, D V; Perini, L; Pernegger, H; Perrella, S; Peschke, R; Peshekhonov, V D; Peters, K; Peters, R F Y; Petersen, B A; Petersen, T C; Petit, E; Petridis, A; Petridou, C; Petroff, P; Petrolo, E; Petrov, M; Petrucci, F; Pettersson, N E; Peyaud, A; Pezoa, R; Phillips, F H; Phillips, P W; Piacquadio, G; Pianori, E; Picazio, A; Piccaro, E; Pickering, M A; Piegaia, R; Pilcher, J E; Pilkington, A D; Pin, A W J; Pinamonti, M; Pinfold, J L; Pirumov, H; Pitt, M; Plazak, L; Pleier, M-A; Pleskot, V; Plotnikova, E; Pluth, D; Podberezko, P; Poettgen, R; Poggi, R; Poggioli, L; Pohl, D; Polesello, G; Poley, A; Policicchio, A; Polifka, R; Polini, A; Pollard, C S; Polychronakos, V; Pommès, K; Ponomarenko, D; Pontecorvo, L; Pope, B G; Popeneciu, G A; Poppleton, A; Pospisil, S; Potamianos, K; Potrap, I N; Potter, C J; Poulard, G; Poulsen, T; Poveda, J; Pozo Astigarraga, M E; Pralavorio, P; Pranko, A; Prell, S; Price, D; Price, L E; Primavera, M; Prince, S; Proklova, N; Prokofiev, K; Prokoshin, F; Protopopescu, S; Proudfoot, J; Przybycien, M; Puri, A; Puzo, P; Qian, J; Qin, G; Qin, Y; Quadt, A; 
Queitsch-Maitland, M; Quilty, D; Raddum, S; Radeka, V; Radescu, V; Radhakrishnan, S K; Radloff, P; Rados, P; Ragusa, F; Rahal, G; Raine, J A; Rajagopalan, S; Rangel-Smith, C; Rashid, T; Raspopov, S; Ratti, M G; Rauch, D M; Rauscher, F; Rave, S; Ravinovich, I; Rawling, J H; Raymond, M; Read, A L; Readioff, N P; Reale, M; Rebuzzi, D M; Redelbach, A; Redlinger, G; Reece, R; Reed, R G; Reeves, K; Rehnisch, L; Reichert, J; Reiss, A; Rembser, C; Ren, H; Rescigno, M; Resconi, S; Resseguie, E D; Rettie, S; Reynolds, E; Rezanova, O L; Reznicek, P; Rezvani, R; Richter, R; Richter, S; Richter-Was, E; Ricken, O; Ridel, M; Rieck, P; Riegel, C J; Rieger, J; Rifki, O; Rijssenbeek, M; Rimoldi, A; Rimoldi, M; Rinaldi, L; Ripellino, G; Ristić, B; Ritsch, E; Riu, I; Rizatdinova, F; Rizvi, E; Rizzi, C; Roberts, R T; Robertson, S H; Robichaud-Veronneau, A; Robinson, D; Robinson, J E M; Robson, A; Rocco, E; Roda, C; Rodina, Y; Rodriguez Bosca, S; Rodriguez Perez, A; Rodriguez Rodriguez, D; Roe, S; Rogan, C S; Røhne, O; Roloff, J; Romaniouk, A; Romano, M; Romano Saez, S M; Romero Adam, E; Rompotis, N; Ronzani, M; Roos, L; Rosati, S; Rosbach, K; Rose, P; Rosien, N-A; Rossi, E; Rossi, L P; Rosten, J H N; Rosten, R; Rotaru, M; Roth, I; Rothberg, J; Rousseau, D; Rozanov, A; Rozen, Y; Ruan, X; Rubbo, F; Rühr, F; Ruiz-Martinez, A; Rurikova, Z; Rusakovich, N A; Russell, H L; Rutherfoord, J P; Ruthmann, N; Ryabov, Y F; Rybar, M; Rybkin, G; Ryu, S; Ryzhov, A; Rzehorz, G F; Saavedra, A F; Sabato, G; Sacerdoti, S; Sadrozinski, H F-W; Sadykov, R; Safai Tehrani, F; Saha, P; Sahinsoy, M; Saimpert, M; Saito, M; Saito, T; Sakamoto, H; Sakurai, Y; Salamanna, G; Salazar Loyola, J E; Salek, D; Sales De Bruin, P H; Salihagic, D; Salnikov, A; Salt, J; Salvatore, D; Salvatore, F; Salvucci, A; Salzburger, A; Sammel, D; Sampsonidis, D; Sampsonidou, D; Sánchez, J; Sanchez Martinez, V; Sanchez Pineda, A; Sandaker, H; Sandbach, R L; Sander, C O; Sandhoff, M; Sandoval, C; Sankey, D P C; Sannino, M; Sansoni, A; 
Santoni, C; Santonico, R; Santos, H; Santoyo Castillo, I; Sapronov, A; Saraiva, J G; Sarrazin, B; Sasaki, O; Sato, K; Sauvan, E; Savage, G; Savard, P; Savic, N; Sawyer, C; Sawyer, L; Saxon, J; Sbarra, C; Sbrizzi, A; Scanlon, T; Scannicchio, D A; Scarcella, M; Scarfone, V; Schaarschmidt, J; Schacht, P; Schachtner, B M; Schaefer, D; Schaefer, L; Schaefer, R; Schaeffer, J; Schaepe, S; Schaetzel, S; Schäfer, U; Schaffer, A C; Schaile, D; Schamberger, R D; Scharf, V; Schegelsky, V A; Scheirich, D; Schernau, M; Schiavi, C; Schier, S; Schildgen, L K; Schillo, C; Schioppa, M; Schlenker, S; Schmidt-Sommerfeld, K R; Schmieden, K; Schmitt, C; Schmitt, S; Schmitz, S; Schnoor, U; Schoeffel, L; Schoening, A; Schoenrock, B D; Schopf, E; Schott, M; Schouwenberg, J F P; Schovancova, J; Schramm, S; Schuh, N; Schulte, A; Schultens, M J; Schultz-Coulon, H-C; Schulz, H; Schumacher, M; Schumm, B A; Schune, Ph; Schwartzman, A; Schwarz, T A; Schweiger, H; Schwemling, Ph; Schwienhorst, R; Schwindling, J; Sciandra, A; Sciolla, G; Scuri, F; Scutti, F; Searcy, J; Seema, P; Seidel, S C; Seiden, A; Seixas, J M; Sekhniaidze, G; Sekhon, K; Sekula, S J; Semprini-Cesari, N; Senkin, S; Serfon, C; Serin, L; Serkin, L; Sessa, M; Seuster, R; Severini, H; Sfiligoj, T; Sforza, F; Sfyrla, A; Shabalina, E; Shaikh, N W; Shan, L Y; Shang, R; Shank, J T; Shapiro, M; Shatalov, P B; Shaw, K; Shaw, S M; Shcherbakova, A; Shehu, C Y; Shen, Y; Sherafati, N; Sherwood, P; Shi, L; Shimizu, S; Shimmin, C O; Shimojima, M; Shipsey, I P J; Shirabe, S; Shiyakova, M; Shlomi, J; Shmeleva, A; Shoaleh Saadi, D; Shochet, M J; Shojaii, S; Shope, D R; Shrestha, S; Shulga, E; Shupe, M A; Sicho, P; Sickles, A M; Sidebo, P E; Sideras Haddad, E; Sidiropoulou, O; Sidoti, A; Siegert, F; Sijacki, Dj; Silva, J; Silverstein, S B; Simak, V; Simic, Lj; Simion, S; Simioni, E; Simmons, B; Simon, M; Sinervo, P; Sinev, N B; Sioli, M; Siragusa, G; Siral, I; Sivoklokov, S Yu; Sjölin, J; Skinner, M B; Skubic, P; Slater, M; Slavicek, T; Slawinska, 
M; Sliwa, K; Slovak, R; Smakhtin, V; Smart, B H; Smiesko, J; Smirnov, N; Smirnov, S Yu; Smirnov, Y; Smirnova, L N; Smirnova, O; Smith, J W; Smith, M N K; Smith, R W; Smizanska, M; Smolek, K; Snesarev, A A; Snyder, I M; Snyder, S; Sobie, R; Socher, F; Soffer, A; Soh, D A; Sokhrannyi, G; Solans Sanchez, C A; Solar, M; Soldatov, E Yu; Soldevila, U; Solodkov, A A; Soloshenko, A; Solovyanov, O V; Solovyev, V; Sommer, P; Son, H; Sopczak, A; Sosa, D; Sotiropoulou, C L; Soualah, R; Soukharev, A M; South, D; Sowden, B C; Spagnolo, S; Spalla, M; Spangenberg, M; Spanò, F; Sperlich, D; Spettel, F; Spieker, T M; Spighi, R; Spigo, G; Spiller, L A; Spousta, M; St Denis, R D; Stabile, A; Stamen, R; Stamm, S; Stanecka, E; Stanek, R W; Stanescu, C; Stanitzki, M M; Stapf, B S; Stapnes, S; Starchenko, E A; Stark, G H; Stark, J; Stark, S H; Staroba, P; Starovoitov, P; Stärz, S; Staszewski, R; Steinberg, P; Stelzer, B; Stelzer, H J; Stelzer-Chilton, O; Stenzel, H; Stewart, G A; Stockton, M C; Stoebe, M; Stoicea, G; Stolte, P; Stonjek, S; Stradling, A R; Straessner, A; Stramaglia, M E; Strandberg, J; Strandberg, S; Strauss, M; Strizenec, P; Ströhmer, R; Strom, D M; Stroynowski, R; Strubig, A; Stucci, S A; Stugu, B; Styles, N A; Su, D; Su, J; Suchek, S; Sugaya, Y; Suk, M; Sulin, V V; Sultan, D M S; Sultansoy, S; Sumida, T; Sun, S; Sun, X; Suruliz, K; Suster, C J E; Sutton, M R; Suzuki, S; Svatos, M; Swiatlowski, M; Swift, S P; Sykora, I; Sykora, T; Ta, D; Tackmann, K; Taenzer, J; Taffard, A; Tafirout, R; Taiblum, N; Takai, H; Takashima, R; Takasugi, E H; Takeshita, T; Takubo, Y; Talby, M; Talyshev, A A; Tanaka, J; Tanaka, M; Tanaka, R; Tanaka, S; Tanioka, R; Tannenwald, B B; Tapia Araya, S; Tapprogge, S; Tarem, S; Tartarelli, G F; Tas, P; Tasevsky, M; Tashiro, T; Tassi, E; Tavares Delgado, A; Tayalati, Y; Taylor, A C; Taylor, G N; Taylor, P T E; Taylor, W; Teixeira-Dias, P; Temple, D; Ten Kate, H; Teng, P K; Teoh, J J; Tepel, F; Terada, S; Terashi, K; Terron, J; Terzo, S; Testa, M; 
Teuscher, R J; Theveneaux-Pelzer, T; Thomas, J P; Thomas-Wilsker, J; Thompson, P D; Thompson, A S; Thomsen, L A; Thomson, E; Tibbetts, M J; Ticse Torres, R E; Tikhomirov, V O; Tikhonov, Yu A; Timoshenko, S; Tipton, P; Tisserant, S; Todome, K; Todorova-Nova, S; Tojo, J; Tokár, S; Tokushuku, K; Tolley, E; Tomlinson, L; Tomoto, M; Tompkins, L; Toms, K; Tong, B; Tornambe, P; Torrence, E; Torres, H; Torró Pastor, E; Toth, J; Touchard, F; Tovey, D R; Treado, C J; Trefzger, T; Tresoldi, F; Tricoli, A; Trigger, I M; Trincaz-Duvoid, S; Tripiana, M F; Trischuk, W; Trocmé, B; Trofymov, A; Troncon, C; Trottier-McDonald, M; Trovatelli, M; Truong, L; Trzebinski, M; Trzupek, A; Tsang, K W; Tseng, J C-L; Tsiareshka, P V; Tsipolitis, G; Tsirintanis, N; Tsiskaridze, S; Tsiskaridze, V; Tskhadadze, E G; Tsui, K M; Tsukerman, I I; Tsulaia, V; Tsuno, S; Tsybychev, D; Tu, Y; Tudorache, A; Tudorache, V; Tulbure, T T; Tuna, A N; Tupputi, S A; Turchikhin, S; Turgeman, D; Turk Cakir, I; Turra, R; Tuts, P M; Ucchielli, G; Ueda, I; Ughetto, M; Ukegawa, F; Unal, G; Undrus, A; Unel, G; Ungaro, F C; Unno, Y; Unverdorben, C; Urban, J; Urquijo, P; Urrejola, P; Usai, G; Usui, J; Vacavant, L; Vacek, V; Vachon, B; Valderanis, C; Valdes Santurio, E; Valentinetti, S; Valero, A; Valéry, L; Valkar, S; Vallier, A; Valls Ferrer, J A; Van Den Wollenberg, W; van der Graaf, H; van Gemmeren, P; Van Nieuwkoop, J; van Vulpen, I; van Woerden, M C; Vanadia, M; Vandelli, W; Vaniachine, A; Vankov, P; Vardanyan, G; Vari, R; Varnes, E W; Varni, C; Varol, T; Varouchas, D; Vartapetian, A; Varvell, K E; Vasquez, J G; Vasquez, G A; Vazeille, F; Vazquez Schroeder, T; Veatch, J; Veeraraghavan, V; Veloce, L M; Veloso, F; Veneziano, S; Ventura, A; Venturi, M; Venturi, N; Venturini, A; Vercesi, V; Verducci, M; Verkerke, W; Vermeulen, A T; Vermeulen, J C; Vetterli, M C; Viaux Maira, N; Viazlo, O; Vichou, I; Vickey, T; Vickey Boeriu, O E; Viehhauser, G H A; Viel, S; Vigani, L; Villa, M; Villaplana Perez, M; Vilucchi, E; Vincter, 
M G; Vinogradov, V B; Vishwakarma, A; Vittori, C; Vivarelli, I; Vlachos, S; Vlasak, M; Vogel, M; Vokac, P; Volpi, G; von der Schmitt, H; von Toerne, E; Vorobel, V; Vorobev, K; Vos, M; Voss, R; Vossebeld, J H; Vranjes, N; Vranjes Milosavljevic, M; Vrba, V; Vreeswijk, M; Vuillermet, R; Vukotic, I; Wagner, P; Wagner, W; Wagner-Kuhr, J; Wahlberg, H; Wahrmund, S; Wakabayashi, J; Walder, J; Walker, R; Walkowiak, W; Wallangen, V; Wang, C; Wang, C; Wang, F; Wang, H; Wang, H; Wang, J; Wang, J; Wang, Q; Wang, R; Wang, S M; Wang, T; Wang, W; Wang, W; Wang, Z; Wanotayaroj, C; Warburton, A; Ward, C P; Wardrope, D R; Washbrook, A; Watkins, P M; Watson, A T; Watson, M F; Watts, G; Watts, S; Waugh, B M; Webb, A F; Webb, S; Weber, M S; Weber, S W; Weber, S A; Webster, J S; Weidberg, A R; Weinert, B; Weingarten, J; Weirich, M; Weiser, C; Weits, H; Wells, P S; Wenaus, T; Wengler, T; Wenig, S; Wermes, N; Werner, M D; Werner, P; Wessels, M; Whalen, K; Whallon, N L; Wharton, A M; White, A S; White, A; White, M J; White, R; Whiteson, D; Whitmore, B W; Wickens, F J; Wiedenmann, W; Wielers, M; Wiglesworth, C; Wiik-Fuchs, L A M; Wildauer, A; Wilk, F; Wilkens, H G; Williams, H H; Williams, S; Willis, C; Willocq, S; Wilson, J A; Wingerter-Seez, I; Winkels, E; Winklmeier, F; Winston, O J; Winter, B T; Wittgen, M; Wobisch, M; Wolf, T M H; Wolff, R; Wolter, M W; Wolters, H; Wong, V W S; Worm, S D; Wosiek, B K; Wotschack, J; Wozniak, K W; Wu, M; Wu, S L; Wu, X; Wu, Y; Wyatt, T R; Wynne, B M; Xella, S; Xi, Z; Xia, L; Xu, D; Xu, L; Yabsley, B; Yacoob, S; Yamaguchi, D; Yamaguchi, Y; Yamamoto, A; Yamamoto, S; Yamanaka, T; Yamatani, M; Yamauchi, K; Yamazaki, Y; Yan, Z; Yang, H; Yang, H; Yang, Y; Yang, Z; Yao, W-M; Yap, Y C; Yasu, Y; Yatsenko, E; Yau Wong, K H; Ye, J; Ye, S; Yeletskikh, I; Yigitbasi, E; Yildirim, E; Yorita, K; Yoshihara, K; Young, C; Young, C J S; Yu, J; Yu, J; Yuen, S P Y; Yusuff, I; Zabinski, B; Zacharis, G; Zaidan, R; Zaitsev, A M; Zakharchuk, N; Zalieckas, J; Zaman, A; Zambito, S; 
Zanzi, D; Zeitnitz, C; Zemla, A; Zeng, J C; Zeng, Q; Zenin, O; Ženiš, T; Zerwas, D; Zhang, D; Zhang, F; Zhang, G; Zhang, H; Zhang, J; Zhang, L; Zhang, L; Zhang, M; Zhang, P; Zhang, R; Zhang, R; Zhang, X; Zhang, Y; Zhang, Z; Zhao, X; Zhao, Y; Zhao, Z; Zhemchugov, A; Zhou, B; Zhou, C; Zhou, L; Zhou, M; Zhou, M; Zhou, N; Zhu, C G; Zhu, H; Zhu, J; Zhu, Y; Zhuang, X; Zhukov, K; Zibell, A; Zieminska, D; Zimine, N I; Zimmermann, C; Zimmermann, S; Zinonos, Z; Zinser, M; Ziolkowski, M; Živković, L; Zobernig, G; Zoccoli, A; Zou, R; Zur Nedden, M; Zwalinski, L

    2017-01-01

Results of a search for physics beyond the Standard Model in events containing an energetic photon and large missing transverse momentum with the ATLAS detector at the Large Hadron Collider are reported. As the number of events observed in data, corresponding to an integrated luminosity of 36.1 fb[Formula: see text] of proton-proton collisions at a centre-of-mass energy of [Formula: see text], is in agreement with the Standard Model expectations, model-independent limits are set on the fiducial cross section for the production of events in this final state. Exclusion limits are also placed in models where dark-matter candidates are pair-produced. For dark-matter production via an axial-vector or a vector mediator in the s-channel, this search excludes mediator masses below 750 [Formula: see text] for dark-matter candidate masses below 230 [Formula: see text] at 95% confidence level, depending on the couplings. In an effective theory of dark-matter production, the limits restrict the value of the suppression scale [Formula: see text] to be above [Formula: see text] at 95% confidence level. A limit is also reported on the production of a high-mass scalar resonance by processes beyond the Standard Model, in which the resonance decays to [Formula: see text] and the Z boson subsequently decays into neutrinos.

  2. Automation of Survey Data Processing, Documentation and Dissemination: An Application to Large-Scale Self-Reported Educational Survey.

    Science.gov (United States)

    Shim, Eunjae; Shim, Minsuk K.; Felner, Robert D.

Automation of the survey process has proved successful in many industries, yet it is still underused in educational research. This is largely due to the facts (1) that number crunching is usually carried out using software developed before modern information technology existed, and (2) that educational research is to a great extent trapped…

  3. Proxima: a presentation-oriented editor for structured documents

    OpenAIRE

    Schrage, M.M.

    2004-01-01

    A typical computer user deals with a large variety of documents, such as text files, spreadsheets, and web pages. The applications for constructing and modifying these documents are called editors (e.g. text editors, spreadsheet applications, and HTML editors). Despite the apparent differences between editors, the core editing behavior, whether performed in a word-processor or a spreadsheet, is largely similar: document fragments may be copied and pasted, and new parts of the document may be ...

  4. The journey from texting to applications on personally owned devices to enhance student eEngagement in large lectures: A pilot study

    Directory of Open Access Journals (Sweden)

    Trevor Nesbit

Increasing class sizes to gain economies of scale has resulted in less interaction between lecturers and students during lectures. This paper presents the results of a pilot study that set out to examine the use of applications on personally owned devices (APODs) to enhance student interaction, participation and engagement in large lectures. The pilot study commences with the development and trial of a text-messaging-based application and, after a survey of students regarding ownership levels of mobile devices, concludes with the trial of an application developed for mobile devices. The paper concludes that the use of APODs can significantly increase student interaction, participation and engagement in large lectures, and identifies implications and opportunities for further research.

  5. Large-scale dam removal in the northeast United States: documenting ecological responses to the Penobscot River Restoration Project

    Science.gov (United States)

    Collins, M. J.; Aponte Clarke, G.; Baeder, C.; McCaw, D.; Royte, J.; Saunders, R.; Sheehan, T.

    2012-12-01

The Penobscot River Restoration Project aims to improve aquatic connectivity in New England's second largest watershed (~22,000 km2) by removing the two lowermost, mainstem dams and bypassing a third dam on a principal tributary upstream. Project objectives include: restoring unobstructed access to the entire historic riverine range for five lower river diadromous species including Atlantic and shortnose sturgeon; significantly improving access to upstream habitat for six upper river diadromous species including Atlantic salmon; reconnecting trophic linkages between headwater areas and the Gulf of Maine; restoring fluvial processes to the former impoundments; improving recreational and Penobscot Nation cultural opportunities; and maintaining basin-wide hydropower output. The project is expected to have landscape-scale benefits and the need for a significant investment in long-term monitoring and evaluation to formally quantify ecosystem response has been recognized. A diverse group of federal, state, tribal, NGO, and academic partners has developed a long-term monitoring and evaluation program composed of nine studies that began in 2009. Including American Recovery and Reinvestment Act (ARRA) funding that leveraged partner contributions, we have invested nearly $2M to date in pre- and post-removal investigations that evaluate geomorphology/bed sediment, water quality, wetlands, and fisheries. Given the number of affected diadromous species and the diversity of their life histories, we have initiated six distinct, but related, fisheries investigations to document these expected changes: Atlantic salmon upstream and downstream passage efficiency using passive integrated transponder (PIT) and acoustic telemetry; fish community structure via an index of biotic integrity (IBI); total diadromous fish biomass through hydroacoustics; shortnose sturgeon spawning and habitat use via active and passive acoustic telemetry; and freshwater-marine food web interactions by

  6. Limited Documentation and Treatment Quality of Glycemic Inpatient Care in Relation to Structural Deficits of Heterogeneous Insulin Charts at a Large University Hospital.

    Science.gov (United States)

    Kopanz, Julia; Lichtenegger, Katharina M; Sendlhofer, Gerald; Semlitsch, Barbara; Cuder, Gerald; Pak, Andreas; Pieber, Thomas R; Tax, Christa; Brunner, Gernot; Plank, Johannes

    2018-02-09

Insulin charts represent a key component in the inpatient glycemic management process. The aim was to evaluate the quality of structure, documentation, and treatment of diabetic inpatient care in order to design a new standardized insulin chart for a large university hospital setting. Historically grown blank insulin charts in use at 39 general wards were collected and evaluated for structural quality features. Documentation and treatment quality were evaluated in a consecutive snapshot audit of filled-in charts. The primary end point was the percentage of charts with any medication error. Overall, 20 different blank insulin charts with variable designs and significant structural deficits were identified. A medication error occurred in 55% of the 102 audited filled-in insulin charts, consisting of prescription and management errors in 48% and 16%, respectively. Charts of insulin-treated patients had more medication errors than charts of patients treated with oral medication. In line with international standards, a new insulin chart was developed to overcome these quality hurdles.

  7. Working with text tools, techniques and approaches for text mining

    CERN Document Server

    Tourte, Gregory J L

    2016-01-01

    Text mining tools and technologies have long been a part of the repository world, where they have been applied to a variety of purposes, from pragmatic aims to support tools. Research areas as diverse as biology, chemistry, sociology and criminology have seen effective use made of text mining technologies. Working With Text collects a subset of the best contributions from the 'Working with text: Tools, techniques and approaches for text mining' workshop, alongside contributions from experts in the area. Text mining tools and technologies in support of academic research include supporting research on the basis of a large body of documents, facilitating access to and reuse of extant work, and bridging between the formal academic world and areas such as traditional and social media. Jisc have funded a number of projects, including NaCTem (the National Centre for Text Mining) and the ResDis programme. Contents are developed from workshop submissions and invited contributions, including: Legal considerations in te...

  8. A text messaging intervention to improve heart failure self-management after hospital discharge in a largely African-American population: before-after study.

    Science.gov (United States)

    Nundy, Shantanu; Razi, Rabia R; Dick, Jonathan J; Smith, Bryan; Mayo, Ainoa; O'Connor, Anne; Meltzer, David O

    2013-03-11

    There is increasing interest in finding novel approaches to reduce health disparities in readmissions for acute decompensated heart failure (ADHF). Text messaging is a promising platform for improving chronic disease self-management in low-income populations, yet is largely unexplored in ADHF. The purpose of this pre-post study was to assess the feasibility and acceptability of a text message-based (SMS: short message service) intervention in a largely African American population with ADHF and explore its effects on self-management. Hospitalized patients with ADHF were enrolled in an automated text message-based heart failure program for 30 days following discharge. Messages provided self-care reminders and patient education on diet, symptom recognition, and health care navigation. Demographic and cell phone usage data were collected on enrollment, and an exit survey was administered on completion. The Self-Care of Heart Failure Index (SCHFI) was administered preintervention and postintervention and compared using sample t tests (composite) and Wilcoxon rank sum tests (individual). Clinical data were collected through chart abstraction. Of 51 patients approached for recruitment, 27 agreed to participate and 15 were enrolled (14 African-American, 1 White). Barriers to enrollment included not owning a personal cell phone (n=12), failing the Mini-Mental exam (n=3), needing a proxy (n=2), hard of hearing (n=1), and refusal (n=3). Another 3 participants left the study for health reasons and 3 others had technology issues. A total of 6 patients (5 African-American, 1 White) completed the postintervention surveys. The mean age was 50 years (range 23-69) and over half had Medicaid or were uninsured (60%, 9/15). The mean ejection fraction for those with systolic dysfunction was 22%, and at least two-thirds had a prior hospitalization in the past year. 
Participants strongly agreed that the program was easy to use (83%), reduced pills missed (66%), and decreased salt intake…

  9. Sentiment analysis methods for understanding large-scale texts: a case for using continuum-scored words and word shift graphs

    Directory of Open Access Journals (Sweden)

    Andrew J Reagan

    2017-10-01

The emergence and global adoption of social media has rendered possible the real-time estimation of population-scale sentiment, an extraordinary capacity which has profound implications for our understanding of human behavior. Given the growing assortment of sentiment-measuring instruments, it is imperative to understand which aspects of sentiment dictionaries contribute to both their classification accuracy and their ability to provide richer understanding of texts. Here, we perform detailed, quantitative tests and qualitative assessments of 6 dictionary-based methods applied to 4 different corpora, and briefly examine a further 20 methods. We show that while inappropriate for sentences, dictionary-based methods are generally robust in their classification accuracy for longer texts. Most importantly, they can aid understanding of texts with reliable and meaningful word shift graphs if (1) the dictionary covers a sufficiently large portion of a given text's lexicon when weighted by word usage frequency; and (2) words are scored on a continuous scale.
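The two conditions above can be sketched in a few lines. The mini-dictionary below is a toy stand-in for an instrument such as labMT (the happiness scores are illustrative, not actual ratings), and the `neutral_band` parameter mimics the stop-value tuning used before drawing word shift graphs:

```python
# Hypothetical continuum scores on a 1-9 happiness scale (5 = neutral).
SENTIMENT = {"delight": 7.7, "celebration": 7.9, "strength": 6.6,
             "violence": 2.0, "rioting": 2.5, "outrage": 2.8, "the": 5.0}

def text_score(tokens, dictionary, neutral_band=1.0, center=5.0):
    """Frequency-weighted mean happiness of covered, non-neutral tokens.

    Words within `neutral_band` of the scale center are dropped, so that
    near-neutral function words do not wash out the signal. Coverage is the
    fraction of tokens that actually contributed a score."""
    scored = [dictionary[t] for t in tokens
              if t in dictionary and abs(dictionary[t] - center) > neutral_band]
    coverage = len(scored) / max(len(tokens), 1)
    mean = sum(scored) / len(scored) if scored else None
    return mean, coverage

tokens = "the celebration of strength avoided the violence".split()
mean, cov = text_score(tokens, SENTIMENT)
# (7.9 + 6.6 + 2.0) / 3 = 5.5, with 3 of 7 tokens scored
```

On longer texts, a low coverage value is the warning sign the abstract describes: the dictionary is not seeing enough of the text's lexicon for the mean to be trustworthy.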

  10. Short message service (SMS) texting as a method of communication during on call: prevalence and experience of medical staff in a large acute NHS Trust in the UK.

    Science.gov (United States)

    Matharu, J; Hale, B; Ammar, M; Brennan, P A

    2016-10-01

With the widespread use of smartphones, text messaging has become an accepted form of communication for both social and professional use in medicine. To our knowledge no published studies have assessed the prevalence and use of short message service (SMS) texting by doctors on call. We used an online questionnaire to seek information from doctors in a large NHS Trust in the UK about their use of texting while on call, what they use it for, and whether they send images relevant to patients' care. We received 302 responses (43% response rate), of whom 166 (55%) used SMS while on call. There was a significant association between SMS use and age group (p=0.005), with the 20-30-year-old group using it much more than the other age groups. Doctors in the surgical specialties used it significantly less than those in other speciality groups. Texting while on call was deemed to be safe and reliable, and an acceptable means of communication to use when on call. Copyright © 2016 The British Association of Oral and Maxillofacial Surgeons. Published by Elsevier Ltd. All rights reserved.

  11. De que modo os textos oficiais prescrevem o trabalho do professor? Análise comparativa de documentos brasileiros e genebrinos How do official texts prescribe the teacher's work? A comparative analysis of the brazilian and genebrian Documents

    Directory of Open Access Journals (Sweden)

    Anna Rachel Machado

    2005-12-01

In this article, we present the results of analyses of two documents produced by official bodies to guide teachers' work in Brazil and in Switzerland. On the one hand, we sought to identify the textualization features of the prescription of the teacher's work. The results show that, beyond the properties common to prescriptive texts (erasure of the enunciator, felicity contract, etc.), these documents are characterized by a more complex thematic structure, articulating a prescriptive doing, a source doing and a prescribed doing. We also sought to identify how the object of prescription is constructed, which showed that, in both contexts, this object takes the form of a global pedagogical proposal rather than the concrete work of teachers: teachers are not represented in these texts as actors with real responsibility for developing the proposals, while students are presented as inert targets. This work also allowed us to identify some differences in how the prescriptions examined are textualized, differences we relate to the political-economic context of the two countries. Finally, we raise questions about why this type of document does not take the actual work of teachers into account.

  12. New mathematical cuneiform texts

    CERN Document Server

    Friberg, Jöran

    2016-01-01

    This monograph presents in great detail a large number of both unpublished and previously published Babylonian mathematical texts in the cuneiform script. It is a continuation of the work A Remarkable Collection of Babylonian Mathematical Texts (Springer 2007) written by Jöran Friberg, the leading expert on Babylonian mathematics. Focussing on the big picture, Friberg explores in this book several Late Babylonian arithmetical and metro-mathematical table texts from the sites of Babylon, Uruk and Sippar, collections of mathematical exercises from four Old Babylonian sites, as well as a new text from Early Dynastic/Early Sargonic Umma, which is the oldest known collection of mathematical exercises. A table of reciprocals from the end of the third millennium BC, differing radically from well-documented but younger tables of reciprocals from the Neo-Sumerian and Old-Babylonian periods, as well as a fragment of a Neo-Sumerian clay tablet showing a new type of a labyrinth are also discussed. The material is presen...

  13. Identifying issue frames in text.

    Directory of Open Access Journals (Sweden)

    Eyal Sagi

Framing, the effect of context on cognitive processes, is a prominent topic of research in psychology and public opinion research. Research on framing has traditionally relied on controlled experiments and manually annotated document collections. In this paper we present a method that allows for quantifying the relative strengths of competing linguistic frames based on corpus analysis. This method requires little human intervention and can therefore be efficiently applied to large bodies of text. We demonstrate its effectiveness by tracking changes in the framing of terror over time and comparing the framing of abortion by Democrats and Republicans in the U.S.
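A minimal sketch of the corpus-based idea: for each competing frame, count how often its marker words co-occur near the target term, then normalize to get relative frame strengths. The frame-marker word lists and the windowing scheme below are assumptions for illustration, not the paper's actual method or lexicons.

```python
from collections import Counter

def frame_strengths(tokens, target, frames, window=5):
    """Relative strength of competing frames around a target term.

    For each occurrence of `target`, count frame-marker words within
    `window` tokens on either side, then normalize the counts so the
    strengths of all frames sum to 1 (when any marker is found)."""
    counts = Counter()
    positions = [i for i, t in enumerate(tokens) if t == target]
    for i in positions:
        context = tokens[max(0, i - window): i + window + 1]
        for name, markers in frames.items():
            counts[name] += sum(1 for w in context if w in markers)
    total = sum(counts.values()) or 1
    return {name: counts[name] / total for name in frames}

# Illustrative marker sets for two competing framings of "terror".
frames = {"war": {"fight", "enemy", "attack"},
          "crime": {"arrest", "trial", "law"}}
tokens = "we must fight terror and attack the enemy before trial".split()
strengths = frame_strengths(tokens, "terror", frames)
# war markers dominate the context window here, so strengths["war"] is 1.0
```

Tracked over time-sliced or party-sliced corpora, the same ratio gives the kind of trend comparison the abstract describes.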

  14. Text Mining.

    Science.gov (United States)

    Trybula, Walter J.

    1999-01-01

    Reviews the state of research in text mining, focusing on newer developments. The intent is to describe the disparate investigations currently included under the term text mining and provide a cohesive structure for these efforts. A summary of research identifies key organizations responsible for pushing the development of text mining. A section…

  15. Document Models

    Directory of Open Access Journals (Sweden)

    A.A. Malykh

    2017-08-01

In this paper, the concept of locally simple models is considered. Locally simple models are arbitrarily complex models built from relatively simple components. A lot of practically important domains of discourse can be described as locally simple models, for example, business models of enterprises and companies. Up to now, research in human reasoning automation has been mainly concentrated around the most intellectually intensive activities, such as automated theorem proving. On the other hand, the retailer business model is formed from "jobs", and each "job" can be modelled and automated more or less easily. At the same time, the whole retailer model as an integrated system is extremely complex. In this paper, we offer a variant of the mathematical definition of a locally simple model. This definition is intended for modelling a wide range of domains. Therefore, we also must take into account perceptual and psychological issues. Logic is elitist, and if we want to attract as many people as possible to our models, we need to hide this elitism behind some metaphor to which 'ordinary' people are accustomed. As such a metaphor, we use the concept of a document, so our locally simple models are called document models. Document models are built in the paradigm of semantic programming. This allows us to achieve another important goal: to make the document models executable. Executable models are models that can act as practical information systems in the described domain of discourse. Thus, if our model is executable, then programming becomes redundant. The direct use of a model, instead of its programming coding, brings important advantages, for example, a drastic cost reduction for development and maintenance. Moreover, since the model is sound and not dissolved within programming modules, we can directly apply AI tools, in particular, machine learning. This significantly expands the possibilities for automation and

  16. Avoid violence, rioting, and outrage; approach celebration, delight, and strength: Using large text corpora to compute valence, arousal, and the basic emotions.

    Science.gov (United States)

    Westbury, Chris; Keith, Jeff; Briesemeister, Benny B; Hofmann, Markus J; Jacobs, Arthur M

    2015-01-01

Ever since Aristotle discussed the issue in Book II of his Rhetoric, humans have attempted to identify a set of "basic emotion labels". In this paper we propose an algorithmic method for evaluating sets of basic emotion labels that relies upon computed co-occurrence distances between words in a 12.7-billion-word corpus of unselected text from USENET discussion groups. Our method uses the relationship between human arousal and valence ratings collected for a large list of words, and the co-occurrence similarity between each word and emotion labels. We assess how well the words in each of 12 emotion label sets, proposed by various researchers over the past 118 years, predict the arousal and valence ratings on a test and validation dataset, each consisting of over 5970 items. We also assess how well these emotion labels predict lexical decision residuals (LDRTs), after co-varying out the effects attributable to basic lexical predictors. We then demonstrate a generalization of our method to determine the most predictive "basic" emotion labels from among all of the putative models of basic emotion that we considered. As well as contributing empirical data towards the development of a more rigorous definition of basic emotions, our method makes it possible to derive principled computational estimates of emotionality, specifically of arousal and valence, for all words in the language.
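The core move, estimating a word's emotionality from its co-occurrence similarity to emotion labels whose human valence ratings are known, can be sketched as follows. The co-occurrence vectors, the label set and the ratings below are invented toy values; the paper computes similarities over a 12.7-billion-word USENET corpus.

```python
import math

# Toy co-occurrence vectors (context-word counts). Real vectors would be
# derived from corpus co-occurrence statistics, not hand-written.
VECTORS = {
    "joy":     {"happy": 9, "party": 7, "cry": 1},
    "fear":    {"happy": 1, "party": 1, "cry": 8},
    "wedding": {"happy": 8, "party": 6, "cry": 2},
}
# Hypothetical human valence ratings (1-9 scale) for the emotion labels.
LABEL_VALENCE = {"joy": 8.2, "fear": 2.8}

def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in set(a) | set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

def estimate_valence(word):
    """Similarity-weighted average of the labels' valence ratings."""
    sims = {lab: cosine(VECTORS[word], VECTORS[lab]) for lab in LABEL_VALENCE}
    total = sum(sims.values())
    return sum(LABEL_VALENCE[lab] * s for lab, s in sims.items()) / total

v = estimate_valence("wedding")  # pulled toward "joy", so well above midpoint
```

Because "wedding" co-occurs with the same contexts as "joy" far more than with those of "fear", its estimated valence lands near the positive end; evaluating a label set then amounts to checking how well such estimates predict the human ratings.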

  17. The structure of Diagnostic and Statistical Manual of Mental Disorders (4th edition, text revision) personality disorder symptoms in a large national sample.

    Science.gov (United States)

    Trull, Timothy J; Vergés, Alvaro; Wood, Phillip K; Jahng, Seungmin; Sher, Kenneth J

    2012-10-01

    We examined the latent structure underlying the criteria for DSM-IV-TR (American Psychiatric Association, 2000, Diagnostic and statistical manual of mental disorders (4th ed., text revision). Washington, DC: Author.) personality disorders in a large nationally representative sample of U.S. adults. Personality disorder symptom data were collected using a structured diagnostic interview from approximately 35,000 adults assessed over two waves of data collection in the National Epidemiologic Survey on Alcohol and Related Conditions. Our analyses suggested that a seven-factor solution provided the best fit for the data, and these factors were marked primarily by one or at most two personality disorder criteria sets. A series of regression analyses that used external validators tapping Axis I psychopathology, treatment for mental health problems, functioning scores, interpersonal conflict, and suicidal ideation and behavior provided support for the seven-factor solution. We discuss these findings in the context of previous studies that have examined the structure underlying the personality disorder criteria as well as the current proposals for DSM-5 personality disorders. (PsycINFO Database Record (c) 2012 APA, all rights reserved).

  18. Vietnamese Document Representation and Classification

    Science.gov (United States)

    Nguyen, Giang-Son; Gao, Xiaoying; Andreae, Peter

    Vietnamese is very different from English and little research has been done on Vietnamese document classification, or indeed, on any kind of Vietnamese language processing, and only a few small corpora are available for research. We created a large Vietnamese text corpus with about 18000 documents, and manually classified them based on different criteria such as topics and styles, giving several classification tasks of different difficulty levels. This paper introduces a new syllable-based document representation at the morphological level of the language for efficient classification. We tested the representation on our corpus with different classification tasks using six classification algorithms and two feature selection techniques. Our experiments show that the new representation is effective for Vietnamese categorization, and suggest that best performance can be achieved using syllable-pair document representation, an SVM with a polynomial kernel as the learning algorithm, and using Information gain and an external dictionary for feature selection.
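A minimal sketch of the syllable-pair representation: written Vietnamese separates syllables (not words) with spaces, so adjacent-syllable pairs approximate words without a segmenter. The nearest-centroid classifier below is a simple stand-in for the SVM with a polynomial kernel used in the paper, and the toy documents are invented.

```python
from collections import Counter
import math

def syllable_pairs(text):
    """Bag of adjacent-syllable pairs; whitespace delimits syllables."""
    syls = text.lower().split()
    return Counter(zip(syls, syls[1:]))

def cosine(a, b):
    dot = sum(a[k] * b.get(k, 0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(doc, labelled):
    """Assign `doc` to the class whose summed pair counts it most resembles.

    A nearest-centroid stand-in for the paper's SVM learner."""
    centroids = {}
    for label, texts in labelled.items():
        c = Counter()
        for t in texts:
            c.update(syllable_pairs(t))
        centroids[label] = c
    vec = syllable_pairs(doc)
    return max(centroids, key=lambda lab: cosine(vec, centroids[lab]))

# Invented two-class toy corpus.
labelled = {"sport": ["bóng đá việt nam", "đội bóng đá"],
            "tech":  ["máy tính mới", "phần mềm máy tính"]}
label = classify("bóng đá hay", labelled)  # shares the pair (bóng, đá) with "sport"
```

Feature selection (information gain, an external dictionary) would prune the pair vocabulary before training; the representation itself is just this pair counting.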

  19. Utah Text Retrieval Project

    Energy Technology Data Exchange (ETDEWEB)

    Hollaar, L A

    1983-10-01

    The Utah Text Retrieval project seeks well-engineered solutions to the implementation of large, inexpensive, rapid text information retrieval systems. The project has three major components. Perhaps the best known is the work on the specialized processors, particularly search engines, necessary to achieve the desired performance and cost. The other two concern the user interface to the system and the system's internal structure. The work on user interface development is not only concentrating on the syntax and semantics of the query language, but also on the overall environment the system presents to the user. Environmental enhancements include convenient ways to browse through retrieved documents, access to other information retrieval systems through gateways supporting a common command interface, and interfaces to word processing systems. The system's internal structure is based on a high-level data communications protocol linking the user interface, index processor, search processor, and other system modules. This allows them to be easily distributed in a multi- or specialized-processor configuration. It also allows new modules, such as a knowledge-based query reformulator, to be added. 15 references.

  20. Proxima: a presentation-oriented editor for structured documents

    NARCIS (Netherlands)

    Schrage, M.M.

    2004-01-01

    A typical computer user deals with a large variety of documents, such as text files, spreadsheets, and web pages. The applications for constructing and modifying these documents are called editors (e.g. text editors, spreadsheet applications, and HTML editors). Despite the apparent differences

  1. K-BREF(Korean BAT reference document) development : BAT and BAT-AELs for large combustion plants and waste incinerators in Korea.

    Science.gov (United States)

    Yoo, Heungmin; Lee, Daegyun; Park, Jaehong

    2017-04-01

Since the initial environmental policy, the "Regulation on assigning licenses for environmental pollutant emission facilities", was introduced in 1971, Korea has licensed emission facilities separately for each pollutant. Korea's economy and environmental quality are now recognized as being at the level of developed countries, despite extensive development for industrialization. However, the amounts of pollutants, emission routes and emission sources keep increasing as industries diversify, and citizens' expectations for the environment have changed as well. The Ministry of Environment of Korea therefore needs a systematic, scientifically grounded policy to shift this paradigm. To that end, the ministry introduced a new policy, "integrated pollution prevention and control (IPPC)", to be implemented from 2017 in Korea. IPPC is designed with environment, economy and efficiency in mind: the ten separate pollutant-specific licenses will be integrated into a single license, which should simplify the business licensing process. In the future, the policy can also be refined to reflect the characteristics of individual locations and industry types. For the system to operate smoothly, however, permit applicants must apply the integrated control system to their facilities. The first industries covered by IPPC are large combustion plants for power generation and waste incineration facilities, so the ministry must first publish technical guideline books for these industries, named "BAT reference documents (BREFs)". In this study, the essential information for publishing the BREFs, including emission levels, best available techniques (BAT) and so on, was investigated. In addition, the BAT-associated emission levels (BAT-AELs) of each industry were set using emission data obtained from

  2. Standardization Documents

    Science.gov (United States)

    2011-08-01

Specifications and Standards; Guide Specifications; CIDs; and NGSs. Federal Specifications; Commercial... national or international standardization document developed by a private sector association, organization, or technical society that plans... Maintain lessons learned. Examples: guidance for application of a technology; lists of options. DEFENSE HANDBOOK

  3. Large multi-centre pilot randomized controlled trial testing a low-cost, tailored, self-help smoking cessation text message intervention for pregnant smokers (MiQuit).

    Science.gov (United States)

    Naughton, Felix; Cooper, Sue; Foster, Katharine; Emery, Joanne; Leonardi-Bee, Jo; Sutton, Stephen; Jones, Matthew; Ussher, Michael; Whitemore, Rachel; Leighton, Matthew; Montgomery, Alan; Parrott, Steve; Coleman, Tim

    2017-07-01

To estimate the effectiveness of pregnancy smoking cessation support delivered by short message service (SMS) text message and key parameters needed to plan a definitive trial. Multi-centre, parallel-group, single-blinded, individual randomized controlled trial. Sixteen antenatal clinics in England. Four hundred and seven participants were randomized to the intervention (n = 203) or usual care (n = 204). Eligible women were able to receive and understand English SMS texts and were not already using text-based cessation support. All participants received a smoking cessation leaflet; intervention participants also received a 12-week programme of individually tailored, automated, interactive, self-help smoking cessation text messages (MiQuit). Measures included seven smoking outcomes, including validated continuous abstinence from 4 weeks post-randomization until 36 weeks gestation, design parameters for a future trial and cost-per-quitter. Using the validated, continuous abstinence outcome, 5.4% (11 of 203) of MiQuit participants were abstinent versus 2.0% (four of 204) of usual care participants [odds ratio (OR) = 2.7, 95% confidence interval (CI) = 0.93-9.35]. The Bayes factor for this outcome was 2.23. Completeness of follow-up at 36 weeks gestation was similar in both groups; provision of self-report smoking data was 64% (MiQuit) and 65% (usual care) and abstinence validation rates were 56% (MiQuit) and 61% (usual care). The incremental cost-per-quitter was £133.53 (95% CI = -£395.78 to 843.62). There was some evidence, although not conclusive, that a text-messaging programme may increase cessation rates in pregnant smokers when provided alongside routine NHS cessation care. © 2017 The Authors. Addiction published by John Wiley & Sons Ltd on behalf of Society for the Study of Addiction.

  4. Maury Documentation

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — Supporting documentation for the Maury Collection of marine observations. Includes explanations from Maury himself, as well as guides and descriptions by the U.S....

  5. Documentation Service

    International Nuclear Information System (INIS)

    Charnay, J.; Chosson, L.; Croize, M.; Ducloux, A.; Flores, S.; Jarroux, D.; Melka, J.; Morgue, D.; Mottin, C.

    1998-01-01

This service handles the treatment and diffusion of scientific information and the management of the institute's scientific production, as well as secretariat operations for the institute's groups and services. The report on the documentation-library section mentions: management of the documentation holdings, searches in international databases (INIS, Current Contents, Inspec), and the Prêt-Inter service, which gives access to documents through the DEMOCRITE network of IN2P3. Achievements also mentioned include: the setup of a video and photo database, the Web home page of the institute's library, follow-up of the digitization of the document holdings by integrating CD-ROMs and diskettes, electronic archiving of the scientific production, etc

  6. Computerising documentation

    International Nuclear Information System (INIS)

    Anon.

    1992-01-01

The nuclear power generation industry is faced with public concern and government pressures over safety, efficiency and risk. Operators throughout the industry are addressing these issues with the aid of a new technology - technical document management systems (TDMS). Used for strategic and tactical advantage, the systems enable users to scan, archive, retrieve, store, edit, distribute worldwide and manage the huge volume of documentation (paper drawings, CAD data and film-based information) generated in building, maintaining and ensuring safety in the UK's power plants. The power generation industry has recognized that the management and modification of operation-critical information is vital to the safety and efficiency of its power plants. Regulatory pressure from the Nuclear Installations Inspectorate (NII) to operate within strict safety margins or lose Site Licences has prompted the need for accurate, up-to-date documentation. A document capture and management retrieval system provides a powerful, cost-effective solution, giving rapid access to documentation in a tightly controlled environment. The computerisation of documents and plans is discussed in this article. (Author)

  7. Document retrieval on repetitive string collections.

    Science.gov (United States)

    Gagie, Travis; Hartikainen, Aleksi; Karhu, Kalle; Kärkkäinen, Juha; Navarro, Gonzalo; Puglisi, Simon J; Sirén, Jouni

    2017-01-01

Most of the fastest-growing string collections today are repetitive, that is, most of the constituent documents are similar to many others. As these collections keep growing, a key approach to handling them is to exploit their repetitiveness, which can reduce their space usage by orders of magnitude. We study the problem of indexing repetitive string collections in order to perform efficient document retrieval operations on them. Document retrieval problems are routinely solved by search engines on large natural language collections, but the techniques are less developed on generic string collections. The case of repetitive string collections is even less understood, and there are very few existing solutions. We develop two novel ideas, interleaved LCPs and precomputed document lists, that yield highly compressed indexes solving the problem of document listing (find all the documents where a string appears), top-k document retrieval (find the k documents where a string appears most often), and document counting (count the number of documents where a string appears). We also show that a classical data structure supporting the latter query becomes highly compressible on repetitive data. Finally, we show how the tools we developed can be combined to solve ranked conjunctive and disjunctive multi-term queries under the simple [Formula: see text] model of relevance. We thoroughly evaluate the resulting techniques in various real-life repetitiveness scenarios, and recommend the best choices for each case.
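The three query types can be stated concretely. Below is a deliberately naive, uncompressed sketch of document listing, top-k retrieval and document counting over plain strings; it bears no resemblance to the compressed indexes the paper develops, and serves only to define the queries.

```python
from collections import Counter

def doc_list(docs, pattern):
    """Document listing: ids of all documents where pattern appears."""
    return [i for i, d in enumerate(docs) if pattern in d]

def doc_count(docs, pattern):
    """Document counting: how many documents contain pattern."""
    return len(doc_list(docs, pattern))

def top_k(docs, pattern, k):
    """Top-k retrieval: the k documents where pattern appears most often."""
    freq = Counter({i: d.count(pattern) for i, d in enumerate(docs)})
    return [i for i, c in freq.most_common() if c > 0][:k]

docs = ["abracadabra", "banana", "cabana", "abra"]
```

For example, doc_list(docs, "ab") returns [0, 2, 3] and doc_count(docs, "ab") returns 3; a compressed index answers the same queries in space close to the repetitiveness-aware entropy of the collection.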

  8. Informational system. Documents management

    Directory of Open Access Journals (Sweden)

    Vladut Iacob

    2009-12-01

Full Text Available Productivity growth, as well as a reduction of operational costs in a company, can be achieved by adopting a document management solution. Such an application will allow the management and the structured, efficient transmission of information within the organization.

  9. Mining the Text: 34 Text Features that Can Ease or Obstruct Text Comprehension and Use

    Science.gov (United States)

    White, Sheida

    2012-01-01

    This article presents 34 characteristics of texts and tasks ("text features") that can make continuous (prose), noncontinuous (document), and quantitative texts easier or more difficult for adolescents and adults to comprehend and use. The text features were identified by examining the assessment tasks and associated texts in the national…

  10. Document image analysis: A primer

    Indian Academy of Sciences (India)


    (1) Typical documents in today's office are computer-generated, but even so, inevitably by different computers and ... different sizes, from a business card to a large engineering drawing. Document analysis ... Whether global or adaptive ...

  11. Text Induced Spelling Correction

    NARCIS (Netherlands)

    Reynaert, M.W.C.

    2004-01-01

    We present TISC, a language-independent and context-sensitive spelling checking and correction system designed to facilitate the automatic removal of non-word spelling errors in large corpora. Its lexicon is derived from a very large corpus of raw text, without supervision, and contains word
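TISC itself is context-sensitive and derives its lexicon from a very large corpus; the core idea of an unsupervised, corpus-derived lexicon driving correction can be sketched with a much simpler, context-free stand-in (every name below is illustrative, not TISC's actual design):

```python
import re
from collections import Counter

def build_lexicon(corpus_text, min_freq=2):
    """Derive a lexicon from raw text without supervision: words seen
    often enough in the corpus are assumed to be correctly spelled."""
    words = re.findall(r"[a-z]+", corpus_text.lower())
    counts = Counter(words)
    return {w: c for w, c in counts.items() if c >= min_freq}

def edits1(word):
    """All strings one edit (delete/swap/replace/insert) away from word."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    swaps = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + swaps + replaces + inserts)

def correct(word, lexicon):
    """Return the most frequent in-lexicon candidate within one edit."""
    if word in lexicon:
        return word
    candidates = edits1(word) & lexicon.keys()
    return max(candidates, key=lexicon.get) if candidates else word
```

For instance, with a lexicon built from a small corpus containing "cat" twice, correct("caat", lexicon) returns "cat"; the real system additionally exploits the surrounding context.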

  12. Text Mining the History of Medicine.

    Science.gov (United States)

    Thompson, Paul; Batista-Navarro, Riza Theresa; Kontonatsios, Georgios; Carter, Jacob; Toon, Elizabeth; McNaught, John; Timmermann, Carsten; Worboys, Michael; Ananiadou, Sophia

    2016-01-01

Historical text archives constitute a rich and diverse source of information, which is becoming increasingly readily accessible, due to large-scale digitisation efforts. However, it can be difficult for researchers to explore and search such large volumes of data in an efficient manner. Text mining (TM) methods can help, through their ability to recognise various types of semantic information automatically, e.g., instances of concepts (places, medical conditions, drugs, etc.), synonyms/variant forms of concepts, and relationships holding between concepts (which drugs are used to treat which medical conditions, etc.). TM analysis allows search systems to incorporate functionality such as automatic suggestions of synonyms of user-entered query terms, exploration of different concepts mentioned within search results or isolation of documents in which concepts are related in specific ways. However, applying TM methods to historical text can be challenging, owing to differences in, and the evolution of, vocabulary, terminology, language structure and style compared with more modern text. In this article, we present our efforts to overcome the various challenges faced in the semantic analysis of published historical medical text dating back to the mid-19th century. Firstly, we used evidence from diverse historical medical documents from different periods to develop new resources that provide accounts of the multiple, evolving ways in which concepts, their variants and relationships amongst them may be expressed. These resources were employed to support the development of a modular processing pipeline of TM tools for the robust detection of semantic information in historical medical documents with varying characteristics. We applied the pipeline to two large-scale medical document archives covering wide temporal ranges as the basis for the development of a publicly accessible semantically-oriented search system. The novel resources are available for research purposes, while
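The synonym-suggestion functionality described above amounts to expanding a query with known variant forms before matching. A minimal sketch, using a tiny hand-written variant dictionary in the spirit of the paper's resources (the entries here are illustrative assumptions, not the project's actual data):

```python
def expand_query(terms, variants):
    """Expand each query term with its known synonyms/variant forms."""
    expanded = set()
    for t in terms:
        expanded.add(t.lower())
        expanded.update(v.lower() for v in variants.get(t.lower(), []))
    return expanded

def search(docs, terms, variants):
    """Return ids of documents matching any expanded form of any term."""
    q = expand_query(terms, variants)
    return [i for i, d in enumerate(docs)
            if any(term in d.lower() for term in q)]

# Hypothetical variant dictionary: historical vs. modern terms
variants = {"phthisis": ["consumption", "tuberculosis"]}
docs = ["A case of consumption, 1854", "On cholera", "Tuberculosis notes"]
```

A query for the historical term "phthisis" then also retrieves documents using "consumption" or the modern "tuberculosis", which is exactly the kind of vocabulary drift the pipeline must bridge.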

  13. The text of the amended Protocol to the Agreement between the Kingdom of Swaziland and the International Atomic Energy Agency for the Application of Safeguards in Connection with the Treaty on the Non-Proliferation of Nuclear Weapons, is reproduced in this document for the information of all Member States of the Agency

    International Nuclear Information System (INIS)

    2010-01-01

The text of the amended Protocol to the Agreement between the Kingdom of Swaziland and the International Atomic Energy Agency for the Application of Safeguards in Connection with the Treaty on the Non-Proliferation of Nuclear Weapons, is reproduced in this document for the information of all Member States of the Agency

  14. CMS DOCUMENTATION

    CERN Multimedia

    CMS TALKS AT MAJOR MEETINGS The agenda and talks from major CMS meetings can now be electronically accessed from the iCMS Web site. The following items can be found on: http://cms.cern.ch/iCMS/ General - CMS Weeks (Collaboration Meetings), CMS Weeks Agendas The talks presented at the Plenary Sessions. LHC Symposiums Management - CB - MB - FB - FMC Agendas and minutes are accessible to CMS members through their AFS account (ZH). However some linked documents are restricted to the Board Members. FB documents are only accessible to FB members. LHCC The talks presented at the ‘CMS Meetings with LHCC Referees’ are available on request from the PM or MB Country Representative. Annual Reviews The talks presented at the 2006 Annual reviews are posted.   CMS DOCUMENTS It is considered useful to establish information on the first employment of CMS doctoral students upon completion of their theses. Therefore it is requested that Ph.D students inform the CMS Secretariat a...

  17. CMS DOCUMENTATION

    CERN Multimedia

    CMS TALKS AT MAJOR MEETINGS The agenda and talks from major CMS meetings can now be electronically accessed from the iCMS Web site. The following items can be found on: http://cms.cern.ch/iCMS/ Management- CMS Weeks (Collaboration Meetings), CMS Weeks Agendas The talks presented at the Plenary Sessions. Management - CB - MB - FB Agendas and minutes are accessible to CMS members through their AFS account (ZH). However some linked documents are restricted to the Board Members. FB documents are only accessible to FB members. LHCC The talks presented at the ‘CMS Meetings with LHCC Referees’ are available on request from the PM or MB Country Representative. Annual Reviews The talks presented at the 2007 Annual reviews are posted. CMS DOCUMENTS It is considered useful to establish information on the first employment of CMS doctoral students upon completion of their theses. Therefore it is requested that Ph.D students inform the CMS Secretariat about the nature of employment and ...


  20. Inductive inference for large scale text classification

    OpenAIRE

    Silva, Catarina Helena Branco Simões da

    2009-01-01

Doctoral thesis in Informatics Engineering presented to the Faculty of Sciences and Technology of the University of Coimbra. Over recent decades, the availability and importance of texts in digital format have grown exponentially; such texts are now present in almost every aspect of modern life. Text classification is therefore an active research area, motivated by many real-world applications. Even so, dealing with the overload of texts in digital format and...

  1. Are PDF Documents Accessible?

    Directory of Open Access Journals (Sweden)

    Mireia Ribera Turró

    2008-09-01

    Full Text Available Adobe PDF is one of the most widely used formats in scientific communications and in administrative documents. In its latest versions it has incorporated structural tags and improvements that increase its level of accessibility. This article reviews the concept of accessibility in the reading of digital documents and evaluates the accessibility of PDF according to the most widely established standards.

  2. Helios: Understanding Solar Evolution Through Text Analytics

    Energy Technology Data Exchange (ETDEWEB)

    Randazzese, Lucien [SRI International, Menlo Park, CA (United States)

    2016-12-02

    This proof-of-concept project focused on developing, testing, and validating a range of bibliometric, text analytic, and machine-learning based methods to explore the evolution of three photovoltaic (PV) technologies: Cadmium Telluride (CdTe), Dye-Sensitized solar cells (DSSC), and Multi-junction solar cells. The analytical approach to the work was inspired by previous work by the same team to measure and predict the scientific prominence of terms and entities within specific research domains. The goal was to create tools that could assist domain-knowledgeable analysts in investigating the history and path of technological developments in general, with a focus on analyzing step-function changes in performance, or “breakthroughs,” in particular. The text-analytics platform developed during this project was dubbed Helios. The project relied on computational methods for analyzing large corpora of technical documents. For this project we ingested technical documents from the following sources into Helios: Thomson Scientific Web of Science (papers), the U.S. Patent & Trademark Office (patents), the U.S. Department of Energy (technical documents), the U.S. National Science Foundation (project funding summaries), and a hand curated set of full-text documents from Thomson Scientific and other sources.

  3. Segmentation of complex document

    Directory of Open Access Journals (Sweden)

    Souad Oudjemia

    2014-06-01

Full Text Available In this paper we present a method for segmenting document images with complex structure. The technique, based on the GLCM (Grey Level Co-occurrence Matrix), segments this type of document into three regions, namely 'graphics', 'background' and 'text'. Very briefly, the method divides the document image into blocks of a size chosen after a series of tests, then applies the co-occurrence matrix to each block in order to extract five textural parameters: energy, entropy, sum entropy, difference entropy and standard deviation. These parameters are then used to classify the image into three regions using the k-means algorithm; the final segmentation is obtained by grouping connected pixels. Two performance measurements were made for the graphics and text zones: we obtained a classification rate of 98.3% and a misclassification rate of 1.79%.
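The block-wise GLCM-plus-k-means pipeline can be sketched as follows. This is a reduced stand-in, not the authors' implementation: it uses only energy, entropy and standard deviation (not all five parameters), a fixed horizontal co-occurrence offset, and a minimal k-means.

```python
import numpy as np

def glcm(block, levels=8, dx=1, dy=0):
    """Normalized grey-level co-occurrence matrix for one block."""
    q = (block.astype(float) / 256 * levels).astype(int).clip(0, levels - 1)
    m = np.zeros((levels, levels))
    h, w = q.shape
    src, dst = q[0:h - dy, 0:w - dx], q[dy:h, dx:w]
    for a, b in zip(src.ravel(), dst.ravel()):
        m[a, b] += 1
    s = m.sum()
    return m / s if s else m

def features(block):
    """Textural features of one block: energy, entropy, std deviation."""
    p = glcm(block)
    nz = p[p > 0]
    return np.array([(p ** 2).sum(), -(nz * np.log2(nz)).sum(), block.std()])

def kmeans(X, k=3, iters=20, seed=0):
    """Minimal k-means returning a cluster label per feature vector."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels

def segment(image, block=16, k=3):
    """Split image into blocks, extract features, cluster into k regions."""
    h, w = image.shape
    feats, coords = [], []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            feats.append(features(image[y:y + block, x:x + block]))
            coords.append((y, x))
    return coords, kmeans(np.array(feats), k)
```

The paper's final step, grouping connected pixels of the same class, would then merge adjacent blocks sharing a label.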

  4. Orbitmpi Documentation

    International Nuclear Information System (INIS)

    Lowe, Lisa L.

    2000-01-01

Orbitmpi is a parallelized version of Roscoe White's Orbit code. The code has been parallelized using MPI, which makes it portable to many types of machines. The guidelines used for the parallelization were to increase code performance with minimal changes to the code's original structure. This document gives a general description of how the parallel sections of the code run. It discusses the changes made to the original code and comments on the general procedure for future additions to Orbitmpi, as well as describing the effects of a parallelized random number generator on the code's output. Finally, the scaling results from Hecate and from Puffin are presented. Hecate is a 64-processor Origin 2000 machine, with MIPS R12000 processors and 16GB of memory, and Puffin is a PC cluster of 9 dual-processor 450 MHz Pentium III nodes (18 processors max.), with 100 Mbit/s Ethernet communication

  5. CNEA's quality system documentation

    International Nuclear Information System (INIS)

    Mazzini, M.M.; Garonis, O.H.

    1998-01-01

Full text: To obtain an effective and coherent documentation system suitable for CNEA's Quality Management Program, we decided to organize CNEA's quality documentation as follows: a) Level 1: Quality manual; b) Level 2: Procedures; c) Level 3: Quality plans; d) Level 4: Instructions; e) Level 5: Records and other documents. The objective of this work is to present a standardization of the documentation of CNEA's quality system for facilities, laboratories, services, and R and D activities. Considering the diversity of criteria and formats used by different departments in preparing documentation, and since each ultimately includes largely the same quality management policy, we proposed a system to improve the documentation, avoiding unnecessary waste of time and costs. This will allow each sector to focus on its specific documentation. The quality manuals of the atomic centers fulfill rule 3.6.1 of the Nuclear Regulatory Authority, and the Safety Series 50-C/SG-Q of the International Atomic Energy Agency. They are designed by groups of competent and highly trained people from different departments. The normative procedures are elaborated with the same methodology as the quality manuals. The quality plans, which describe the organizational structure of the working groups and the appropriate documentation, will assess the quality manuals of facilities, laboratories, services, and research and development activities of the atomic centers. Responsibility for approval of the normative documentation is assigned to the management in charge of administering economic and human resources, in order to fulfill the institutional objectives. Another improvement, aimed at eliminating unnecessary processes, is the inclusion of all the quality system's normative documentation in the CNEA intranet. (author)

  6. Applications for electronic documents

    International Nuclear Information System (INIS)

    Beitel, G.A.

    1995-01-01

This paper discusses the application of electronic media to documents, specifically Safety Analysis Reports (SARs), prepared for Environmental Restoration and Waste Management (ER&WM) programs being conducted for the Department of Energy (DOE) at the Idaho National Engineering Laboratory (INEL). Efforts are underway to upgrade our document system using electronic format. To satisfy external requirements (DOE, State, and Federal), ER&WM programs generate a complement of internal requirements documents including a SAR and Technical Safety Requirements along with procedures and training materials. Of interest is the volume of information and the difficulty in handling it. A recently prepared ER&WM SAR consists of 1,000 pages of text and graphics; supporting references add 10,000 pages. Other programmatic requirements documents consist of an estimated 5,000 pages plus references

  7. Strategy as Texts

    DEFF Research Database (Denmark)

    Obed Madsen, Søren

This article shows empirically how managers translate a strategy plan at an individual level. By analysing how managers in three organizations translate strategies, it identifies that the translation happens in two steps: first, the managers decipher the strategy by coding its different parts into four categories; second, the managers produce new texts based on the original strategy document by using four different translation models. The study's findings contribute to three areas. Firstly, the study shows that translation is more than a sociological process: it is also a craftsmanship that requires knowledge and skills, which unfortunately seems to be overlooked in both the literature and in practice. Secondly, it shows that even though a strategy text is singular, translation makes strategy plural. Thirdly, the article proposes a way to open up the black box of what...

  8. Texto documental

    Directory of Open Access Journals (Sweden)

    Elkin Gómez

    2000-09-01

Full Text Available Borges enamorado (critical essays; dialogues with Borges; recovery and annotation of texts by Borges and about Borges). Juan Gustavo Cobo Borda. Instituto Caro y Cuervo, La Granada Entreabierta, Bogotá, 1999, 397 pages.

  9. CMS DOCUMENTATION

    CERN Multimedia

CMS TALKS AT MAJOR MEETINGS The agenda and talks from major CMS meetings can now be electronically accessed from the iCMS Web site. The following items can be found on: http://cms.cern.ch/iCMS Management – CMS Weeks (Collaboration Meetings), CMS Weeks Agendas The talks presented at the Plenary Sessions. Management – CB – MB – FB Agendas and minutes are accessible to CMS members through Indico. LHCC The talks presented at the ‘CMS Meetings with LHCC Referees’ are available on request from the PM or MB Country Representative. Annual Reviews The talks presented at the 2008 Annual Reviews are posted in Indico. CMS DOCUMENTS It is considered useful to establish information on the first employment of CMS doctoral students upon completion of their theses. Therefore it is requested that Ph.D students inform the CMS Secretariat about the nature of employment and name of their first employer. The Notes, Conference Reports and Theses published si...

  10. Use of electronic healthcare records in large-scale simple randomized trials at the point of care for the documentation of value-based medicine.

    Science.gov (United States)

    van Staa, T-P; Klungel, O; Smeeth, L

    2014-06-01

    A solid foundation of evidence of the effects of an intervention is a prerequisite of evidence-based medicine. The best source of such evidence is considered to be randomized trials, which are able to avoid confounding. However, they may not always estimate effectiveness in clinical practice. Databases that collate anonymized electronic health records (EHRs) from different clinical centres have been widely used for many years in observational studies. Randomized point-of-care trials have been initiated recently to recruit and follow patients using the data from EHR databases. In this review, we describe how EHR databases can be used for conducting large-scale simple trials and discuss the advantages and disadvantages of their use. © 2014 The Association for the Publication of the Journal of Internal Medicine.

  11. Sedimentary records of trace elements from large European lakes (Switzerland) document historic to recent freshwater pollution and climate-induced runoff variations

    Science.gov (United States)

    Thevenon, F.; Wirth, S. B.; Fujak, M.; Poté, J.; Thierry, A.; Chiaradia, M.; Girardclos, S.

    2011-12-01

Continuous sedimentary records of anthropogenic and natural trace elements, determined by ICP-MS in five large and deep perialpine lakes of Central Europe (Switzerland), document the environmental impacts of industrial fossil-fuel pollution. The greatest increase in heavy-metal pollution was registered at all the studied sites following the European industrial revolution of ca. AD 1800, with the highest values during the middle part of the 20th century. On a regional scale, anthropogenic heavy-metal input subsequently stopped increasing thanks to remediation strategies such as the implementation of wastewater treatment plants (WWTPs). On the other hand, the discharge of treated industrial wastewaters into Vidy Bay of Lake Geneva during the second part of the 20th century led to the deposition of highly contaminated sediments in the area surrounding the WWTP outlet pipe, less than 4 km from the main drinking-water supply of Lausanne (127,000 inhabitants). Microbial analyses furthermore reveal (i) a large increase in bacterial densities following the lake's eutrophication in the 1970s, and (ii) that the related sediments can be considered a reservoir of antibiotic-resistant bacteria/genes (of human origin). We finally compare instrumental hydrological data over the last century with variations of lithogenic trace elements (e.g., titanium) as registered in three large lakes (Brienz, Thun and Bienne) connected by the River Aar. This allows better constraint of runoff variations for the River Aar on a regional scale over recent decades, and of their possible increase under warming climate conditions in the European Alps.

  12. Vocabulary Constraint on Texts

    Directory of Open Access Journals (Sweden)

    C. Sutarsyah

    2008-01-01

Full Text Available This case study was carried out in the English Education Department of the State University of Malang. The aim of the study was to identify and describe the vocabulary in the reading texts and to determine whether the texts are useful for reading-skill development. A descriptive qualitative design was applied to obtain the data. For this purpose, some available computer programs were used to describe the vocabulary in the texts. It was found that the 20 texts, containing 7,945 words, are dominated by low-frequency words, which account for 16.97% of the words in the texts. The high-frequency words occurring in the texts were dominated by function words. In terms of word levels, it was found that the texts have a very limited number of words from the GSL (General Service List of English Words; West, 1953). The proportion of the first 1,000 words of the GSL accounts for only 44.6%. The data also show that the texts contain too large a proportion of words that are not in the three levels (the first 2,000 and UWL). These words account for 26.44% of the running words in the texts. It is believed that these constraints are due to the selection of the texts, which are made up of a series of short, unrelated texts. This kind of text is subject to an accumulation of low-frequency words, especially content words, and offers few words from the GSL. It could also hinder the development of students' reading skills and vocabulary enrichment.
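The coverage figures reported above (e.g., 44.6% of running words covered by the first 1,000 GSL words) are simple token-coverage ratios. A sketch of that calculation, using a tiny hypothetical stand-in for the GSL word list (not the real 1,000-word list):

```python
import re
from collections import Counter

def coverage(text, word_list):
    """Percentage of running words (tokens) covered by word_list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    covered = sum(1 for t in tokens if t in word_list)
    return 100.0 * covered / len(tokens)

# Hypothetical stand-in for the first 1,000 words of the GSL
gsl_1000 = {"the", "of", "and", "a", "to", "in", "is", "it", "was", "for"}

text = "The cat sat in the hat and the dog ran to it"
```

Here coverage(text, gsl_1000) is about 58.3%: 7 of the 12 running words are in the list. Run over a full corpus with the real GSL levels, the same ratio yields the study's 44.6% and 26.44% figures.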

  13. Omega documentation

    Energy Technology Data Exchange (ETDEWEB)

    Howerton, R.J.; Dye, R.E.; Giles, P.C.; Kimlinger, J.R.; Perkins, S.T.; Plechaty, E.F.

    1983-08-01

OMEGA is a CRAY-1 computer program that controls nine codes used by the LLNL Physical Data Group for: 1) updating the libraries of evaluated data maintained by the group (UPDATE); 2) calculating average values of energy deposited in secondary particles and residual nuclei (ENDEP); 3) checking the libraries for internal consistency, especially for energy conservation (GAMCHK); 4) producing listings, indexes and plots of the library data (UTILITY); 5) producing calculational constants such as group-averaged cross sections and transfer matrices for diffusion and Sn transport codes (CLYDE); 6) producing and updating standard files of the calculational constants used by LLNL Sn and diffusion transport codes (NDFL); 7) producing calculational constants for Monte Carlo transport codes that use group-averaged cross sections and continuous energy for particles (CTART); 8) producing and updating standard files used by the LLNL Monte Carlo transport codes (TRTL); and 9) producing standard files used by the LANL pointwise Monte Carlo transport code MCNP (MCPOINT). The first four of these functions and codes deal with the libraries of evaluated data, and the last five with various aspects of producing calculational constants for use by transport codes. In 1970 a series of internal, informal memoranda, called PD memos, was begun. These were intended to be circulated among the group for comment and then to provide documentation for later reference whenever questions arose about the subject matter of the memos. They have served this purpose and now will be drawn upon as source material for this more comprehensive report that deals with most of the matters covered in those memos.

  14. Omega documentation

    International Nuclear Information System (INIS)

    Howerton, R.J.; Dye, R.E.; Giles, P.C.; Kimlinger, J.R.; Perkins, S.T.; Plechaty, E.F.

    1983-08-01

    OMEGA is a CRAY I computer program that controls nine codes used by LLNL Physical Data Group for: 1) updating the libraries of evaluated data maintained by the group (UPDATE); 2) calculating average values of energy deposited in secondary particles and residual nuclei (ENDEP); 3) checking the libraries for internal consistency, especially for energy conservation (GAMCHK); 4) producing listings, indexes and plots of the library data (UTILITY); 5) producing calculational constants such as group averaged cross sections and transfer matrices for diffusion and Sn transport codes (CLYDE); 6) producing and updating standard files of the calculational constants used by LLNL Sn and diffusion transport codes (NDFL); 7) producing calculational constants for Monte Carlo transport codes that use group-averaged cross sections and continuous energy for particles (CTART); 8) producing and updating standard files used by the LLNL Monte Carlo transport codes (TRTL); and 9) producing standard files used by the LANL pointwise Monte Carlo transport code MCNP (MCPOINT). The first four of these functions and codes deal with the libraries of evaluated data and the last five with various aspects of producing calculational constants for use by transport codes. In 1970 a series, called PD memos, of internal and informal memoranda was begun. These were intended to be circulated among the group for comment and then to provide documentation for later reference whenever questions arose about the subject matter of the memos. They have served this purpose and now will be drawn upon as source material for this more comprehensive report that deals with most of the matters covered in those memos

  15. Indian Language Document Analysis and Understanding

    Indian Academy of Sciences (India)

    documents would contain text of more than one script (for example, English, Hindi and the ... O'Gorman and Govindaraju provides a good overview on document image ... word level in bilingual documents containing Roman and Tamil scripts.

  16. Documenting human transformation and establishing the reference condition of large river systems using Corona images: a case study from the Ganga River basin, India

    Science.gov (United States)

    Sinha, Rajiv; Pipil, Shobhit; Carbonneau, Patrice; Galiatsatos, Nikolaos

    2016-04-01

    The Ganga basin in northern India is one of the most populous river basins in the world, with nearly half a billion inhabitants. In the post-independence era, population expansion and human interventions have left the ecosystem of the Ganga in a severely damaged state, with dwindling water levels, pollution from human activity, and natural sediment transport severely perturbed by dams and barrages. Fortunately, there is a growing recognition among policy makers in India that restoring the Ganga to a healthier status, closer to its original unperturbed state, would set a strong foundation for future, greener economic growth in northern India. However, given the past six decades of rapid development, efforts to restore the Ganga to its original condition face a fundamental question: what was the original state of the Ganga? Answering this question requires some knowledge of the former course of the Ganga and of the farming and urban density of the surrounding plains before the impacts of human disturbance could be felt. We have made use of the Corona spy satellite program, which collected a large number of earth observation photos in the 1960s. These photos, now declassified, offer us a unique view of the Ganga at the very early stages of intense development, and thus before the worst ecological damage occurred. However, actual usage of these images poses significant technical challenges. In the design of the Corona cameras, very high resolution comes at the cost of complex distortions. Furthermore, we have no information on the exact position and orientation of the satellite at the time of image acquisition, so an accurate reprojection of the image into conventional map coordinates is not straightforward. We have developed a georectification process based on polynomial transformation to achieve a positional accuracy of ±20m for the area of our interest. Further, we have developed an object-based classification method that uses both texture and
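The abstract does not give the details of the polynomial transformation, but the degree-1 (affine) case illustrates the idea: fit the mapping from image coordinates to map coordinates by least squares over ground control points, then apply it to the whole image. The control-point values below are synthetic; higher-order polynomial terms would be added to the design vector in the same way.

```python
def solve3(A, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix (copy)
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(M[r][i]))  # partial pivot
        M[i], M[p] = M[p], M[i]
        for r in range(3):
            if r != i:
                f = M[r][i] / M[i][i]
                M[r] = [a - f * c for a, c in zip(M[r], M[i])]
    return [M[i][3] / M[i][i] for i in range(3)]

def fit_affine(gcps):
    """Least-squares fit of x = a0 + a1*col + a2*row, y = b0 + b1*col + b2*row
    from ground control points gcps = [((col, row), (x, y)), ...]."""
    ATA = [[0.0] * 3 for _ in range(3)]  # normal equations A^T A p = A^T t
    ATx, ATy = [0.0] * 3, [0.0] * 3
    for (c, r), (x, y) in gcps:
        v = (1.0, c, r)
        for i in range(3):
            for j in range(3):
                ATA[i][j] += v[i] * v[j]
            ATx[i] += v[i] * x
            ATy[i] += v[i] * y
    return solve3(ATA, ATx), solve3(ATA, ATy)

# Synthetic control points generated from x = 5 + 1.5c - 0.5r, y = -2 + 0.25c + 2r.
gcps = [((0, 0), (5.0, -2.0)), ((10, 0), (20.0, 0.5)),
        ((0, 10), (0.0, 18.0)), ((10, 10), (15.0, 20.5))]
a, b = fit_affine(gcps)
```

With more control points than coefficients, the residuals of this fit are what a ±20m accuracy figure would be estimated from.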

  17. Improving collaborative documentation in CMS

    International Nuclear Information System (INIS)

    Lassila-Perini, Kati; Salmi, Leena

    2010-01-01

    Complete and up-to-date documentation is essential for efficient data analysis in a large and complex collaboration like CMS. Good documentation reduces the time spent in problem solving for users and software developers. The scientists in our research environment do not necessarily have the interests or skills of professional technical writers. This results in inconsistencies in the documentation. To improve the quality, we have started a multidisciplinary project involving CMS user support and expertise in technical communication from the University of Turku, Finland. In this paper, we present possible approaches to study the usability of the documentation, for instance, usability tests conducted recently for the CMS software and computing user documentation.

  18. Directed Activities Related to Text: Text Analysis and Text Reconstruction.

    Science.gov (United States)

    Davies, Florence; Greene, Terry

    This paper describes Directed Activities Related to Text (DART), procedures that were developed and are used in the Reading for Learning Project at the University of Nottingham (England) to enhance learning from texts and that fall into two broad categories: (1) text analysis procedures, which require students to engage in some form of analysis of…

  19. Securing XML Documents

    Directory of Open Access Journals (Sweden)

    Charles Shoniregun

    2004-11-01

    Full Text Available XML (eXtensible Markup Language) is becoming the current standard for establishing interoperability on the Web. XML data are self-descriptive and syntax-extensible; this makes them very suitable for the representation and exchange of semi-structured data, and allows users to define new elements for their specific applications. As a result, the number of documents incorporating this standard is continuously increasing over the Web. The processing of XML documents may require a traversal of the entire document structure, and the cost can therefore be very high. A strong demand for a means of efficient and effective XML processing has posed a new challenge for the database world. This paper discusses a fast and efficient indexing technique for XML documents, and introduces the XML graph numbering scheme. It can be used for indexing and securing the graph structure of XML documents. This technique provides an efficient method to speed up XML data processing. Furthermore, the paper explores the classification of existing methods, their impact on query processing, and indexing.
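The abstract does not spell out the numbering scheme itself; one common family of such schemes (not necessarily the paper's) labels each node with a (start, end) interval in document order, so ancestor-descendant relationships can be tested without traversing the tree. A hedged sketch using the standard library:

```python
import xml.etree.ElementTree as ET

def number_nodes(root):
    """Assign (start, end) interval labels in document order.
    u is an ancestor of v iff start(u) < start(v) and end(v) <= end(u)."""
    labels, counter = {}, [0]
    def visit(node):
        counter[0] += 1
        start = counter[0]
        for child in node:
            visit(child)
        labels[node] = (start, counter[0])
    visit(root)
    return labels

def is_ancestor(labels, u, v):
    su, eu = labels[u]
    sv, ev = labels[v]
    return su < sv and ev <= eu

doc = ET.fromstring("<a><b><c/></b><d/></a>")
labels = number_nodes(doc)
```

Once the labels are stored in an index, an ancestor test is two integer comparisons instead of a structural traversal, which is where the claimed speed-up comes from.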

  20. Changing Landscapes in Documentation Efforts: Civil Society Documentation of Serious Human Rights Violations

    Directory of Open Access Journals (Sweden)

    Brianne McGonigle Leyh

    2017-04-01

    Full Text Available Wittingly or unwittingly, civil society actors have long been faced with the task of documenting serious human rights violations. Thirty years ago, such efforts were largely organised by grassroots movements, often with little support or funding from international actors. Sharing information and best practices was difficult. Today that situation has significantly changed. The purpose of this article is to explore the changing landscape of civil society documentation of serious human rights violations, and what that means for standardising and professionalising documentation efforts. Using the recent Hissène Habré case as an example, this article begins by looking at how civil society documentation can successfully influence an accountability process. Next, the article touches upon barriers that continue to impede greater documentation efforts. The article examines the changing landscape of documentation, focusing on technological changes and the rise of citizen journalism and unofficial investigations, using Syria as an example, as well as on the increasing support for documentation efforts both in Syria and worldwide. The changing landscape has resulted in the proliferation of international documentation initiatives aimed at providing local civil society actors with guidelines and practical assistance on how to recognise, collect, manage, store and use information about serious human rights violations, as well as on how to minimise the risks associated with the documentation of human rights violations. The recent initiatives undertaken by international civil society, including those by the Public International Law & Policy Group, play an important role in helping to standardise and professionalise documentation work and promote the foundational principles of documentation, namely the ‘do no harm’ principle, and the principles of informed consent and confidentiality. Recognising the drawback that greater professionalisation may bring, it

  1. Storing XML Documents in Databases

    NARCIS (Netherlands)

    A.R. Schmidt; S. Manegold (Stefan); M.L. Kersten (Martin); L.C. Rivero; J.H. Doorn; V.E. Ferraggine

    2005-01-01

    The authors introduce concepts for loading large amounts of XML documents into databases where the documents are stored and maintained. The goal is to make XML databases as unobtrusive in multi-tier systems as possible and at the same time provide as many services defined by the XML standards as possible.

  2. A Customizable Text Classifier for Text Mining

    Directory of Open Access Journals (Sweden)

    Yun-liang Zhang

    2007-12-01

    Full Text Available Text mining deals with complex and unstructured texts. Usually a particular collection of texts, specific to one or more domains, is necessary. We have developed a customizable text classifier that lets users mine such a collection automatically. It derives from the sentence categories of HNC theory and the corresponding techniques. It can start with only a few texts, and it can adjust itself automatically or be adjusted by the user. The user can also control the number of domains chosen and set the criteria for selecting texts, depending on demand and the abundance of material. The performance of the classifier varies with the user's choices.
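The abstract does not describe HNC's sentence categories, so the sketch below substitutes a generic incremental scheme in the same spirit: per-domain word-frequency centroids that the user can keep adjusting by feeding in more example texts. The domain names and training snippets are invented for illustration.

```python
from collections import Counter

class CentroidTextClassifier:
    """Minimal sketch: per-domain word-frequency centroids,
    updated incrementally as the user adds or corrects examples."""
    def __init__(self):
        self.centroids = {}

    def train(self, label, text):
        # Adding a text adjusts the domain centroid; this is the "adjust" step.
        self.centroids.setdefault(label, Counter()).update(text.lower().split())

    def classify(self, text):
        words = Counter(text.lower().split())
        def score(c):
            # Overlap between the text and the domain centroid, length-normalized.
            return sum(words[w] * c[w] for w in words) / (sum(c.values()) or 1)
        return max(self.centroids, key=lambda label: score(self.centroids[label]))

clf = CentroidTextClassifier()
clf.train("nuclear", "reactor core neutron flux reactor")
clf.train("finance", "market stock price market risk")
print(clf.classify("neutron flux in the reactor"))
```

The user controls the number of domains simply by how many labels are trained, mirroring the customizability described above.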

  3. Interconnectedness und digitale Texte

    Directory of Open Access Journals (Sweden)

    Detlev Doherr

    2013-04-01

    Full Text Available The multimedia information services on the Internet are becoming ever larger and more comprehensive, and even documents that exist only in printed form are being digitized by libraries and placed on the net. These documents can be found via online document management systems or search engines and then delivered in common formats such as PDF. This article examines how the Humboldt Digital Library (HDL) works, which for more than ten years has made documents by Alexander von Humboldt freely available on the Web in English translation. Unlike a conventional digital library, however, it does not merely provide digitized documents as scans or PDFs; rather, the text itself is made available in interlinked form. The system thus resembles an information system more than a digital library, which is also reflected in the available functions for locating texts in different versions and translations, comparing paragraphs across documents, and displaying images in their context. The development of dynamic hyperlinks based on the individual paragraphs of Humboldt's works, in the form of media assets, makes it possible to use the Google Maps programming interface for geographic as well as content-based navigation. Going beyond the services of a digital library, the HDL offers the prototype of a multidimensional information system that works with dynamic structures and enables extensive thematic analyses and comparisons.

  4. Electronic Braille Document Reader

    OpenAIRE

    Arif, Shahab; Holmes, Violeta

    2013-01-01

    This paper presents an investigation into developing a portable Braille device which would allow visually impaired individuals to read electronic documents by actuating Braille text on a finger. Braille books tend to be bulky in size due to the minimum size requirements for each Braille cell. E-books can be read in Braille using refreshable Braille displays connected to a computer. However, the refreshable Braille displays are expensive, bulky and are not portable. These factors restrict blin...

  5. Electronic Braille Document Reader

    OpenAIRE

    Arif, S.

    2012-01-01

    An investigation was conducted into developing a portable Braille device which would allow visually impaired individuals to read electronic documents by actuating Braille text on a finger. Braille books tend to be bulky in size due to the minimum size requirements for each Braille cell. E-books can be read in Braille using refreshable Braille displays connected to a computer. However, the refreshable Braille displays are expensive, bulky and are not portable. These factors restrict blind and ...

  6. La Documentation photographique

    Directory of Open Access Journals (Sweden)

    Magali Hamm

    2009-03-01

    Full Text Available La Documentation photographique, a magazine intended for teachers and students of history and geography, places the image at the heart of its editorial line. To keep pace with current developments in geography, the collection offers an increasingly diversified iconography: maps and photographs, but also caricatures, newspaper front pages and advertisements, each considered a geographical document in its own right. An image can serve as a synthesis; conversely, it can show the different facets of an object; often it gives concrete form to geographical phenomena. Combined with other documents, images help teachers introduce their pupils to complex geographical reasoning. But to learn how to read images, it is essential to contextualize them, comment on them and question their relationship to reality.

  7. FTP: Full-Text Publishing?

    Science.gov (United States)

    Jul, Erik

    1992-01-01

    Describes the use of file transfer protocol (FTP) on the INTERNET computer network and considers its use as an electronic publishing system. The differing electronic formats of text files are discussed; the preparation and access of documents are described; and problems are addressed, including a lack of consistency. (LRW)

  8. Nuclear power plants documentation system

    International Nuclear Information System (INIS)

    Schwartz, E.L.

    1991-01-01

    Since the amount of documents (type and quantity) necessary for the entire design of a NPP is very large, an overall and detailed identification, filing and retrieval system shall be implemented. This is even more applicable to the FINAL QUALITY DOCUMENTATION of the plant, as stipulated by the IAEA Safety Codes and related guides. For this purpose a DOCUMENTATION MANUAL was developed, which describes the above-mentioned documentation system in detail. Here we present the expected goals and results to be reached for the Angra 2 and 3 Project. (author)

  9. Using color management in color document processing

    Science.gov (United States)

    Nehab, Smadar

    1995-04-01

    Color Management Systems have been used for several years in Desktop Publishing (DTP) environments. While this development has not yet matured, we are already experiencing the next generation of the color imaging revolution: Device Independent Color for the small office/home office (SOHO) environment. Though there are still open technical issues with device independent color matching, they are not the focal point of this paper. This paper discusses two new and crucial aspects of using color management in color document processing: the management of color objects and their associated color rendering methods, and a proposal for a precedence order and handshaking protocol among the various software components involved in color document processing. As color peripherals become affordable to the SOHO market, color management also becomes a prerequisite for common document authoring applications such as word processors. The first color management solutions were oriented towards DTP environments, whose requirements were largely different. For example, DTP documents are image-centric, as opposed to SOHO documents, which are text- and chart-centric. To achieve optimal reproduction on low-cost SOHO peripherals, it is critical that different color rendering methods be used for the different document object types. The first challenge in using color management in color document processing is therefore the association of rendering methods with object types. As a result of an evolutionary process, color matching solutions are now available as application software, as driver-embedded software and as operating system extensions. Consequently, document processing faces a new challenge: the correct selection of the color matching solution while avoiding duplicate color corrections.

  10. The Effects of Tabular-Based Content Extraction on Patent Document Clustering

    Directory of Open Access Journals (Sweden)

    Michael W. Berry

    2012-10-01

    Full Text Available Data can be represented in many different ways within a particular document or set of documents. Hence, attempts to automatically process the relationships between documents or determine the relevance of certain document objects can be problematic. In this study, we have developed software to automatically catalog objects contained in HTML files for patents granted by the United States Patent and Trademark Office (USPTO. Once these objects are recognized, the software creates metadata that assigns a data type to each document object. Such metadata can be easily processed and analyzed for subsequent text mining tasks. Specifically, document similarity and clustering techniques were applied to a subset of the USPTO document collection. Although our preliminary results demonstrate that tables and numerical data do not provide quantifiable value to a document’s content, the stage for future work in measuring the importance of document objects within a large corpus has been set.
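The cataloging software and the USPTO files themselves are not reproduced in the abstract; the following is a hedged sketch of the object-recognition step using only the standard library and a made-up HTML snippet. Each recognized object is assigned a coarse data type, mirroring the metadata step described above; the tag-to-type mapping is an assumption, not the authors' actual scheme.

```python
from html.parser import HTMLParser
from collections import Counter

class ObjectCatalog(HTMLParser):
    """Catalog document objects in an HTML file and assign each a data type."""
    TYPES = {"table": "tabular", "img": "image", "p": "text", "math": "equation"}

    def __init__(self):
        super().__init__()
        self.catalog = []

    def handle_starttag(self, tag, attrs):
        if tag in self.TYPES:
            self.catalog.append({"tag": tag, "type": self.TYPES[tag]})

# Invented stand-in for a patent document fragment.
html_doc = ("<html><body><p>Claim 1</p>"
            "<table><tr><td>1</td></tr></table>"
            "<img src='fig1.png'></body></html>")
cat = ObjectCatalog()
cat.feed(html_doc)
cat.close()
counts = Counter(o["type"] for o in cat.catalog)
```

The resulting type counts per document are exactly the kind of metadata that can then be included in, or excluded from, the similarity and clustering computations.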

  11. Areva - 2011 Reference document

    International Nuclear Information System (INIS)

    2011-01-01

    After indicating the person responsible for this document and the statutory auditors, and providing some financial information, this document gives an overview of the various risk factors facing the company: legal risks, industrial and environmental risks, operational risks, risks related to large projects, and market and liquidity risks. Then, after recalling the history and evolution of the company and the evolution of its investments over the last five years, it offers an overview of Areva's activities in the markets of nuclear energy and renewable energies, of its clients and suppliers, of its strategy, and of the activities of its different departments. Other information is provided: the company's organisation chart, its property holdings (plants, equipment), an analysis of its financial situation, its research and development policy, the present context, profit forecasts or estimates, and the organisation and operation of its management.

  12. Text mining by Tsallis entropy

    Science.gov (United States)

    Jamaati, Maryam; Mehri, Ali

    2018-01-01

    Long-range correlations between the elements of natural languages enable them to convey very complex information. The complex structure of human language, as a manifestation of natural languages, motivates us to apply nonextensive statistical mechanics to text mining. Tsallis entropy appropriately ranks terms' relevance to the document subject, taking advantage of their spatial correlation length. We apply this statistical concept as a new, powerful word ranking metric for extracting the keywords of a single document. We carry out an experimental evaluation, which shows the capability of the presented method in keyword extraction. We find that Tsallis entropy has reliable word ranking performance, at the same level as the best previous ranking methods.
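The paper's exact ranking formula is not reproduced in the abstract; the sketch below uses the standard Tsallis form S_q = (1 - sum_i p_i^q) / (q - 1), applied to a word's occurrence distribution over equal-sized text segments, so that spatially clustered words (keyword candidates) score low and evenly spread words score high. The segment count and q value are illustrative choices, not the authors' parameters.

```python
def tsallis_entropy(p, q=2.0):
    """S_q = (1 - sum_i p_i**q) / (q - 1); recovers Shannon entropy as q -> 1."""
    return (1.0 - sum(pi ** q for pi in p)) / (q - 1.0)

def word_entropy(tokens, word, segments=4, q=2.0):
    """Tsallis entropy of one word's occurrence distribution over equal segments."""
    n = len(tokens)
    counts = [0] * segments
    for i, t in enumerate(tokens):
        if t == word:
            counts[min(i * segments // n, segments - 1)] += 1
    total = sum(counts) or 1
    return tsallis_entropy([c / total for c in counts], q)

# A word clustered in one segment scores 0; one spread evenly scores 0.75 at q=2.
clustered = ["key", "key", "a", "b", "c", "d", "e", "f"]
spread = ["k", "a", "k", "a", "k", "a", "k", "a"]
```

Ranking all words of a document by this score, lowest first, gives a keyword candidate list in the spirit of the method described above.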

  13. Pedoinformatics Approach to Soil Text Analytics

    Science.gov (United States)

    Furey, J.; Seiter, J.; Davis, A.

    2017-12-01

    The several extant schema for the classification of soils rely on differing criteria, but the major soil science taxonomies, including the United States Department of Agriculture (USDA) and the international harmonized World Reference Base for Soil Resources systems, are based principally on inferred pedogenic properties. These taxonomies largely result from compiled individual observations of soil morphologies within soil profiles, and the vast majority of this pedologic information is contained in qualitative text descriptions. We present text mining analyses of hundreds of gigabytes of parsed text and other data in the digitally available USDA soil taxonomy documentation, the Soil Survey Geographic (SSURGO) database, and the National Cooperative Soil Survey (NCSS) soil characterization database. These analyses implemented iPython calls to Gensim modules for topic modelling, with latent semantic indexing completed down to the lowest taxon level (soil series) paragraphs. Via a custom extension of the Natural Language Toolkit (NLTK), approximately one percent of the USDA soil series descriptions were used to train a classifier for the remainder of the documents, essentially by treating soil science words as comprising a novel language. While location-specific descriptors at the soil series level are amenable to geomatics methods, unsupervised clustering of the occurrence of other soil science words did not closely follow the usual hierarchy of soil taxa. We present preliminary phrasal analyses that may account for some of these effects.

  14. Text Maps: Helping Students Navigate Informational Texts.

    Science.gov (United States)

    Spencer, Brenda H.

    2003-01-01

    Notes that a text map is an instructional approach designed to help students gain fluency in reading content area materials. Discusses how the goal is to teach students about the important features of the material and how the maps can be used to build new understandings. Presents the procedures for preparing and using a text map. (SG)

  15. Supporting the education evidence portal via text mining

    Science.gov (United States)

    Ananiadou, Sophia; Thompson, Paul; Thomas, James; Mu, Tingting; Oliver, Sandy; Rickinson, Mark; Sasaki, Yutaka; Weissenbacher, Davy; McNaught, John

    2010-01-01

    The UK Education Evidence Portal (eep) provides a single, searchable, point of access to the contents of the websites of 33 organizations relating to education, with the aim of revolutionizing work practices for the education community. Use of the portal alleviates the need to spend time searching multiple resources to find relevant information. However, the combined content of the websites of interest is still very large (over 500 000 documents and growing). This means that searches using the portal can produce very large numbers of hits. As users often have limited time, they would benefit from enhanced methods of performing searches and viewing results, allowing them to drill down to information of interest more efficiently, without having to sift through potentially long lists of irrelevant documents. The Joint Information Systems Committee (JISC)-funded ASSIST project has produced a prototype web interface to demonstrate the applicability of integrating a number of text-mining tools and methods into the eep, to facilitate an enhanced searching, browsing and document-viewing experience. New features include automatic classification of documents according to a taxonomy, automatic clustering of search results according to similar document content, and automatic identification and highlighting of key terms within documents. PMID:20643679

  16. SANSMIC design document.

    Energy Technology Data Exchange (ETDEWEB)

    Weber, Paula D. [Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States); Rudeen, David Keith [GRAM, Inc., Albuquerque, NM (United States)

    2015-07-01

    The United States Strategic Petroleum Reserve (SPR) maintains an underground storage system consisting of caverns that were leached or solution mined in four salt domes located near the Gulf of Mexico in Texas and Louisiana. The SPR comprises more than 60 active caverns containing approximately 700 million barrels of crude oil. Sandia National Laboratories (SNL) is the geotechnical advisor to the SPR. As the most pressing need at the inception of the SPR was to create and fill storage volume with oil, the decision was made to leach the caverns and fill them simultaneously (leach-fill). Therefore, A.J. Russo developed SANSMIC in the early 1980s, which allows for a transient oil-brine interface (OBI), making it possible to model leach-fill and withdrawal operations. As the majority of caverns are currently filled to storage capacity, the primary uses of SANSMIC at this time are related to the effects of small and large withdrawals, expansion of existing caverns, and projecting future pillar to diameter ratios. SANSMIC was identified by SNL as a priority candidate for qualification. This report continues the quality assurance (QA) process by documenting the "as built" mathematical and numerical models that comprise this document. The program flow is outlined and the models are discussed in detail. Code features that were added later or were not documented previously have been expounded. No changes in the code's physics have occurred since the original documentation (Russo, 1981, 1983), although recent experiments may yield improvements to the temperature and plume methods in the future.

  17. Triangular clustering in document networks

    Energy Technology Data Exchange (ETDEWEB)

    Cheng Xueqi; Ren Fuxin [Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190 (China); Zhou Shi [Department of Computer Science, University College London, Malet Place, London WC1E 6BT (United Kingdom); Hu Maobin [School of Engineering Science, University of Science and Technology of China, Hefei 230026 (China)], E-mail: cxq@ict.ac.cn, E-mail: renfuxin@software.ict.ac.cn, E-mail: s.zhou@adastral.ucl.ac.uk, E-mail: humaobin@ustc.edu.cn

    2009-03-15

    Document networks have the characteristic that a document node, e.g. a webpage or an article, carries meaningful content. Properties of document networks are not only affected by topological connectivity between nodes, but are also strongly influenced by the semantic relation between the content of the nodes. We observed that document networks have a large number of triangles and a high value clustering coefficient. Also there is a strong correlation between the probability of formation of a triangle and the content similarity among the three nodes involved. We propose the degree-similarity product (DSP) model, which well reproduces these properties. The model achieves this by using a preferential attachment mechanism that favours the linkage between nodes that are both popular and similar. This work is a step forward towards a better understanding of the structure and evolution of document networks.
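The clustering coefficient the authors measure can be computed directly from an adjacency structure: for each node, the fraction of its neighbour pairs that are themselves linked. In the DSP model, attachment favours nodes that are both popular and similar; the sketch below only verifies the triangle statistic such a model is meant to reproduce, on a toy document network with invented node names.

```python
from itertools import combinations

def clustering_coefficient(adj, v):
    """Fraction of v's neighbour pairs that are themselves linked."""
    nbrs = adj[v]
    if len(nbrs) < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (len(nbrs) * (len(nbrs) - 1) / 2)

# Toy undirected document network: A, B, C form a triangle; D hangs off A.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
```

Averaging this quantity over all nodes gives the network's clustering coefficient, the "high value" property the abstract refers to.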

  18. Documentation of Cultural Heritage Objects

    Directory of Open Access Journals (Sweden)

    Jon Grobovšek

    2013-09-01

    Full Text Available EXTENDED ABSTRACT: The first and important phase of documentation of cultural heritage objects is to understand which objects need to be documented. The entire documentation process is determined by the characteristics and scope of the cultural heritage object. The next question to be considered is the expected outcome of the documentation process and the purpose for which it will be used. These two essential guidelines determine each stage of the documentation workflow: the choice of the most appropriate data capturing technology and data processing method, how detailed the documentation should be, what problems may occur, what the expected outcome is, what it will be used for, and the plan for storing data and results. Cultural heritage objects require diverse data capturing and data processing methods. It is important that even the first stages of raw data capturing are oriented towards the applicability of results. The selection of the appropriate working method can facilitate the data processing and the preparation of the final documentation. Documentation of paintings requires a different data capturing method than documentation of buildings or building areas. The purpose of documentation can also be the preservation of contemporary cultural heritage for posterity, or to provide the basis for future projects and activities on threatened objects. Documentation procedures should be adapted to our needs and capabilities. Captured and unprocessed data are lost unless accompanied by additional analyses and interpretations. Information on tools, procedures and outcomes must be included in the documentation. A thorough analysis of unprocessed but accessible documentation, if adequately stored and accompanied by additional information, enables us to gather useful data. In this way it is possible to upgrade the existing documentation and to avoid data duplication or unintentional misleading of users.
The documentation should be archived safely and in a way to meet

  19. Text mining from ontology learning to automated text processing applications

    CERN Document Server

    Biemann, Chris

    2014-01-01

    This book comprises a set of articles that specify the methodology of text mining, describe the creation of lexical resources in the framework of text mining and use text mining for various tasks in natural language processing (NLP). The analysis of large amounts of textual data is a prerequisite to build lexical resources such as dictionaries and ontologies and also has direct applications in automated text processing in fields such as history, healthcare and mobile applications, just to name a few. This volume gives an update in terms of the recent gains in text mining methods and reflects

  20. Storing XML Documents in Databases

    OpenAIRE

    Schmidt, A.R.; Manegold, Stefan; Kersten, Martin; Rivero, L.C.; Doorn, J.H.; Ferraggine, V.E.

    2005-01-01

    The authors introduce concepts for loading large amounts of XML documents into databases where the documents are stored and maintained. The goal is to make XML databases as unobtrusive in multi-tier systems as possible and at the same time provide as many services defined by the XML standards as possible. The ubiquity of XML has sparked great interest in deploying concepts known from Relational Database Management Systems such as declarative query languages, transactions, indexes ...

  1. Benchmarking infrastructure for mutation text mining.

    Science.gov (United States)

    Klein, Artjom; Riazanov, Alexandre; Hindle, Matthew M; Baker, Christopher Jo

    2014-02-25

    Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption.
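
    The performance metrics the abstract attributes to SPARQL queries reduce to set arithmetic over gold-standard and predicted annotations. A minimal Python sketch of that computation (the annotation tuples below are hypothetical illustrations, not data from the corpus):

    ```python
    # Sketch: the set arithmetic behind precision/recall scoring of a
    # mutation text mining system. The infrastructure described above
    # computes these metrics with SPARQL over RDF annotations; the
    # annotation tuples here are made up for illustration.

    def score(gold, predicted):
        """Compare gold-standard annotations against system output.
        Each annotation is a hashable tuple, e.g. (doc_id, mutation)."""
        tp = len(gold & predicted)                    # true positives
        precision = tp / len(predicted) if predicted else 0.0
        recall = tp / len(gold) if gold else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        return precision, recall, f1

    gold = {("doc1", "E545K"), ("doc1", "H1047R"), ("doc2", "V600E")}
    pred = {("doc1", "E545K"), ("doc2", "V600E"), ("doc2", "G12D")}
    p, r, f = score(gold, pred)
    print(round(p, 3), round(r, 3), round(f, 3))  # → 0.667 0.667 0.667
    ```

    Representing annotations as sets of tuples is what makes the SPARQL formulation natural: each metric is a join and a count over the two annotation graphs.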

  2. Benchmarking infrastructure for mutation text mining

    Science.gov (United States)

    2014-01-01

    Background Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. Results We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. Conclusion We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption. PMID:24568600

  3. Generic safety documentation model

    International Nuclear Information System (INIS)

    Mahn, J.A.

    1994-04-01

    This document is intended to be a resource for preparers of safety documentation for Sandia National Laboratories, New Mexico facilities. It provides standardized discussions of some topics that are generic to most, if not all, Sandia/NM facilities safety documents. The material provides a ''core'' upon which to develop facility-specific safety documentation. The use of the information in this document will reduce the cost of safety document preparation and improve consistency of information

  4. Automatic text summarization

    CERN Document Server

    Torres Moreno, Juan Manuel

    2014-01-01

    This new textbook examines the motivations and the different algorithms for automatic document summarization (ADS), presenting a recent state of the art. The book shows the main problems of ADS, the difficulties involved, and the solutions provided by the community. It presents recent advances in ADS, as well as current applications and trends. The approaches are statistical, linguistic and symbolic. Several examples are included in order to clarify the theoretical concepts. The books currently available in the area of Automatic Document Summarization are not recent. Powerful algorithms have been developed

  5. The Pelindaba text and its previous

    International Nuclear Information System (INIS)

    Adeniji, O.

    1996-01-01

    The main body of the Treaty, the preamble, articles 1-22, and the map are reproduced in this issue in the section ''Documentation Relating to Disarmament and International Security''. The complete text, including annexes and protocols, is contained in document A/50/426

  6. Text-Fabric

    NARCIS (Netherlands)

    Roorda, Dirk

    2016-01-01

    Text-Fabric is a Python3 package for Text plus Annotations. It provides a data model, a text file format, and a binary format for (ancient) text plus (linguistic) annotations. The emphasis of this all is on: data processing; sharing data; and contributing modules. A defining characteristic is that

  7. Contextual Text Mining

    Science.gov (United States)

    Mei, Qiaozhu

    2009-01-01

    With the dramatic growth of text information, there is an increasing need for powerful text mining systems that can automatically discover useful knowledge from text. Text is generally associated with all kinds of contextual information. Those contexts can be explicit, such as the time and the location where a blog article is written, and the…

  8. XML and Free Text.

    Science.gov (United States)

    Riggs, Ken Roger

    2002-01-01

    Discusses problems with marking free text, text that is either natural language or semigrammatical but unstructured, that prevent well-formed XML from marking text for readily available meaning. Proposes a solution to mark meaning in free text that is consistent with the intended simplicity of XML versus SGML. (Author/LRW)

  9. Text processing for technical reports (direct computer-assisted origination, editing, and output of text)

    Energy Technology Data Exchange (ETDEWEB)

    De Volpi, A.; Fenrick, M. R.; Stanford, G. S.; Fink, C. L.; Rhodes, E. A.

    1980-10-01

    Documentation often is a primary residual of research and development. Because of this important role and because of the large amount of time consumed in generating technical reports, particularly those containing formulas and graphics, an existing data-processing computer system has been adapted so as to provide text-processing of technical documents. Emphasis has been on accuracy, turnaround time, and time savings for staff and secretaries, for the types of reports normally produced in the reactor development program. The computer-assisted text-processing system, called TXT, has been implemented to benefit primarily the originator of technical reports. The system is of particular value to professional staff, such as scientists and engineers, who have responsibility for generating much correspondence or lengthy, complex reports or manuscripts - especially if prompt turnaround and high accuracy are required. It can produce text that contains special Greek or mathematical symbols. Written in FORTRAN and MACRO, the program TXT operates on a PDP-11 minicomputer under the RSX-11M multitask multiuser monitor. Peripheral hardware includes videoterminals, electrostatic printers, and magnetic disks. Either data- or word-processing tasks may be performed at the terminals. The repertoire of operations has been restricted so as to minimize user training and memory burden. Secretarial staff may be readily trained to make corrections from annotated copy. Some examples of camera-ready copy are provided.

  10. E-text

    DEFF Research Database (Denmark)

    Finnemann, Niels Ole

    2018-01-01

    text can be defined by taking as point of departure the digital format in which everything is represented in the binary alphabet. While the notion of text, in most cases, lends itself to be independent of medium and embodiment, it is also often tacitly assumed that it is, in fact, modeled around...... the print medium, rather than written text or speech. In late 20th century, the notion of text was subject to increasing criticism as in the question raised within literary text theory: is there a text in this class? At the same time, the notion was expanded by including extra linguistic sign modalities...

  11. English Metafunction Analysis in Chemistry Text: Characterization of Scientific Text

    Directory of Open Access Journals (Sweden)

    Ahmad Amin Dalimunte, M.Hum

    2013-09-01

    Full Text Available The objectives of this research are to identify what Metafunctions are applied in chemistry text and how they characterize a scientific text. It was conducted by applying content analysis. The data for this research was a twelve-paragraph chemistry text. The data were collected by applying a documentary technique. The document was read and analyzed to find out the Metafunctions. The data were analyzed by some procedures: identifying the types of process, counting up the number of the processes, categorizing and counting up the cohesion devices, classifying the types of modulation and determining modality value, finally counting up the number of sentences and clauses, then scoring the grammatical intricacy index. The findings of the research show that Material process (71 of 100) is mostly used, and circumstance of spatial location (26 of 56) is more dominant than the others. Modality (5) is less used in order to avoid subjectivity. Impersonality is implied through less use of reference, either pronouns (7) or demonstratives (7); conjunctions (60) are applied to develop ideas, and the total number of clauses (109) is much more dominant than the total number of sentences (40), which results in a high grammatical intricacy index. The Metafunctions found indicate that the chemistry text has fulfilled the characteristics of a scientific or academic text which truly reflects it as a natural science.

  12. Academic Journal Embargoes and Full Text Databases.

    Science.gov (United States)

    Brooks, Sam

    2003-01-01

    Documents the reasons for embargoes of academic journals in full text databases (i.e., publisher-imposed delays on the availability of full text content) and provides insight regarding common misconceptions. Tables present data on selected journals covering a cross-section of subjects and publishers and comparing two full text business databases.…

  13. Registration document 2005; Document de reference 2005

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2005-07-01

    This reference document of Gaz de France provides information and data on the Group's activities in 2005: financial information, business, activities, equipment, factories and real estate, trade, capital, organization charts, employment, contracts and research programs. (A.L.B.)

  14. Semantic Annotation of Unstructured Documents Using Concepts Similarity

    Directory of Open Access Journals (Sweden)

    Fernando Pech

    2017-01-01

    Full Text Available There is a large amount of information in the form of unstructured documents which pose challenges in the information storage, search, and retrieval. This situation has given rise to several information search approaches. Some proposals take into account the contextual meaning of the terms specified in the query. Semantic annotation technique can help to retrieve and extract information in unstructured documents. We propose a semantic annotation strategy for unstructured documents as part of a semantic search engine. In this proposal, ontologies are used to determine the context of the entities specified in the query. Our strategy for extracting the context is focused on concepts similarity. Each relevant term of the document is associated with an instance in the ontology. The similarity between each of the explicit relationships is measured through the combination of two types of associations: the association between each pair of concepts and the calculation of the weight of the relationships.

  15. 2002 reference document; Document de reference 2002

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2002-07-01

    This 2002 reference document of the Areva group provides information on the company. Organized in seven chapters, it presents the persons responsible for the reference document and for auditing the financial statements; information pertaining to the transaction; general information on the company and share capital; information on company operation, changes and future prospects; assets, financial position and financial performance; information on company management, executive board and supervisory board; and recent developments and future prospects. (A.L.B.)

  16. SparkText: Biomedical Text Mining on Big Data Framework.

    Directory of Open Access Journals (Sweden)

    Zhan Ye

    Full Text Available Many new biomedical research articles are published every day, accumulating rich information, such as genetic variants, genes, diseases, and treatments. Rapid yet accurate text mining on large-scale scientific literature can discover novel knowledge to better understand human diseases and to improve the quality of disease diagnosis, prevention, and treatment. In this study, we designed and developed an efficient text mining framework called SparkText on a Big Data infrastructure, which is composed of Apache Spark data streaming and machine learning methods, combined with a Cassandra NoSQL database. To demonstrate its performance for classifying cancer types, we extracted information (e.g., breast, prostate, and lung cancers) from tens of thousands of articles downloaded from PubMed, and then employed Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression to build prediction models to mine the articles. The accuracy of predicting a cancer type by SVM using the 29,437 full-text articles was 93.81%. While competing text-mining tools took more than 11 hours, SparkText mined the dataset in approximately 6 minutes. This study demonstrates the potential for mining large-scale scientific articles on a Big Data infrastructure, with real-time update from new articles published daily. SparkText can be extended to other areas of biomedical research.

  17. Audit of Orthopaedic Surgical Documentation

    Directory of Open Access Journals (Sweden)

    Fionn Coughlan

    2015-01-01

    Full Text Available Introduction. The Royal College of Surgeons in England published guidelines in 2008 outlining the information that should be documented at each surgery. St. James’s Hospital uses a standard operation sheet for all surgical procedures and these were examined to assess documentation standards. Objectives. To retrospectively audit the hand written orthopaedic operative notes according to established guidelines. Methods. A total of 63 operation notes over seven months were audited in terms of date and time of surgery, surgeon, procedure, elective or emergency indication, operative diagnosis, incision details, signature, closure details, tourniquet time, postop instructions, complications, prosthesis, and serial numbers. Results. A consultant performed 71.4% of procedures; however, 85.7% of the operative notes were written by the registrar. The date and time of surgery, name of surgeon, procedure name, and signature were documented in all cases. The operative diagnosis and postoperative instructions were frequently not documented in the designated location. Incision details were included in 81.7% and prosthesis details in only 30% while the tourniquet time was not documented in any. Conclusion. Completion and documentation of operative procedures were excellent in some areas; improvement is needed in documenting tourniquet time, prosthesis and incision details, and the location of operative diagnosis and postoperative instructions.

  18. Texting on the Move

    Science.gov (United States)

    ... text. What's the Big Deal? The problem is multitasking. No matter how young and agile we are, ... on something other than the road. In fact, driving while texting (DWT) can be more dangerous than ...

  19. Text Coherence in Translation

    Science.gov (United States)

    Zheng, Yanping

    2009-01-01

    In the thesis a coherent text is defined as a continuity of senses of the outcome of combining concepts and relations into a network composed of knowledge space centered around main topics. And the author maintains that in order to obtain the coherence of a target language text from a source text during the process of translation, a translator can…

  20. Compression of Probabilistic XML documents

    NARCIS (Netherlands)

    Veldman, Irma

    2009-01-01

    Probabilistic XML (PXML) files resulting from data integration can become extremely large, which is undesired. For XML there are several techniques available to compress the document and since probabilistic XML is in fact (a special form of) XML, it might benefit from these methods even more. In

  1. SparkText: Biomedical Text Mining on Big Data Framework

    Science.gov (United States)

    He, Karen Y.; Wang, Kai

    2016-01-01

    Background Many new biomedical research articles are published every day, accumulating rich information, such as genetic variants, genes, diseases, and treatments. Rapid yet accurate text mining on large-scale scientific literature can discover novel knowledge to better understand human diseases and to improve the quality of disease diagnosis, prevention, and treatment. Results In this study, we designed and developed an efficient text mining framework called SparkText on a Big Data infrastructure, which is composed of Apache Spark data streaming and machine learning methods, combined with a Cassandra NoSQL database. To demonstrate its performance for classifying cancer types, we extracted information (e.g., breast, prostate, and lung cancers) from tens of thousands of articles downloaded from PubMed, and then employed Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression to build prediction models to mine the articles. The accuracy of predicting a cancer type by SVM using the 29,437 full-text articles was 93.81%. While competing text-mining tools took more than 11 hours, SparkText mined the dataset in approximately 6 minutes. Conclusions This study demonstrates the potential for mining large-scale scientific articles on a Big Data infrastructure, with real-time update from new articles published daily. SparkText can be extended to other areas of biomedical research. PMID:27685652

  2. SparkText: Biomedical Text Mining on Big Data Framework.

    Science.gov (United States)

    Ye, Zhan; Tafti, Ahmad P; He, Karen Y; Wang, Kai; He, Max M

    Many new biomedical research articles are published every day, accumulating rich information, such as genetic variants, genes, diseases, and treatments. Rapid yet accurate text mining on large-scale scientific literature can discover novel knowledge to better understand human diseases and to improve the quality of disease diagnosis, prevention, and treatment. In this study, we designed and developed an efficient text mining framework called SparkText on a Big Data infrastructure, which is composed of Apache Spark data streaming and machine learning methods, combined with a Cassandra NoSQL database. To demonstrate its performance for classifying cancer types, we extracted information (e.g., breast, prostate, and lung cancers) from tens of thousands of articles downloaded from PubMed, and then employed Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression to build prediction models to mine the articles. The accuracy of predicting a cancer type by SVM using the 29,437 full-text articles was 93.81%. While competing text-mining tools took more than 11 hours, SparkText mined the dataset in approximately 6 minutes. This study demonstrates the potential for mining large-scale scientific articles on a Big Data infrastructure, with real-time update from new articles published daily. SparkText can be extended to other areas of biomedical research.

  3. Web document engineering

    International Nuclear Information System (INIS)

    White, B.

    1996-05-01

    This tutorial provides an overview of several document engineering techniques which are applicable to the authoring of World Wide Web documents. It illustrates how pre-WWW hypertext research is applicable to the development of WWW information resources

  4. Enterprise Document Management

    Data.gov (United States)

    US Agency for International Development — The function of the operation is to provide e-Signature and document management support for Acquisition and Assistance (A&A) documents including vouchers in...

  5. WIPP documentation plan

    International Nuclear Information System (INIS)

    Plung, D.L.; Montgomery, T.T.; Glasstetter, S.R.

    1986-01-01

    In support of the programs at the Waste Isolation Pilot Plant (WIPP), the Publications and Procedures Section developed a documentation plan that provides an integrated document hierarchy; further, this plan affords several unique features: 1) the format for procedures minimizes the writing responsibilities of the technical staff and maximizes use of the writing and editing staff; 2) review cycles have been structured to expedite the processing of documents; and 3) the numbers of documents needed to support the program have been appreciably reduced

  6. Documenting Employee Conduct

    Science.gov (United States)

    Dalton, Jason

    2009-01-01

    One of the best ways for a child care program to lose an employment-related lawsuit is failure to document the performance of its employees. Documentation of an employee's performance can provide evidence of an employment-related decision such as discipline, promotion, or discharge. When properly implemented, documentation of employee performance…

  7. Documents preparation and review

    International Nuclear Information System (INIS)

    1999-01-01

    The Ignalina Safety Analysis Group takes an active role in assisting the regulatory body VATESI in preparing various regulatory documents and in reviewing safety reports and other documentation presented by the Ignalina NPP in the process of licensing unit 1. The list of the main documents prepared and reviewed is presented.

  8. Dictionaries for text production

    DEFF Research Database (Denmark)

    Fuertes-Olivera, Pedro; Bergenholtz, Henning

    2018-01-01

    Dictionaries for Text Production are information tools that are designed and constructed for helping users to produce (i.e. encode) texts, both oral and written texts. These can be broadly divided into two groups: (a) specialized text production dictionaries, i.e., dictionaries that only offer...... a small amount of lexicographic data, most or all of which are typically used in a production situation, e.g. synonym dictionaries, grammar and spelling dictionaries, collocation dictionaries, concept dictionaries such as the Longman Language Activator, which is advertised as the World’s First Production...... Dictionary; (b) general text production dictionaries, i.e., dictionaries that offer all or most of the lexicographic data that are typically used in a production situation. A review of existing production dictionaries reveals that there are many specialized text production dictionaries but only a few general...

  9. Multilingual access to full text databases

    International Nuclear Information System (INIS)

    Fluhr, C.; Radwan, K.

    1990-05-01

    Many full text databases are available in only one language; moreover, they may contain documents in different languages. Even if the user is able to understand the language of the documents in the database, it may be easier for him to express his need in his own language. In the case of databases containing documents in different languages, it is simpler to formulate the query in one language only and to retrieve documents in different languages. This paper presents the developments and the first experiments in multilingual search, applied to the French-English pair, for text data in the nuclear field, based on the SPIRIT system. After reviewing the general problems of searching full text databases with queries formulated in natural language, we present the methods used to reformulate the queries and show how they can be expanded for multilingual search. The first results on data in the nuclear field are presented (AFCEN norms and INIS abstracts). 4 refs

  10. The socio-demographics of texting

    DEFF Research Database (Denmark)

    Ling, Richard; Bertel, Troels Fibæk; Sundsøy, Pål

    2012-01-01

    Who texts, and with whom do they text? This article examines the use of texting using metered traffic data from a large dataset (nearly 400 million anonymous text messages). We ask 1) How much do different age groups use mobile phone based texting (SMS)? 2) How wide is the circle of texting...

  11. Instant Sublime Text starter

    CERN Document Server

    Haughee, Eric

    2013-01-01

    A starter which teaches the basic tasks to be performed with Sublime Text with the necessary practical examples and screenshots. This book requires only basic knowledge of the Internet and basic familiarity with any one of the three major operating systems, Windows, Linux, or Mac OS X. However, as Sublime Text 2 is primarily a text editor for writing software, many of the topics discussed will be specifically relevant to software development. That being said, the Sublime Text 2 Starter is also suitable for someone without a programming background who may be looking to learn one of the tools of

  12. Biomarker Identification Using Text Mining

    Directory of Open Access Journals (Sweden)

    Hui Li

    2012-01-01

    Full Text Available Identifying molecular biomarkers has become one of the important tasks for scientists to assess the different phenotypic states of cells or organisms correlated to the genotypes of diseases from large-scale biological data. In this paper, we propose a text-mining-based method to discover biomarkers from PubMed. First, we construct a database based on a dictionary, and then we use a finite state machine to identify the biomarkers. Our text mining method provides a highly reliable approach to discovering biomarkers in the PubMed database.
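
    The dictionary-plus-finite-state-machine idea described in this record can be sketched as a token trie matched greedily against text. The dictionary entries below are illustrative assumptions, not terms taken from the paper:

    ```python
    # Sketch of dictionary-driven mention matching: entries from a
    # (hypothetical) biomarker dictionary are compiled into a token trie,
    # a simple finite state machine, and matched greedily against text.

    def build_trie(terms):
        root = {}
        for term in terms:
            node = root
            for tok in term.lower().split():
                node = node.setdefault(tok, {})
            node["$"] = term              # end-of-term marker
        return root

    def find_mentions(text, trie):
        tokens = text.lower().replace(",", "").split()
        hits = []
        for i in range(len(tokens)):
            node, j, last = trie, i, None
            while j < len(tokens) and tokens[j] in node:
                node = node[tokens[j]]
                j += 1
                if "$" in node:
                    last = node["$"]      # keep the longest match
            if last:
                hits.append(last)
        return hits

    trie = build_trie(["PSA", "C-reactive protein", "HER2"])
    print(find_mentions("Serum PSA and C-reactive protein were elevated.", trie))
    # → ['PSA', 'C-reactive protein']
    ```

    The trie walk is the finite state machine: each token consumed is a state transition, and the end-of-term marker records an accepting state.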

  13. Born Broken: Fonts and Information Loss in Legacy Digital Documents

    Directory of Open Access Journals (Sweden)

    Geoffrey Brown

    2011-03-01

    Full Text Available For millions of legacy documents, correct rendering depends upon resources such as fonts that are not generally embedded within the document structure. Yet there is a significant risk of information loss due to missing or incorrectly substituted fonts. Large document collections depend on thousands of unique fonts not available on a common desktop workstation, which typically has between 100 and 200 fonts. Silent substitution of fonts, performed by applications such as Microsoft Office, can yield poorly rendered documents. In this paper we use a collection of 230,000 Word documents to assess the difficulty of matching font requirements with a database of fonts. We describe the identifying information contained in common font formats, font requirements stored in Word documents, the API provided by Windows to support font requests by applications, the documented substitution algorithms used by Windows when requested fonts are not available, and the ways in which support software might be used to control font substitution in a preservation environment.

  14. Social Media Text Classification by Enhancing Well-Formed Text Trained Model

    Directory of Open Access Journals (Sweden)

    Phat Jotikabukkana

    2016-09-01

    Full Text Available Social media are a powerful communication tool in our era of digital information. The large amount of user-generated data is a useful novel source of data, even though it is not easy to extract the treasures from this vast and noisy trove. Since classification is an important part of text mining, many techniques have been proposed to classify this kind of information. We developed an effective technique of social media text classification by semi-supervised learning utilizing an online news source consisting of well-formed text. The computer first automatically extracts news categories, well categorized by publishers, as classes for topic classification. A bag of words taken from news articles provides the initial keywords related to their category in the form of word vectors. The principal task is to retrieve a set of new productive keywords. Term Frequency-Inverse Document Frequency weighting (TF-IDF) and the Word Article Matrix (WAM) are used as the main methods. A modified WAM is recomputed until it becomes the most effective model for social media text classification. The key success factor was enhancing our model with effective keywords from social media. A promising result of 99.50% accuracy was achieved, with more than 98.5% Precision, Recall, and F-measure after updating the model three times.
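
    TF-IDF, one of the weighting methods named in this record, can be sketched in a few lines; the toy corpus below stands in for the seed news articles and is purely illustrative:

    ```python
    # Minimal TF-IDF sketch: term frequency within a document, scaled by
    # the inverse document frequency across the corpus. The corpus is a
    # toy stand-in for the category-seeding news articles.
    import math
    from collections import Counter

    docs = [
        "stocks fall as markets react",
        "team wins final match",
        "markets rally as stocks climb",
    ]

    def tfidf(term, doc, corpus):
        tokens = doc.split()
        tf = Counter(tokens)[term] / len(tokens)
        df = sum(1 for d in corpus if term in d.split())
        idf = math.log(len(corpus) / df) if df else 0.0
        return tf * idf

    print(round(tfidf("stocks", docs[0], docs), 4))  # → 0.0811
    print(round(tfidf("match", docs[1], docs), 4))   # → 0.2747
    ```

    Terms appearing in many documents ("as", "markets") are damped by the low idf, which is why TF-IDF surfaces category-discriminating keywords.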

  15. Starlink Document Styles

    Science.gov (United States)

    Lawden, M. D.

    This document describes the various styles which are recommended for Starlink documents. It also explains how to use the templates which are provided by Starlink to help authors create documents in a standard style. This paper is concerned mainly with conveying the "look and feel" of the various styles of Starlink document rather than describing the technical details of how to produce them. Other Starlink papers give recommendations for the detailed aspects of document production, design, layout, and typography. The only style that is likely to be used by most Starlink authors is the Standard style.

  16. Subject (of documents)

    DEFF Research Database (Denmark)

    Hjørland, Birger

    2017-01-01

    This article presents and discusses the concept "subject" or subject matter (of documents) as it has been examined in library and information science (LIS) for more than 100 years. Different theoretical positions are outlined, and it is found that the most important distinction is between document-oriented views versus request-oriented views. The document-oriented view conceives of subject as something inherent in documents, whereas the request-oriented view (or the policy-based view) understands subject as an attribution made to documents in order to facilitate certain uses of them. Related concepts...

  17. Multilingual text induced spelling correction

    NARCIS (Netherlands)

    Reynaert, M.W.C.

    2004-01-01

    We present TISC, a multilingual, language-independent and context-sensitive spelling checking and correction system designed to facilitate the automatic removal of non-word spelling errors in large corpora. Its lexicon is derived from raw text corpora, without supervision, and contains word unigrams
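
    A much simplified sketch of the unsupervised lexicon idea in this record: word unigrams harvested from raw text, then used to flag candidate non-word errors. TISC itself is context-sensitive and far more sophisticated; this shows only the lexicon-and-lookup step, with a toy corpus:

    ```python
    # Sketch: derive a unigram lexicon from a raw corpus, without
    # supervision, and flag tokens absent from it as candidate non-word
    # errors. Toy data; real systems use large corpora and context.
    from collections import Counter

    corpus = "the cat sat on the mat the cat ran".split()
    lexicon = Counter(corpus)                 # unigram frequencies

    def nonword_candidates(text, lexicon, min_freq=1):
        return [w for w in text.split() if lexicon[w] < min_freq]

    print(nonword_candidates("the cat szt on the mat", lexicon))
    # → ['szt']
    ```

    Raising `min_freq` trades recall for precision: hapax legomena in a noisy corpus are often themselves errors, so rare lexicon entries may be distrusted.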

  18. GPU-Accelerated Text Mining

    International Nuclear Information System (INIS)

    Cui, X.; Mueller, F.; Zhang, Y.; Potok, Thomas E.

    2009-01-01

    Accelerating hardware devices represent a novel promise for improving the performance for many problem domains but it is not clear for which domains what accelerators are suitable. While there is no room in general-purpose processor design to significantly increase the processor frequency, developers are instead resorting to multi-core chips duplicating conventional computing capabilities on a single die. Yet, accelerators offer more radical designs with a much higher level of parallelism and novel programming environments. This present work assesses the viability of text mining on CUDA. Text mining is one of the key concepts that has become prominent as an effective means to index the Internet, but its applications range beyond this scope and extend to providing document similarity metrics, the subject of this work. We have developed and optimized text search algorithms for GPUs to exploit their potential for massive data processing. We discuss the algorithmic challenges of parallelization for text search problems on GPUs and demonstrate the potential of these devices in experiments by reporting significant speedups. Our study may be one of the first to assess more complex text search problems for suitability for GPU devices, and it may also be one of the first to exploit and report on atomic instruction usage that have recently become available in NVIDIA devices

  19. Linguistics in Text Interpretation

    DEFF Research Database (Denmark)

    Togeby, Ole

    2011-01-01

    A model for how text interpretation proceeds from what is pronounced, through what is said, to what is communicated, and a definition of the concepts 'presupposition' and 'implicature'.

  20. LocText

    DEFF Research Database (Denmark)

    Cejuela, Juan Miguel; Vinchurkar, Shrikant; Goldberg, Tatyana

    2018-01-01

    trees and was trained and evaluated on a newly improved LocTextCorpus. Combined with an automatic named-entity recognizer, LocText achieved high precision (P = 86%±4). After completing development, we mined the latest research publications for three organisms: human (Homo sapiens), budding yeast...

  1. Systematic text condensation

    DEFF Research Database (Denmark)

    Malterud, Kirsti

    2012-01-01

    To present background, principles, and procedures for a strategy for qualitative analysis called systematic text condensation and discuss this approach compared with related strategies.

  2. The Perfect Text.

    Science.gov (United States)

    Russo, Ruth

    1998-01-01

    A chemistry teacher describes the elements of the ideal chemistry textbook. The perfect text is focused and helps students draw a coherent whole out of the myriad fragments of information and interpretation. The text would show chemistry as the central science necessary for understanding other sciences and would also root chemistry firmly in the…

  3. Text 2 Mind Map

    OpenAIRE

    Iona, John

    2017-01-01

    This is a review of the web resource 'Text 2 Mind Map' www.Text2MindMap.com. It covers what the resource is, and how it might be used in Library and education context, in particular for School Librarians.

  4. Text File Comparator

    Science.gov (United States)

    Kotler, R. S.

    1983-01-01

    File Comparator program IFCOMP is a text file comparator for IBM OS/VS-compatible systems. IFCOMP accepts as input two text files and produces a listing of differences in pseudo-update form. IFCOMP is very useful in monitoring changes made to software at the source code level.
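
    IFCOMP itself targets IBM OS/VS systems; as a rough modern analogue, a text file comparator that emits a difference listing can be sketched with Python's difflib (the file names and sample lines are hypothetical):

```python
import difflib

def compare(old_lines, new_lines):
    """Produce a unified difference listing between two text files."""
    return list(difflib.unified_diff(old_lines, new_lines,
                                     fromfile="old", tofile="new",
                                     lineterm=""))

old = ["alpha", "beta", "gamma"]
new = ["alpha", "BETA", "gamma"]
for line in compare(old, new):
    print(line)   # lines prefixed '-' were removed, '+' were added
```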

  5. Linguistic Dating of Biblical Texts

    DEFF Research Database (Denmark)

    Ehrensvärd, Martin Gustaf

    2003-01-01

    For two centuries, scholars have pointed to consistent differences in the Hebrew of certain biblical texts and interpreted these differences as reflecting the date of composition of the texts. Until the 1980s, this was quite uncontroversial as the linguistic findings largely confirmed the chronology of the texts established by other means: the Hebrew of Genesis-2 Kings was judged to be early and that of Esther, Daniel, Ezra, Nehemiah, and Chronicles to be late. In the current debate where revisionists have questioned the traditional dating, linguistic arguments in the dating of texts have come more into focus. The study critically examines some linguistic arguments adduced to support the traditional position, and reviewing the arguments it points to weaknesses in the linguistic dating of EBH texts to pre-exilic times. When viewing the linguistic evidence in isolation it will be clear...

  6. Zum Bildungspotenzial biblischer Texte

    Directory of Open Access Journals (Sweden)

    Theis, Joachim

    2017-11-01

    Full Text Available Biblical education as a holistic process goes far beyond biblical learning. It must be understood as a lifelong process in which both biblical texts and their understanders operate, appropriating their counterpart in a dialogical way. Neither does the recipient's horizon of understanding appear as an empty room which has to be filled with the text only, nor is the latter dead material one could only examine cognitively. The recipient discovers the meaning of the biblical text by recomposing it through existential appropriation. Thus the text is brought to life in each individual reality. Both scientific insights and subjective structures, as well as the understanders' community, must be included to avoid potential one-sidedness. Unfortunately, a particular negative association often obscures the approach to the Bible: biblical work as part of religious education still appears in a cognitively oriented habit which regards neither the vitality and sovereignty of the biblical texts nor the students' desire for meaning. Moreover, the Bible is misused for teaching moral terms or pontifications. Such downfalls can be disrupted by biblical didactics that are empowerment didactics. Respecting the sovereignty of biblical texts, these didactics assist understanders with their individuation by opening the texts with a focus on the understander's otherness. Thus both the text and the recipient become subjects in a dialogue. The approach of Biblical-Enabling-Didactics leads the Bible to become, ever anew, a book of life. Understood from within their hermeneutics, empowerment didactics could be raised to the principle of biblical didactics in general and grow into an essential element of holistic education.

  7. EST: Evading Scientific Text.

    Science.gov (United States)

    Ward, Jeremy

    2001-01-01

    Examines chemical engineering students' attitudes to text and other parts of English language textbooks. A questionnaire was administered to a group of undergraduates. Results reveal one way students get around the problem of textbook reading. (Author/VWL)

  8. nal Sesotho texts

    African Journals Online (AJOL)

    with literary texts written in indigenous South African languages. The project ... Homi Bhabha uses the words of Salman Rushdie to underline the fact that new .... I could not conceptualise an African-language-to-African-language dictionary. An.

  9. Plagiarism in Academic Texts

    Directory of Open Access Journals (Sweden)

    Marta Eugenia Rojas-Porras

    2012-08-01

    Full Text Available The ethical and social responsibility of citing the sources in a scientific or artistic work is undeniable. This paper explores, in a preliminary way, academic plagiarism in its various forms. It includes findings based on a forensic analysis. The purpose of this paper is to raise awareness on the importance of considering these details when writing and publishing a text. Hopefully, this analysis may put the issue under discussion.

  10. Machine Translation from Text

    Science.gov (United States)

    Habash, Nizar; Olive, Joseph; Christianson, Caitlin; McCary, John

    Machine translation (MT) from text, the topic of this chapter, is perhaps the heart of the GALE project. Beyond being a well-defined application that stands on its own, MT from text is the link between the automatic speech recognition component and the distillation component. The focus of MT in GALE is on translating from Arabic or Chinese to English. The three languages represent a wide range of linguistic diversity and make the GALE MT task rather challenging and exciting.

  11. Electronic Document Management Using Inverted Files System

    Directory of Open Access Journals (Sweden)

    Suhartono Derwin

    2014-03-01

    Full Text Available The number of documents is increasing rapidly. These documents exist not only in paper-based form but also in electronic form, as can be seen from a data sample taken from the SpringerLink publisher in 2010, which showed an increase in the number of digital document collections from 2003 to mid-2010. Managing them well therefore becomes an important need. This paper describes a method for managing documents called the inverted files system. For electronic documents, the inverted files system is applied so that documents can be searched over the Internet using a search engine. It can improve both the document search mechanism and the document storage mechanism.
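
    The core of an inverted files system is a mapping from each term to the documents that contain it; a minimal sketch, with an illustrative toy collection (the document ids and texts are assumptions, not the paper's data):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = index.get(terms[0], set()).copy()
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

docs = {
    1: "electronic document management",
    2: "inverted files improve document search",
    3: "paper based archives",
}
index = build_inverted_index(docs)
print(search(index, "document search"))  # only document 2 matches both terms
```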

  12. Document organization by means of graphs

    Directory of Open Access Journals (Sweden)

    Santa Vallejo Figueroa

    2016-12-01

    Full Text Available Nowadays documents are the main way to represent information and knowledge in several domains. Users continuously store documents on hard disks or online media according to some personal organization based on topics, but such documents can contain one or more topics. This situation makes it hard to access documents when required. Current search engines are based on file names or content, where the desired term or terms must match the content exactly. In this paper, a method for organizing documents by means of graphs is proposed, taking into account the topics of the documents. A graph is generated for each document, taking into account synonyms, semantically related terms, hyponyms, and hypernyms of the nouns and verbs contained in the documents. The proposal has been compared against Google Desktop and LogicalDoc, with interesting results.

  13. Gaia DR1 documentation

    Science.gov (United States)

    van Leeuwen, F.; de Bruijne, J. H. J.; Arenou, F.; Comoretto, G.; Eyer, L.; Farras Casas, M.; Hambly, N.; Hobbs, D.; Salgado, J.; Utrilla Molina, E.; Vogt, S.; van Leeuwen, M.; Abreu, A.; Altmann, M.; Andrei, A.; Babusiaux, C.; Bastian, U.; Biermann, M.; Blanco-Cuaresma, S.; Bombrun, A.; Borrachero, R.; Brown, A. G. A.; Busonero, D.; Busso, G.; Butkevich, A.; Cantat-Gaudin, T.; Carrasco, J. M.; Castañeda, J.; Charnas, J.; Cheek, N.; Clementini, G.; Crowley, C.; Cuypers, J.; Davidson, M.; De Angeli, F.; De Ridder, J.; Evans, D.; Fabricius, C.; Findeisen, K.; Fleitas, J. M.; Gracia, G.; Guerra, R.; Guy, L.; Helmi, A.; Hernandez, J.; Holl, B.; Hutton, A.; Klioner, S.; Lammers, U.; Lecoeur-Taïbi, I.; Lindegren, L.; Luri, X.; Marinoni, S.; Marrese, P.; Messineo, R.; Michalik, D.; Mignard, F.; Montegriffo, P.; Mora, A.; Mowlavi, N.; Nienartowicz, K.; Pancino, E.; Panem, C.; Portell, J.; Rimoldini, L.; Riva, A.; Robin, A.; Siddiqui, H.; Smart, R.; Sordo, R.; Soria, S.; Turon, C.; Vallenari, A.; Voss, H.

    2017-12-01

    94000 Hipparcos stars in the primary data set, the proper motion standard errors are much smaller, at about 0.06 mas yr^-1. For the secondary astrometric data set, the typical standard error on the positions is 10 mas. The median standard errors on the mean G-band magnitudes range from the milli-magnitude level to 0.03 mag over the magnitude range 5 to 20.7. The DPAC undertook an extensive validation of Gaia DR1 which confirmed that this data release represents a major advance in the mapping of the skies and the availability of basic stellar data that form the foundation of observational astrophysics. However, as a consequence of the very preliminary nature of this first Gaia data release, there are a number of important limitations to the data quality. These limitations are documented in the Astronomy & Astrophysics papers that accompany Gaia DR1, with further information provided in this documentation. The reader is strongly encouraged to read about these limitations and to carefully consider them before drawing conclusions from the data. This Gaia DR1 documentation complements the peer-reviewed papers that accompany the release in a Special Issue of Astronomy & Astrophysics. The papers form the primary documentation for the data release and they are frequently referenced throughout the text.

  14. An Intelligent System For Arabic Text Categorization

    NARCIS (Netherlands)

    Syiam, M.M.; Tolba, Mohamed F.; Fayed, Z.T.; Abdel-Wahab, Mohamed S.; Ghoniemy, Said A.; Habib, Mena Badieh

    Text Categorization (classification) is the process of classifying documents into a predefined set of categories based on their content. In this paper, an intelligent Arabic text categorization system is presented. Machine learning algorithms are used in this system. Many algorithms for stemming and

  15. Text mining for the biocuration workflow.

    Science.gov (United States)

    Hirschman, Lynette; Burns, Gully A P C; Krallinger, Martin; Arighi, Cecilia; Cohen, K Bretonnel; Valencia, Alfonso; Wu, Cathy H; Chatr-Aryamontri, Andrew; Dowell, Karen G; Huala, Eva; Lourenço, Anália; Nash, Robert; Veuthey, Anne-Lise; Wiegers, Thomas; Winter, Andrew G

    2012-01-01

    Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on 'Text Mining for the BioCuration Workflow' at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed many variabilities. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provide a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community.

  16. Real Time Text Analysis

    Science.gov (United States)

    Senthilkumar, K.; Ruchika Mehra Vijayan, E.

    2017-11-01

    This paper aims to illustrate real-time analysis of large-scale data. For practical implementation we perform sentiment analysis on live Twitter feeds, for each individual tweet. To analyze sentiments we train our data model on SentiWordNet, a polarity-annotated WordNet sample by Princeton University. Our main objective is to efficiently analyze large-scale data on the fly using distributed computation. The Apache Spark and Apache Hadoop ecosystem is used as the distributed computation platform, with Java as the development language.
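
    The per-tweet polarity scoring described above can be sketched with a toy lexicon standing in for SentiWordNet; the words and scores below are illustrative assumptions, not SentiWordNet entries:

```python
# Hypothetical miniature polarity lexicon (scores in [-1, 1]).
LEXICON = {"good": 0.8, "great": 0.9, "bad": -0.7, "terrible": -0.9, "ok": 0.1}

def tweet_sentiment(text):
    """Average polarity of lexicon words in the tweet; 0.0 if none match."""
    scores = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

print(tweet_sentiment("Great match, good game"))  # positive score
print(tweet_sentiment("terrible service"))        # negative score
```

    In the paper's setting this scoring function would run inside a Spark streaming job, one call per incoming tweet; the sketch keeps only the scoring step.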

  17. Enhancing biomedical text summarization using semantic relation extraction.

    Directory of Open Access Journals (Sweden)

    Yue Shang

    Full Text Available Automatic text summarization for a biomedical concept can help researchers to get the key points of a certain topic from large amount of biomedical literature efficiently. In this paper, we present a method for generating text summary for a given biomedical concept, e.g., H1N1 disease, from multiple documents based on semantic relation extraction. Our approach includes three stages: 1) We extract semantic relations in each sentence using the semantic knowledge representation tool SemRep. 2) We develop a relation-level retrieval method to select the relations most relevant to each query concept and visualize them in a graphic representation. 3) For relations in the relevant set, we extract informative sentences that can interpret them from the document collection to generate text summary using an information retrieval based method. Our major focus in this work is to investigate the contribution of semantic relation extraction to the task of biomedical text summarization. The experimental results on summarization for a set of diseases show that the introduction of semantic knowledge improves the performance and our results are better than the MEAD system, a well-known tool for text summarization.

  18. Using ontology network structure in text mining.

    Science.gov (United States)

    Berndt, Donald J; McCart, James A; Luther, Stephen L

    2010-11-13

    Statistical text mining treats documents as bags of words, with a focus on term frequencies within documents and across document collections. Unlike natural language processing (NLP) techniques that rely on an engineered vocabulary or a full-featured ontology, statistical approaches do not make use of domain-specific knowledge. The freedom from biases can be an advantage, but at the cost of ignoring potentially valuable knowledge. The approach proposed here investigates a hybrid strategy based on computing graph measures of term importance over an entire ontology and injecting the measures into the statistical text mining process. As a starting point, we adapt existing search engine algorithms such as PageRank and HITS to determine term importance within an ontology graph. The graph-theoretic approach is evaluated using a smoking data set from the i2b2 National Center for Biomedical Computing, cast as a simple binary classification task for categorizing smoking-related documents, demonstrating consistent improvements in accuracy.
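
    The graph-theoretic step described above (computing term importance over an ontology with a search-engine algorithm) can be sketched with a plain iterative PageRank; the miniature smoking "ontology" below is an illustrative assumption, not the i2b2 data:

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Iterative PageRank over an adjacency dict {node: [successors]}."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n, succs in graph.items():
            if succs:
                share = damping * rank[n] / len(succs)
                for s in succs:
                    new_rank[s] += share
            else:
                # dangling node: spread its rank uniformly over all nodes
                for s in nodes:
                    new_rank[s] += damping * rank[n] / len(nodes)
        rank = new_rank
    return rank

# Hypothetical miniature ontology: edges point from narrower to broader terms.
ontology = {
    "cigarette": ["tobacco"],
    "cigar": ["tobacco"],
    "tobacco": ["smoking"],
    "nicotine": ["smoking"],
    "smoking": [],
}
ranks = pagerank(ontology)
print(max(ranks, key=ranks.get))  # broad terms accumulate importance
```

    In the hybrid strategy, these per-term scores would then be injected as weights into the statistical text mining features.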

  19. Scheme Program Documentation Tools

    DEFF Research Database (Denmark)

    Nørmark, Kurt

    2004-01-01

    are separate and intended for different documentation purposes they are related to each other in several ways. Both tools are based on XML languages for tool setup and for documentation authoring. In addition, both tools rely on the LAML framework which---in a systematic way---makes an XML language available as named functions in Scheme. Finally, the Scheme Elucidator is able to integrate SchemeDoc resources as part of an internal documentation resource.

  20. Health physics documentation

    International Nuclear Information System (INIS)

    Stablein, G.

    1980-01-01

    When dealing with radioactive material the health physicist receives innumerable papers and documents within the fields of researching, prosecuting, organizing and justifying radiation protection. Some of these papers are requested by the health physicist and some are required by law. The scope, quantity and deposit periods of the health physics documentation at the Karlsruhe Nuclear Research Center are presented and rationalizing methods discussed. The aim of this documentation should be the application of physics to accident prevention, i.e. documentation should protect those concerned and not the health physicist. (H.K.)

  1. CAED Document Repository

    Data.gov (United States)

    U.S. Environmental Protection Agency — Compliance Assurance and Enforcement Division Document Repository (CAEDDOCRESP) provides internal and external access to Inspection Records, Enforcement Actions, and...

  2. CFO Payment Document Management

    Data.gov (United States)

    US Agency for International Development — Paperless management will enable the CFO to create, store, and access various financial documents electronically. This capability will reduce time looking for...

  3. TEXT Energy Storage System

    International Nuclear Information System (INIS)

    Weldon, W.F.; Rylander, H.G.; Woodson, H.H.

    1977-01-01

    The Texas Experimental Tokamak (TEXT) Energy Storage System, designed by the Center for Electromechanics (CEM), consists of four 50 MJ, 125 V homopolar generators and their auxiliaries and is designed to power the toroidal and poloidal field coils of TEXT on a two-minute duty cycle. The four 50 MJ generators connected in series were chosen because they represent the minimum-cost configuration and also a minimal scale-up from the successful 5.0 MJ homopolar generator designed, built, and operated by the CEM.

  4. The Emar Lexical Texts

    NARCIS (Netherlands)

    Gantzert, Merijn

    2011-01-01

    This four-part work provides a philological analysis and a theoretical interpretation of the cuneiform lexical texts found in the Late Bronze Age city of Emar, in present-day Syria. These word and sign lists, commonly dated to around 1100 BC, were almost all found in the archive of a single school.

  5. Texts and Readers.

    Science.gov (United States)

    Iser, Wolfgang

    1980-01-01

    Notes that, since fictional discourse need not reflect prevailing systems of meaning and norms or values, readers gain detachment from their own presuppositions; by constituting and formulating text-sense, readers are constituting and formulating their own cognition and becoming aware of the operations for doing so. (FL)

  6. Modeling Documents with Event Model

    Directory of Open Access Journals (Sweden)

    Longhui Wang

    2015-08-01

    Full Text Available Deep learning has recently made great breakthroughs in visual and speech processing, mainly because it draws lessons from the hierarchical way the brain deals with images and speech. In the field of NLP, topic models are one of the important ways of modeling documents, but they are built on a generative model that clearly does not match the way humans write. In this paper, we propose the Event Model, which is unsupervised and based on the language-processing mechanisms of neurolinguistics, to model documents. In the Event Model, documents are descriptions of concrete or abstract events seen, heard, or sensed by people, and words are objects in the events. The Event Model has two stages: word learning and dimensionality reduction. Word learning learns the semantics of words based on deep learning. Dimensionality reduction is the process of representing a document as a low-dimensional vector by a linear model that is completely different from topic models. The Event Model achieves state-of-the-art results on document retrieval tasks.

  7. Transfer Learning beyond Text Classification

    Science.gov (United States)

    Yang, Qiang

    Transfer learning is a new machine learning and data mining framework that allows the training and test data to come from different distributions or feature spaces. We can find many novel applications of machine learning and data mining where transfer learning is necessary. While much has been done in transfer learning in text classification and reinforcement learning, there has been a lack of documented success stories of novel applications of transfer learning in other areas. In this invited article, I will argue that transfer learning is in fact quite ubiquitous in many real world applications. In this article, I will illustrate this point through an overview of a broad spectrum of applications of transfer learning that range from collaborative filtering to sensor based location estimation and logical action model learning for AI planning. I will also discuss some potential future directions of transfer learning.

  8. IDC System Specification Document.

    Energy Technology Data Exchange (ETDEWEB)

    Clifford, David J.

    2014-12-01

    This document contains the system specifications derived to satisfy the system requirements found in the IDC System Requirements Document for the IDC Reengineering Phase 2 project.

    Revisions:
    Version | Date    | Author/Team                    | Revision Description | Authorized by
    V1.0    | 12/2014 | IDC Reengineering Project Team | Initial delivery     | M. Harris

  9. INFCE plenary conference documents

    International Nuclear Information System (INIS)

    This document consists of the reports to the First INFCE Plenary Conference (November 1978) by the Working Groups, the record by each Plenary Conference of its actions and decisions, the Communique of the Final INFCE Plenary Conference (February 1980), and a list of all documents in the IAEA depository for INFCE

  10. Human Document Project

    NARCIS (Netherlands)

    de Vries, Jeroen; Abelmann, Leon; Manz, A; Elwenspoek, Michael Curt

    2012-01-01

    “The Human Document Project” is a project which tries to answer all of the questions related to preserving information about the human race for tens of generations of humans to come, or maybe even for a future intelligence which can emerge in the coming thousands of years. This document mainly

  11. Reactive documentation system

    Science.gov (United States)

    Boehnlein, Thomas R.; Kramb, Victoria

    2018-04-01

    Proper formal documentation of computer acquired NDE experimental data generated during research is critical to the longevity and usefulness of the data. Without documentation describing how and why the data was acquired, NDE research teams lose capability such as their ability to generate new information from previously collected data or provide adequate information so that their work can be replicated by others seeking to validate their research. Despite the critical nature of this issue, NDE data is still being generated in research labs without appropriate documentation. By generating documentation in series with data, equal priority is given to both activities during the research process. One way to achieve this is to use a reactive documentation system (RDS). RDS prompts an operator to document the data as it is generated rather than relying on the operator to decide when and what to document. This paper discusses how such a system can be implemented in a dynamic environment made up of in-house and third party NDE data acquisition systems without creating additional burden on the operator. The reactive documentation approach presented here is agnostic enough that the principles can be applied to any operator controlled, computer based, data acquisition system.

  12. The TEXT upgrade vertical interferometer

    International Nuclear Information System (INIS)

    Hallock, G.A.; Gartman, M.L.; Li, W.; Chiang, K.; Shin, S.; Castles, R.L.; Chatterjee, R.; Rahman, A.S.

    1992-01-01

    A far-infrared interferometer has been installed on TEXT upgrade to obtain electron density profiles. The primary system views the plasma vertically through a set of large (60-cm radial x 7.62-cm toroidal) diagnostic ports. A 1-cm channel spacing (59 channels total) and fast electronic time response is used, to provide high resolution for radial profiles and perturbation experiments. Initial operation of the vertical system was obtained late in 1991, with six operating channels.

  13. Documentation: Records and Reports.

    Science.gov (United States)

    Akers, Michael J

    2017-01-01

    This article deals with documentation to include the beginning of documentation, the requirements of Good Manufacturing Practice reports and records, and the steps that can be taken to minimize Good Manufacturing Practice documentation problems. It is important to remember that documentation for 503a compounding involves the Formulation Record, Compounding Record, Standard Operating Procedures, Safety Data Sheets, etc. For 503b outsourcing facilities, compliance with Current Good Manufacturing Practices is required, so this article is applicable to them. For 503a pharmacies, one can see the development and modification of Good Manufacturing Practice and even observe changes as they are occurring in 503a documentation requirements and anticipate that changes will probably continue to occur. Copyright© by International Journal of Pharmaceutical Compounding, Inc.

  14. Text Skimming: The Process and Effectiveness of Foraging through Text under Time Pressure

    Science.gov (United States)

    Duggan, Geoffrey B.; Payne, Stephen J.

    2009-01-01

    Is skim reading effective? How do readers allocate their attention selectively? The authors report 3 experiments that use expository texts and allow readers only enough time to read half of each document. Experiment 1 found that, relative to reading half the text, skimming improved memory for important ideas from a text but did not improve memory…

  15. Modeling statistical properties of written text.

    Directory of Open Access Journals (Sweden)

    M Angeles Serrano

    Full Text Available Written text is one of the fundamental manifestations of human language, and the study of its universal regularities can give clues about how our brains process information and how we, as a society, organize and share it. Among these regularities, only Zipf's law has been explored in depth. Other basic properties, such as the existence of bursts of rare words in specific documents, have only been studied independently of each other and mainly by descriptive models. As a consequence, there is a lack of understanding of linguistic processes as complex emergent phenomena. Beyond Zipf's law for word frequencies, here we focus on burstiness, Heaps' law describing the sublinear growth of vocabulary size with the length of a document, and the topicality of document collections, which encode correlations within and across documents absent in random null models. We introduce and validate a generative model that explains the simultaneous emergence of all these patterns from simple rules. As a result, we find a connection between the bursty nature of rare words and the topical organization of texts and identify dynamic word ranking and memory across documents as key mechanisms explaining the non-trivial organization of written text. Our research can have broad implications and practical applications in computer science, cognitive science and linguistics.
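
    Heaps' law (sublinear growth of vocabulary with text length) is easy to observe on synthetic Zipf-distributed text; a minimal sketch, assuming a 1000-word vocabulary with word probabilities proportional to 1/i:

```python
import random

def heaps_curve(tokens):
    """Vocabulary size as a function of text length."""
    seen = set()
    curve = []
    for tok in tokens:
        seen.add(tok)
        curve.append(len(seen))
    return curve

# Synthetic Zipf-like text: word i is drawn with probability ~ 1/i.
random.seed(0)
vocab = [f"w{i}" for i in range(1, 1001)]
weights = [1.0 / i for i in range(1, 1001)]
text = random.choices(vocab, weights=weights, k=20000)

curve = heaps_curve(text)
# Sublinear growth: doubling the text length far less than doubles the vocabulary.
print(curve[9999], curve[19999])
```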

  16. Document Categorization with Modified Statistical Language Models for Agglutinative Languages

    Directory of Open Access Journals (Sweden)

    Tantug

    2010-11-01

    Full Text Available In this paper, we investigate the document categorization task with statistical language models. Our study mainly focuses on categorization of documents in agglutinative languages. Due to the productive morphology of agglutinative languages, the number of word forms encountered in naturally occurring text is very large. From the language modeling perspective, a large vocabulary results in serious data sparseness problems. In order to cope with this drawback, previous studies in various application areas suggest modified language models based on different morphological units. It is reported that performance improvements can be achieved with these modified language models. In our document categorization experiments, we use standard word form based language models as well as other modified language models based on root words, root words and part-of-speech information, truncated word forms and character sequences. Additionally, to find an optimum parameter set, multiple tests are carried out with different language model orders and smoothing methods. Similar to previous studies on other tasks, our experimental results on categorization of Turkish documents reveal that applying linguistic preprocessing steps for language modeling provides improvements over standard language models to some extent. However, it is also observed that similar level of performance improvements can also be acquired by simpler character level or truncated word form models which are language independent.
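
    The baseline word-form language-model categorizer with smoothing described above can be sketched as a unigram model with add-one (Laplace) smoothing; the two-category training data below is an illustrative assumption, not the paper's Turkish corpus:

```python
import math
from collections import Counter

class UnigramLM:
    """Unigram language model with add-one (Laplace) smoothing."""
    def __init__(self, tokens, vocab):
        self.counts = Counter(tokens)
        self.total = len(tokens)
        self.vocab_size = len(vocab)

    def log_prob(self, token):
        # Add-one smoothing keeps unseen word forms from zeroing the score.
        return math.log((self.counts[token] + 1) /
                        (self.total + self.vocab_size))

def categorize(tokens, models):
    """Assign the category whose language model gives the text the highest likelihood."""
    def score(model):
        return sum(model.log_prob(t) for t in tokens)
    return max(models, key=lambda c: score(models[c]))

# Hypothetical two-category training data (word-form models).
train = {
    "sports": "match goal team goal win".split(),
    "economy": "market stock price market trade".split(),
}
vocab = set(t for doc in train.values() for t in doc)
models = {c: UnigramLM(doc, vocab) for c, doc in train.items()}
print(categorize("goal and team".split(), models))  # likeliest under the sports model
```

    The paper's modified models would swap the word-form tokens here for roots, truncated word forms, or character sequences before training.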

  17. Enhancing biomedical text summarization using semantic relation extraction.

    Science.gov (United States)

    Shang, Yue; Li, Yanpeng; Lin, Hongfei; Yang, Zhihao

    2011-01-01

    Automatic text summarization for a biomedical concept can help researchers to get the key points of a certain topic from a large amount of biomedical literature efficiently. In this paper, we present a method for generating a text summary for a given biomedical concept, e.g., H1N1 disease, from multiple documents based on semantic relation extraction. Our approach includes three stages: 1) We extract semantic relations in each sentence using the semantic knowledge representation tool SemRep. 2) We develop a relation-level retrieval method to select the relations most relevant to each query concept and visualize them in a graphic representation. 3) For relations in the relevant set, we extract informative sentences that can interpret them from the document collection to generate the text summary using an information retrieval based method. Our major focus in this work is to investigate the contribution of semantic relation extraction to the task of biomedical text summarization. The experimental results on summarization for a set of diseases show that the introduction of semantic knowledge improves the performance, and our results are better than those of the MEAD system, a well-known tool for text summarization.
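The relation-selection and sentence-scoring stages can be sketched as below, assuming the (subject, predicate, object) triples have already been produced by a tool such as SemRep. The toy triples, sentences and function names are invented for illustration and are not the paper's implementation:

```python
def select_relations(triples, concept):
    """Stage 2 sketch: keep triples whose subject or object matches the
    query concept."""
    c = concept.lower()
    return [t for t in triples if c in (t[0].lower(), t[2].lower())]

def rank_sentences(sentences, sent_triples, relevant):
    """Stage 3 sketch: score each sentence by how many query-relevant
    relations it expresses; ties are broken by preferring shorter (more
    concise) sentences."""
    rel = set(relevant)
    scored = []
    for s, ts in zip(sentences, sent_triples):
        hits = sum(1 for t in ts if t in rel)
        scored.append((hits, -len(s), s))
    scored.sort(reverse=True)
    return [s for hits, _, s in scored if hits > 0]

sentences = ["H1N1 causes fever.",
             "Paris is in France.",
             "Oseltamivir treats H1N1."]
sent_triples = [[("H1N1", "CAUSES", "fever")],
                [("Paris", "LOCATED_IN", "France")],
                [("Oseltamivir", "TREATS", "H1N1")]]
all_triples = [t for ts in sent_triples for t in ts]
relevant = select_relations(all_triples, "H1N1")
summary = rank_sentences(sentences, sent_triples, relevant)
```

Sentences expressing no relation about the query concept (the Paris sentence here) are excluded from the summary.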

  18. Growing electronic documents created by researchers

    Directory of Open Access Journals (Sweden)

    Monika Weiss

    2017-05-01

    Full Text Available In the contemporary world, technology is an indispensable element of both the personal and professional spheres. Although we may not attach much significance to it in our everyday lives, technological development has engulfed us and continually reminds us of its presence. In the face of dynamically growing digitization, a new form of document has emerged: the electronic document. The study concerns the growing body of electronic documentation among researchers working at the Nicolaus Copernicus University in Toruń. The analysis of surveys and interviews supports the thesis that researchers use e-documents more frequently than analog documentation. The flexibility and accessibility of this type of document become a problem for personal papers that will be archived in the future, perhaps largely in the form of electronic documentation.

  19. Reading Authentic Texts

    DEFF Research Database (Denmark)

    Balling, Laura Winther

    2013-01-01

    Most research on cognates has focused on words presented in isolation that are easily defined as cognate between L1 and L2. In contrast, this study investigates what counts as cognate in authentic texts and how such cognates are read. Participants with L1 Danish read news articles in their highly proficient L2, English, while their eye movements were monitored. The experiment shows a cognate advantage for morphologically simple words, but only when cognateness is defined relative to translation equivalents that are appropriate in the context. For morphologically complex words, a cognate disadvantage ... word predictability indexed by the conditional probability of each word.

  20. Journalistic Text Production

    DEFF Research Database (Denmark)

    Haugaard, Rikke Hartmann

    A multiple case study investigated three professional text producers' practices as they unfolded in their natural setting at the Spanish newspaper, El Mundo, in Madrid. The study applied a combination of quantitative and qualitative methods, i.e. keystroke logging, participant observation and retrospective interview. Results indicate that journalists' revisions are related to form markedly more often than to content (approx. three ... Results also suggest two writing phases serving ...

  1. Texts of presentation

    Energy Technology Data Exchange (ETDEWEB)

    Magnin, G.; Vidolov, K.; Dufour-Fallot, B.; Dewarrat, Th.; Rose, T.; Favatier, A.; Gazeley, D.; Pujol, T.; Worner, D.; Van de Wel, E.; Revaz, J.M.; Clerfayt, G.; Creedy, A.; Moisan, F.; Geissler, M.; Isbell, P.; Macaluso, M.; Litzka, V.; Gillis, W.; Jarvis, I.; Gorg, M.; Bebie, B.

    2004-07-01

    Implementing a sustainable local energy policy involves a long term reflection on the general interest, energy efficiency, distributed generation and environmental protection. Providing services on a market involves looking for activities that are profitable, if possible in the 'short-term'. The aim of this conference is to analyse the possibility of reconciling these apparently contradictory requirements and how this can be achieved. This conference brings together the best specialists from European municipalities as well as important partners for local authorities (energy agencies, service companies, institutions, etc.) in order to discuss the public-private partnerships concerning the various functions that municipalities may perform in the energy field as consumers and customers, planners and organizers of urban space, and motivators of the inhabitants and economic players of their areas. This document contains the summaries of the following presentations: 1 - Performance contracting: Bulgarian municipalities use private capital for energy efficiency improvement (K. VIDOLOV, Varna (BG)), Contracting experiences in Swiss municipalities: consistent energy policy thanks to the Energy-city label (B. DUFOUR-FALLOT and T. DEWARRAT (CH)), Experience of contracting in the domestic sector (T. ROSE (GB)); 2 - Public procurement: Multicolor electricity (A. FAVATIER (CH)), Tendering for new green electricity capacity (D. GAZELEY (GB)), The Barcelona solar thermal ordinance (T. PUJOL (ES)); 3 - Urban planning and schemes: Influencing energy issues through urban planning (D. WOERNER (DE)), Tendering for the supply of energy infrastructure (E. VAN DE WEL (NL)), Concessions and public utility warranty (J.M. REVAZ (CH)); 4 - Certificate schemes: the market of green certificates in the Wallonia region in a liberalized power market (G. CLERFAYT (BE)), The Carbon Neutral® project: a voluntary certification scheme with opportunity for implementation in other European

  2. Weitere Texte physiognomischen Inhalts

    Directory of Open Access Journals (Sweden)

    Böck, Barbara

    2004-12-01

    Full Text Available The present article offers the edition of three cuneiform texts belonging to the Akkadian handbook of omens drawn from the physical appearance as well as the morals and behaviour of man. The book, comprising up to 27 chapters with more than 100 omens each, was entitled in antiquity Alamdimmû. The edition of the three cuneiform tablets thus completes the author's monographic study on the ancient Mesopotamian divinatory discipline of physiognomy (Die babylonisch-assyrische Morphoskopie, Wien 2000 [= AfO Beih. 27]).

    This article presents the editio princeps of three cuneiform texts held in the British Museum (London) and the Vorderasiatisches Museum (Berlin) which belong to the Assyro-Babylonian book of physiognomic omens. This book, originally entitled Alamdimmû ('form, figure'), consists of 27 chapters, each with more than one hundred omens written in the Akkadian language. The three texts thus complete the author's monographic study of the divinatory discipline of physiognomy in the ancient Near East (Die babylonisch-assyrische Morphoskopie, Wien 2000 [= AfO Beih. 27]).

  3. TRANSPORTATION SYSTEM REQUIREMENTS DOCUMENT

    International Nuclear Information System (INIS)

    2004-01-01

    This document establishes the Transportation system requirements for the U.S. Department of Energy's (DOE's) Civilian Radioactive Waste Management System (CRWMS). These requirements are derived from the Civilian Radioactive Waste Management System Requirements Document (CRD). The Transportation System Requirements Document (TSRD) was developed in accordance with LP-3.1Q-OCRWM, Preparation, Review, and Approval of Office of National Transportation Level-2 Baseline Requirements. As illustrated in Figure 1, the TSRD forms a part of the DOE Office of Civilian Radioactive Waste Management (OCRWM) Technical Baseline

  4. A Chinese text classification system based on Naive Bayes algorithm

    Directory of Open Access Journals (Sweden)

    Cui Wei

    2016-01-01

    Full Text Available In this paper, aiming at the characteristics of Chinese text classification, we use ICTCLAS (the Chinese lexical analysis system of the Chinese Academy of Sciences) for document segmentation, clean the data and filter stop words, and apply information gain and document frequency feature selection algorithms to select document features. On this basis, we implement a text classifier based on the Naive Bayes algorithm, and carry out experiments and analysis on the system using the Chinese corpus of Fudan University.
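The last two stages of such a pipeline (document-frequency feature selection followed by a multinomial Naive Bayes classifier) can be sketched as below. The sketch assumes documents are already segmented into tokens, e.g. by ICTCLAS; the toy corpus, threshold and function names are invented for illustration:

```python
import math
from collections import Counter

def df_select(docs, min_df=2):
    """Document-frequency feature selection: keep terms that appear in
    at least min_df documents (a simple stand-in for the IG/DF step)."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return {t for t, c in df.items() if c >= min_df}

def train_nb(docs, labels, vocab):
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""
    label_counts = Counter(labels)
    counts = {c: Counter() for c in label_counts}
    for doc, lab in zip(docs, labels):
        counts[lab].update(t for t in doc if t in vocab)
    cond = {}
    for c, cnt in counts.items():
        total = sum(cnt.values())
        cond[c] = {t: (cnt[t] + 1) / (total + len(vocab)) for t in vocab}
    n = len(docs)
    priors = {c: k / n for c, k in label_counts.items()}
    return priors, cond

def classify_nb(doc, priors, cond):
    def score(c):
        s = math.log(priors[c])
        for t in doc:
            if t in cond[c]:
                s += math.log(cond[c][t])
        return s
    return max(priors, key=score)

docs = [["股票", "上涨", "市场"], ["市场", "经济", "股票"],
        ["足球", "比赛", "进球"], ["比赛", "球队", "足球"]]
labels = ["finance", "finance", "sports", "sports"]
vocab = df_select(docs, min_df=2)
priors, cond = train_nb(docs, labels, vocab)
pred = classify_nb(["股票", "市场"], priors, cond)
```

Terms appearing in only one document are dropped by the DF filter, which both shrinks the model and removes noisy features.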

  5. Investigation into Text Classification With Kernel Based Schemes

    Science.gov (United States)

    2010-03-01

    Abbreviations: TDM (Term-Document Matrix), TMG (Text to Matrix Generator), TN (True Negative), TP (True Positive), VSM (Vector Space Model). ... documents are represented as a term-document matrix, common evaluation metrics, and the software package Text to Matrix Generator (TMG). The classifier ... This chapter introduces the indexing capabilities of the Text to Matrix Generator (TMG) Toolbox. Specific attention is placed on the ...

  6. Areva - 2011 Reference document; Areva - Document de reference 2011

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2011-07-01

    After having indicated the person responsible for this document and the legal account auditors, and provided some financial information, this document gives an overview of the different risk factors existing in the company: legal risks, industrial and environmental risks, operational risks, risks related to large projects, and market and liquidity risks. Then, after having recalled the history and evolution of the company and the evolution of its investments over the last five years, it proposes an overview of Areva's activities in the markets of nuclear energy and renewable energies, of its clients and suppliers, of its strategy, and of the activities of its different departments. Other information is provided: the company's flow chart, real estate properties (plants, equipment), an analysis of its financial situation, its research and development policy, the present context, profit forecasts or estimates, and management organization and operation

  7. Combining computational analyses and interactive visualization for document exploration and sensemaking in jigsaw.

    Science.gov (United States)

    Görg, Carsten; Liu, Zhicheng; Kihm, Jaeyeon; Choo, Jaegul; Park, Haesun; Stasko, John

    2013-10-01

    Investigators across many disciplines and organizations must sift through large collections of text documents to understand and piece together information. Whether they are fighting crime, curing diseases, deciding what car to buy, or researching a new field, inevitably investigators will encounter text documents. Taking a visual analytics approach, we integrate multiple text analysis algorithms with a suite of interactive visualizations to provide a flexible and powerful environment that allows analysts to explore collections of documents while sensemaking. Our particular focus is on the process of integrating automated analyses with interactive visualizations in a smooth and fluid manner. We illustrate this integration through two example scenarios: an academic researcher examining InfoVis and VAST conference papers and a consumer exploring car reviews while pondering a purchase decision. Finally, we provide lessons learned toward the design and implementation of visual analytics systems for document exploration and understanding.

  8. Transportation System Requirements Document

    International Nuclear Information System (INIS)

    1993-09-01

    This Transportation System Requirements Document (Trans-SRD) describes the functions to be performed by and the technical requirements for the Transportation System to transport spent nuclear fuel (SNF) and high-level radioactive waste (HLW) from Purchaser and Producer sites to a Civilian Radioactive Waste Management System (CRWMS) site, and between CRWMS sites. The purpose of this document is to define the system-level requirements for Transportation consistent with the CRWMS Requirement Document (CRD). These requirements include design and operations requirements to the extent they impact on the development of the physical segments of Transportation. The document also presents an overall description of Transportation, its functions, its segments, and the requirements allocated to the segments and the system-level interfaces with Transportation. The interface identification and description are published in the CRWMS Interface Specification

  9. Integrated Criteria Document Chromium

    NARCIS (Netherlands)

    Slooff W; Cleven RFMJ; Janus JA; van der Poel P; van Beelen P; Boumans LJM; Canton JH; Eerens HC; Krajnc EI; de Leeuw FAAM; Matthijsen AJCM; van de Meent D; van der Meulen A; Mohn GR; Wijland GC; de Bruijn PJ; van Keulen A; Verburgh JJ; van der Woerd KF

    1990-01-01

    This is the English version of report 758701001.
    An appendix with the same number, entitled "Integrated Criteria Document Chromium: Effects", belongs to this report. Authors: Janus JA; Krajnc EI
    (appendix: see 710401002A)

  10. NCDC Archive Documentation Manuals

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The National Climatic Data Center Tape Deck Documentation library is a collection of over 400 manuals describing NCDC's digital holdings (both historic and current)....

  11. Registration document 2005

    International Nuclear Information System (INIS)

    2005-01-01

    This reference document of Gaz de France provides information and data on the Group's activities in 2005: financial information, business activities, equipment, plants and real estate, trade, capital, organization charts, employment, contracts and research programs. (A.L.B.)

  12. IR and OLAP in XML document warehouses

    DEFF Research Database (Denmark)

    Perez, Juan Manuel; Pedersen, Torben Bach; Berlanga, Rafael

    2005-01-01

    In this paper we propose to combine IR and OLAP (On-Line Analytical Processing) technologies to exploit a warehouse of text-rich XML documents. In the system we plan to develop, a multidimensional implementation of a relevance modeling document model will be used for interactively querying...

  13. Multilingual access to full text databases; Acces multilingue aux bases de donnees en texte integral

    Energy Technology Data Exchange (ETDEWEB)

    Fluhr, C; Radwan, K [Institut National des Sciences et Techniques Nucleaires (INSTN), Centre d` Etudes de Saclay, 91 - Gif-sur-Yvette (France)

    1990-05-01

    Many full text databases are available in only one language; moreover, they may contain documents in different languages. Even if the user is able to understand the language of the documents in the database, it may be easier for him to express his need in his own language. For databases containing documents in different languages, it is simpler to formulate the query in one language only and to retrieve documents in different languages. This paper presents the developments and the first experiments in multilingual search, applied to the French-English pair, for text data in the nuclear field, based on the SPIRIT system. After reviewing the general problems of full text database search by queries formulated in natural language, we present the methods used to reformulate the queries and show how they can be expanded for multilingual search. The first results on data in the nuclear field are presented (AFCEN norms and INIS abstracts). 4 refs.
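The query-reformulation idea can be illustrated with a toy dictionary-based expansion: each source-language term expands to itself plus its translations, and documents in either language are then retrieved with the expanded term set. The lexicon, documents and function names below are invented for the example; the actual SPIRIT system performs far richer linguistic reformulation:

```python
def expand_query(query_terms, lexicon):
    """Cross-language query expansion: each term expands to itself plus
    all of its dictionary translations."""
    expanded = set()
    for t in query_terms:
        expanded.add(t)
        expanded.update(lexicon.get(t, []))
    return expanded

def search(docs, expanded):
    """Rank documents by how many expanded-query terms they contain;
    documents with no match are dropped."""
    scored = sorted(docs, key=lambda d: -len(expanded & set(d.split())))
    return [d for d in scored if expanded & set(d.split())]

# toy French -> English lexicon for the nuclear domain
lexicon = {"reacteur": ["reactor"], "combustible": ["fuel"],
           "dechets": ["waste"]}
docs = ["the reactor core", "spent fuel storage", "weather report"]
expanded = expand_query(["reacteur", "combustible"], lexicon)
results = search(docs, expanded)
```

A French query thus retrieves English documents without the user having to translate it, which is the core benefit the paper describes.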

  14. 2002 reference document

    International Nuclear Information System (INIS)

    2002-01-01

    This 2002 reference document of the Areva group provides information on the company. Organized in seven chapters, it presents the persons responsible for the reference document and for auditing the financial statements; information pertaining to the transaction; general information on the company and share capital; information on company operation, changes and future prospects; assets, financial position and financial performance; information on company management and the executive and supervisory boards; and recent developments and future prospects. (A.L.B.)

  15. Comprehending text in literature class

    Directory of Open Access Journals (Sweden)

    Purić Daliborka S.

    2016-01-01

    Full Text Available The paper discusses the problem of understanding a text and the contribution of the methodological apparatus in reader books to the comprehension of texts read in junior classes of elementary school. Using content analysis of the methodological apparatuses in eight reader books for the fourth grade of elementary school, approved for usage in the 2014/2015 academic year, and a survey of 350 teachers in 33 elementary schools and 11 administrative districts in the Republic of Serbia, we examined: (a) to what extent the Serbian language textbook contents enable junior students to understand a literary text; (b) to what extent teachers accept the suggestions offered in the textbook for preparing literature teaching. The results show that a large number of suggestions relate to reading comprehension, but some categories of understanding are unevenly distributed in the methodological apparatus. On the other hand, the majority of teachers use the methodological apparatus given in a textbook when preparing classes, not only the textbook selected for teaching but also other textbooks for the same grade.

  16. LCS Content Document Application

    Science.gov (United States)

    Hochstadt, Jake

    2011-01-01

    My project at KSC during my spring 2011 internship was to develop a Ruby on Rails application to manage Content Documents. A Content Document is a collection of documents and information that describes what software is installed on a Launch Control System computer. It's important for us to make sure the tools we use every day are secure, up-to-date, and properly licensed. Previously, keeping track of the information was done with Excel and Word files passed between different personnel. The goal of the new application is to be able to manage and access the Content Documents through a single database-backed web application. Our LCS team will benefit greatly from this app. Admins will be able to log in securely to keep track of and update the software installed on each computer in a timely manner. We also included exportability, such as attaching additional documents that can be downloaded from the web application. The finished application will ease the process of managing Content Documents while streamlining the procedure. Ruby on Rails is a very powerful framework, and I am grateful to have had the opportunity to build this application.

  17. Documentation of spectrom-32

    International Nuclear Information System (INIS)

    Callahan, G.D.; Fossum, A.F.; Svalstad, D.K.

    1989-01-01

    SPECTROM-32 is a finite element program for analyzing two-dimensional and axisymmetric inelastic thermomechanical problems related to the geological disposal of nuclear waste. The code is part of the SPECTROM series of special-purpose computer programs that are being developed by RE/SPEC Inc. to address many unique rock mechanics problems encountered in analyzing radioactive wastes stored in geologic formations. This document presents the theoretical basis for the mathematical models, the finite element formulation and solution procedure of the program, a description of the input data for the program, verification problems, and details about program support and continuing documentation. The computer code documentation is intended to satisfy the requirements and guidelines outlined in the document entitled Final Technical Position on Documentation of Computer Codes for High-Level Waste Management. The principal component models used in the program involve thermoelastic, thermoviscoelastic, thermoelastic-plastic, and thermoviscoplastic types of material behavior. Special material considerations provide for the incorporation of limited-tension material behavior and consideration of jointed material behavior. Numerous program options provide the capabilities for various boundary conditions, sliding interfaces, excavation, backfill, arbitrary initial stresses, multiple material domains, load incrementation, plotting database storage and access of results, and other features unique to the geologic disposal of radioactive wastes. Numerous verification problems that exercise many of the program options and illustrate the required data input and printed results are included in the documentation

  18. Technical approach document

    International Nuclear Information System (INIS)

    1988-04-01

    This document describes the general technical approaches and design criteria adopted by the US Department of Energy (DOE) in order to implement Remedial Action Plans (RAPs) and final designs that comply with EPA standards. This document is a revision of the original document. Major revisions were made to the sections on riprap selection and sizing, and on ground water; only minor revisions were made to the remainder of the document. The US Nuclear Regulatory Commission (NRC) has prepared a Standard Review Plan (NRC-SRP) which describes factors to be considered by the NRC in approving the RAP. Sections 3.0, 4.0, 5.0, and 7.0 of this document are arranged under the same headings as those used in the NRC-SRP. This approach is adopted in order to facilitate joint use of the documents. Section 2.0 (not included in the NRC-SRP) discusses design considerations; Section 3.0 describes surface-water hydrology and erosion control; Section 4.0 describes geotechnical aspects of pile design; Section 5.0 discusses the Alternate Site Selection Process; Section 6.0 deals with radiological issues (in particular, the design of the radon barrier); Section 7.0 discusses protection of groundwater resources; and Section 8.0 discusses site design criteria for the RAC

  19. Methodological Aspects of Architectural Documentation

    Directory of Open Access Journals (Sweden)

    Arivaldo Amorim

    2011-12-01

    Full Text Available This paper discusses the methodological approach that has been developed in the state of Bahia, Brazil, since 2003 for the documentation of architectural and urban sites using extensive digital technologies. Bahia has a vast territory with important architectural ensembles ranging from the sixteenth century to the present day. As part of this heritage is constructed of raw earth and wood, it is very sensitive to various deleterious agents. It is therefore critical to document this collection, which is under threat. To conduct these activities, diverse digital technologies that could be used in the documentation process are being tested. The task is being developed as academic research, with few financial resources, by scholarship students and some volunteers. Several technologies are tested, ranging from the simplest to the most sophisticated, in the main stages of the documentation project, as follows: overall work planning, data acquisition, processing and management and, ultimately, control and evaluation of the work. The activities that motivated this paper are being conducted in the cities of Rio de Contas and Lençóis in the Chapada Diamantina, located 420 km and 750 km from Salvador respectively, in the city of Cachoeira in the Recôncavo Baiano area, 120 km from Salvador, the capital of Bahia state, and in the Pelourinho neighbourhood, located in the historic capital. Part of the material produced can be consulted on the website: <www.lcad.ufba.br>.

  20. Text accessibility by people with reduced contrast sensitivity.

    Science.gov (United States)

    Crossland, Michael D; Rubin, Gary S

    2012-09-01

    Contrast sensitivity is reduced in people with eye disease, and also in older adults without eye disease. In this article, we compare contrast of text presented in print and digital formats with contrast sensitivity values for a large cohort of subjects in a population-based study of older adults (the Salisbury Eye Evaluation). Contrast sensitivity values were recorded for 2520 adults aged 65 to 84 years living in Salisbury, Maryland. The proportion of the sample likely to be unable to read text of different formats (electronic books, newsprint, paperback books, laser print, and LED computer monitors) was calculated using published contrast reserve levels required to perform spot reading, to read with fluency, high fluency, and under optimal conditions. One percent of this sample had contrast sensitivity less than that required to read newsprint fluently. Text presented on an LED computer monitor had the highest contrast. Ninety-eight percent of the sample had contrast sensitivity sufficient for high fluent reading of text (at least 160 words/min) on a monitor. However, 29.6% were still unlikely to be able to read this text with optimal fluency. Reduced contrast of print limits text accessibility for many people in the developed world. Presenting text in a high-contrast format, such as black laser print on a white page, would increase the number of people able to access such information. Additionally, making text available in a format that can be presented on an LED computer monitor will increase access to written documents.
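The comparison the study performs rests on the notion of contrast reserve: the contrast of the displayed text divided by the reader's contrast threshold (the reciprocal of contrast sensitivity). A small calculator can sketch the idea; the band cut-offs below are illustrative placeholders, not the published reserve levels the study relies on:

```python
def michelson_contrast(l_max, l_min):
    """Michelson contrast of text from maximum and minimum luminance."""
    return (l_max - l_min) / (l_max + l_min)

def contrast_reserve(text_contrast, contrast_sensitivity):
    """Reserve = text contrast / reader's contrast threshold, where the
    threshold is 1 / contrast sensitivity; so reserve = contrast * sensitivity."""
    return text_contrast * contrast_sensitivity

def reading_category(reserve):
    """Map a contrast reserve to a reading-performance band. These
    cut-offs are invented for the example, not the study's values."""
    for cut, name in [(20.0, "optimal"), (10.0, "fluent"),
                      (3.0, "spot reading")]:
        if reserve >= cut:
            return name
    return "unable"

c_text = michelson_contrast(100.0, 10.0)      # high-contrast display
reserve = contrast_reserve(0.6, 40.0)         # sensitivity 40 -> threshold 0.025
band = reading_category(reserve)
```

Raising either the text contrast (e.g. black laser print on white paper, or an LED monitor) or, clinically, the reader's contrast sensitivity moves the reserve into a better band, which is exactly the accessibility argument the paper makes.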

  1. Gaia DR2 documentation

    Science.gov (United States)

    van Leeuwen, F.; de Bruijne, J. H. J.; Arenou, F.; Bakker, J.; Blomme, R.; Busso, G.; Cacciari, C.; Castañeda, J.; Cellino, A.; Clotet, M.; Comoretto, G.; Eyer, L.; González-Núñez, J.; Guy, L.; Hambly, N.; Hobbs, D.; van Leeuwen, M.; Luri, X.; Manteiga, M.; Pourbaix, D.; Roegiers, T.; Salgado, J.; Sartoretti, P.; Tanga, P.; Ulla, A.; Utrilla Molina, E.; Abreu, A.; Altmann, M.; Andrae, R.; Antoja, T.; Audard, M.; Babusiaux, C.; Bailer-Jones, C. A. L.; Barache, C.; Bastian, U.; Beck, M.; Berthier, J.; Bianchi, L.; Biermann, M.; Bombrun, A.; Bossini, D.; Breddels, M.; Brown, A. G. A.; Busonero, D.; Butkevich, A.; Cantat-Gaudin, T.; Carrasco, J. M.; Cheek, N.; Clementini, G.; Creevey, O.; Crowley, C.; David, M.; Davidson, M.; De Angeli, F.; De Ridder, J.; Delbò, M.; Dell'Oro, A.; Diakité, S.; Distefano, E.; Drimmel, R.; Durán, J.; Evans, D. W.; Fabricius, C.; Fabrizio, M.; Fernández-Hernández, J.; Findeisen, K.; Fleitas, J.; Fouesneau, M.; Galluccio, L.; Gracia-Abril, G.; Guerra, R.; Gutiérrez-Sánchez, R.; Helmi, A.; Hernandez, J.; Holl, B.; Hutton, A.; Jean-Antoine-Piccolo, A.; Jevardat de Fombelle, G.; Joliet, E.; Jordi, C.; Juhász, Á.; Klioner, S.; Löffler, W.; Lammers, U.; Lanzafame, A.; Lebzelter, T.; Leclerc, N.; Lecoeur-Taïbi, I.; Lindegren, L.; Marinoni, S.; Marrese, P. M.; Mary, N.; Massari, D.; Messineo, R.; Michalik, D.; Mignard, F.; Molinaro, R.; Molnár, L.; Montegriffo, P.; Mora, A.; Mowlavi, N.; Muinonen, K.; Muraveva, T.; Nienartowicz, K.; Ordenovic, C.; Pancino, E.; Panem, C.; Pauwels, T.; Petit, J.; Plachy, E.; Portell, J.; Racero, E.; Regibo, S.; Reylé, C.; Rimoldini, L.; Ripepi, V.; Riva, A.; Robichon, N.; Robin, A.; Roelens, M.; Romero-Gómez, M.; Sarro, L.; Seabroke, G.; Segovia, J. C.; Siddiqui, H.; Smart, R.; Smith, K.; Sordo, R.; Soria, S.; Spoto, F.; Stephenson, C.; Turon, C.; Vallenari, A.; Veljanoski, J.; Voutsinas, S.

    2018-04-01

    The second Gaia data release, Gaia DR2, encompasses astrometry, photometry, radial velocities, astrophysical parameters (stellar effective temperature, extinction, reddening, radius, and luminosity), and variability information plus astrometry and photometry for a sample of pre-selected bodies in the solar system. The data collected during the first 22 months of the nominal, five-year mission have been processed by the Gaia Data Processing and Analysis Consortium (DPAC), resulting in this second data release. A summary of the release properties is provided in Gaia Collaboration et al. (2018b). The overall scientific validation of the data is described in Arenou et al. (2018). Background information on the mission and the spacecraft can be found in Gaia Collaboration et al. (2016), with a more detailed presentation of the Radial Velocity Spectrometer (RVS) in Cropper et al. (2018). In addition, Gaia DR2 is accompanied by various, dedicated papers that describe the processing and validation of the various data products. Four more Gaia Collaboration papers present a glimpse of the scientific richness of the data. In addition to this set of refereed publications, this documentation provides a detailed, complete overview of the processing and validation of the Gaia DR2 data. Gaia data, from both Gaia DR1 and Gaia DR2, can be retrieved from the Gaia archive, which is accessible from https://archives.esac.esa.int/gaia. The archive also provides various tutorials on data access and data queries plus an integrated data model (i.e., description of the various fields in the data tables). In addition, Luri et al. (2018) provide concrete advice on how to deal with Gaia astrometry, with recommendations on how best to estimate distances from parallaxes. The Gaia archive features an enhanced visualisation service which can be used for quick initial explorations of the entire Gaia DR2 data set. Pre-computed cross matches between Gaia DR2 and a selected set of large surveys are

  2. A Proposed Arabic Handwritten Text Normalization Method

    Directory of Open Access Journals (Sweden)

    Tarik Abu-Ain

    2014-11-01

    Full Text Available Text normalization is an important technique in document image analysis and recognition. It consists of many preprocessing stages, which include slope correction, text padding, skew correction, and straightening of the writing line. In this regard, text normalization plays an important role in many procedures such as text segmentation, feature extraction and character recognition. In the present article, a new method for text baseline detection, straightening, and slant correction for Arabic handwritten texts is proposed. The method comprises a set of sequential steps: first, component segmentation is done, followed by component thinning; then the direction features of the skeletons are extracted and the candidate baseline regions are determined. After that, the correct baseline region is selected and, finally, the baselines of all components are aligned with the writing line. The experiments are conducted on the IFN/ENIT benchmark Arabic dataset. The results show that the proposed method has a promising and encouraging performance.
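A much simpler stand-in for the baseline-detection and alignment steps is a horizontal projection profile: count the ink pixels per row and take the densest row as the baseline, then shift each component so the baselines coincide. This only sketches the general idea, not the paper's direction-feature method; the toy image is invented:

```python
def horizontal_projection(image):
    """Ink count per row of a binary image (lists of 0/1, 1 = ink)."""
    return [sum(row) for row in image]

def detect_baseline(image):
    """Estimate the writing baseline as the row with the most ink,
    a projection-profile approximation of baseline detection."""
    proj = horizontal_projection(image)
    return max(range(len(proj)), key=proj.__getitem__)

def alignment_shifts(baseline_rows):
    """Vertical shifts that align per-component baseline estimates to
    their common (mean) writing line."""
    target = round(sum(baseline_rows) / len(baseline_rows))
    return [target - b for b in baseline_rows]

# toy binary image of one connected component; row 3 is the dense baseline
image = [[0, 0, 0, 0, 0, 0],
         [0, 1, 0, 0, 1, 0],
         [0, 1, 1, 0, 1, 0],
         [1, 1, 1, 1, 1, 1],
         [0, 0, 1, 0, 0, 0]]
baseline = detect_baseline(image)
shifts = alignment_shifts([2, 3, 4])   # three components' baselines
```

In Arabic script the baseline row is typically very dense because most letters connect along it, which is what makes even this crude projection heuristic workable.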

  3. Writing Treatment for Aphasia: A Texting Approach

    Science.gov (United States)

    Beeson, Pelagie M.; Higginson, Kristina; Rising, Kindle

    2013-01-01

    Purpose: Treatment studies have documented the therapeutic and functional value of lexical writing treatment for individuals with severe aphasia. The purpose of this study was to determine whether such retraining could be accomplished using the typing feature of a cellular telephone, with the ultimate goal of using text messaging for…

  4. Linguistic dating of biblical texts

    DEFF Research Database (Denmark)

    Young, Ian; Rezetko, Robert; Ehrensvärd, Martin Gustaf

    Since the beginning of critical scholarship biblical texts have been dated using linguistic evidence. In recent years this has become a controversial topic, especially with the publication of Ian Young (ed.), Biblical Hebrew: Studies in Chronology and Typology (2003). However, until now there has been no introduction and comprehensive study of the field. Volume 1 introduces the field of linguistic dating of biblical texts, particularly to intermediate and advanced students of biblical Hebrew who have a reasonable background in the language, having completed at least an introductory course ... in this volume are: What is it that makes Archaic Biblical Hebrew archaic, Early Biblical Hebrew early, and Late Biblical Hebrew late? Does linguistic typology, i.e. different linguistic characteristics, convert easily and neatly into linguistic chronology, i.e. different historical origins? A large amount ...

  5. Text mining for the biocuration workflow

    Science.gov (United States)

    Hirschman, Lynette; Burns, Gully A. P. C; Krallinger, Martin; Arighi, Cecilia; Cohen, K. Bretonnel; Valencia, Alfonso; Wu, Cathy H.; Chatr-Aryamontri, Andrew; Dowell, Karen G.; Huala, Eva; Lourenço, Anália; Nash, Robert; Veuthey, Anne-Lise; Wiegers, Thomas; Winter, Andrew G.

    2012-01-01

Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on ‘Text Mining for the BioCuration Workflow’ at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed many variabilities. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provides a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community. PMID:22513129

  6. Sparse Machine Learning Methods for Understanding Large Text Corpora

    Data.gov (United States)

National Aeronautics and Space Administration — Sparse machine learning has recently emerged as a powerful tool to obtain models of high-dimensional data with a high degree of interpretability, at low computational...

  7. Document reconstruction by layout analysis of snippets

    Science.gov (United States)

    Kleber, Florian; Diem, Markus; Sablatnig, Robert

    2010-02-01

Document analysis is done to analyze entire forms (e.g. intelligent form analysis, table detection) or to describe the layout/structure of a document. Skew detection of scanned documents is also performed to support OCR algorithms that are sensitive to skew. In this paper document analysis is applied to snippets of torn documents to calculate features for the reconstruction. Documents can be destroyed either with the intention of making the printed content unavailable (e.g. tax fraud investigation, business crime) or through time-induced degeneration of ancient documents (e.g. bad storage conditions). Current reconstruction methods for manually torn documents deal with the shape, inpainting and texture synthesis techniques. In this paper it is shown how document analysis techniques applied to snippets can support the matching algorithm by providing additional features. This comprises a rotational analysis, a color analysis and a line detection. As future work it is planned to extend the feature set with the paper type (blank, checked, lined), the type of the writing (handwritten vs. machine printed) and the text layout of a snippet (text size, line spacing). Preliminary results show that these pre-processing steps can be performed reliably on a real dataset consisting of 690 snippets.

  8. Text Classification and Distributional features techniques in Datamining and Warehousing

    OpenAIRE

    Bethu, Srikanth; Babu, G Charless; Vinoda, J; Priyadarshini, E; rao, M Raghavendra

    2013-01-01

Text categorization is traditionally done using term frequency and inverse document frequency. This type of method is not very good because some words which are not so important may appear in the document. The term frequency of unimportant words may increase, and a document may be classified in the wrong category. To reduce such misclassification, the distributional features are introduced. In the distributional features, the distribution of the words in ...
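The term-frequency/inverse-document-frequency baseline the abstract criticizes can be sketched as follows (a minimal illustration, not the authors' implementation; the distributional features would add positional statistics on top of this):

```python
import math
from collections import Counter

def tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document.

    tf is the within-document relative frequency; idf is log(N / df),
    so terms appearing in every document get weight zero.
    """
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency
    weights = []
    for d in docs:
        tf = Counter(d)
        weights.append({t: (c / len(d)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return weights
```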

  9. New Challenges of the Documentation in Media

    Directory of Open Access Journals (Sweden)

    Antonio García Jiménez

    2015-07-01

Full Text Available This special issue, presented by index.comunicación, is focused on media-related information and documentation. This field undergoes constant and profound changes, especially visible in documentation processes: a situation characterized by the existence of tablets, smartphones and applications, and by the almost complete digitization of traditional documents, in addition to the crisis of the press business model, which involves mutations in journalists' tasks and in the relationship between them and documentation. Papers included in this special issue focus on some of the concerns in this domain: the progressive autonomy of the journalist in access to information sources, the role of press offices as documentation sources, the search for information on the web, the situation of media blogs, the viability of elements of information architecture in smart TV, and the development of social TV and its connection to documentation.

  10. Wilmar joint market model, Documentation

    International Nuclear Information System (INIS)

    Meibom, P.; Larsen, Helge V.; Barth, R.; Brand, H.; Weber, C.; Voll, O.

    2006-01-01

    The Wilmar Planning Tool is developed in the project Wind Power Integration in Liberalised Electricity Markets (WILMAR) supported by EU (Contract No. ENK5-CT-2002-00663). A User Shell implemented in an Excel workbook controls the Wilmar Planning Tool. All data are contained in Access databases that communicate with various sub-models through text files that are exported from or imported to the databases. The Joint Market Model (JMM) constitutes one of these sub-models. This report documents the Joint Market model (JMM). The documentation describes: 1. The file structure of the JMM. 2. The sets, parameters and variables in the JMM. 3. The equations in the JMM. 4. The looping structure in the JMM. (au)

  11. Endangered Language Documentation and Transmission

    Directory of Open Access Journals (Sweden)

    D. Victoria Rau

    2007-01-01

Full Text Available This paper describes an on-going project on digitally archiving Yami language documentation (http://www.hrelp.org/grants/projects/index.php?projid=60). We present a cross-disciplinary approach, involving computer science and applied linguistics, to document the Yami language and prepare teaching materials. Our discussion begins with an introduction to an integrated framework for archiving, processing and developing learning materials for Yami (Yang and Rau 2005), followed by a historical account of Yami language teaching, from a grammatical syllabus (Dong and Rau 2000b) to a communicative syllabus using a multimedia CD as a resource (Rau et al. 2005), to the development of interactive on-line learning based on the digital archiving project. We discuss the methods used and challenges of each stage of preparing Yami teaching materials, and present a proposal for rethinking pedagogical models for e-learning.

  12. Wilmar joint market model, Documentation

    Energy Technology Data Exchange (ETDEWEB)

    Meibom, P.; Larsen, Helge V. [Risoe National Lab. (Denmark); Barth, R.; Brand, H. [IER, Univ. of Stuttgart (Germany); Weber, C.; Voll, O. [Univ. of Duisburg-Essen (Germany)

    2006-01-15

    The Wilmar Planning Tool is developed in the project Wind Power Integration in Liberalised Electricity Markets (WILMAR) supported by EU (Contract No. ENK5-CT-2002-00663). A User Shell implemented in an Excel workbook controls the Wilmar Planning Tool. All data are contained in Access databases that communicate with various sub-models through text files that are exported from or imported to the databases. The Joint Market Model (JMM) constitutes one of these sub-models. This report documents the Joint Market model (JMM). The documentation describes: 1. The file structure of the JMM. 2. The sets, parameters and variables in the JMM. 3. The equations in the JMM. 4. The looping structure in the JMM. (au)

  13. Customer Communication Document

    Science.gov (United States)

    2009-01-01

This procedure communicates to the Customers of the Automation, Robotics and Simulation Division (AR&SD) Dynamics Systems Test Branch (DSTB) how to obtain services of the Six-Degrees-Of-Freedom Dynamic Test System (SDTS). The scope includes the major communication documents between the SDTS and its Customer. It establishes the initial communication and contact points and provides the initial documentation in electronic media for the customer. Contact the SDTS Manager (SM) for the names and numbers of the current contact points.

  14. Text mining with R a tidy approach

    CERN Document Server

    Silge, Julia

    2017-01-01

Much of the data available today is unstructured and text-heavy, making it challenging for analysts to apply their usual data wrangling and visualization tools. With this practical book, you'll explore text-mining techniques with tidytext, a package that authors Julia Silge and David Robinson developed using the tidy principles behind R packages like ggraph and dplyr. You'll learn how tidytext and other tidy tools in R can make text analysis easier and more effective. The authors demonstrate how treating text as data frames enables you to manipulate, summarize, and visualize characteristics of text. You'll also learn how to integrate natural language processing (NLP) into effective workflows. Practical code examples and data explorations will help you generate real insights from literature, news, and social media. Learn how to apply the tidy text format to NLP. Use sentiment analysis to mine the emotional content of text. Identify a document's most important terms with frequency measurements…
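The book works in R, but the one-token-per-row idea at its core can be sketched in any language; the following Python stand-in borrows the name of tidytext's unnest_tokens purely for illustration and greatly simplifies its behavior:

```python
def unnest_tokens(docs):
    """docs: {doc_id: text}. Return tidy (doc_id, token) rows,
    one row per token, lowercased -- the 'tidy text' format that
    downstream counting and joining operate on."""
    return [(doc_id, tok.lower())
            for doc_id, text in docs.items()
            for tok in text.split()]
```

Once text is in this shape, term counts, tf-idf and sentiment joins all reduce to ordinary grouped operations on rows.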

  15. Computer-assisted documentation: One device to keep your nose above the water

    International Nuclear Information System (INIS)

    Church, L.B.

    1980-01-01

Because of the large number of student operators and trainees at the Reed College Reactor Facility, there is a large demand for access to our documentation. In the past the standard mimeograph approach has been used to make available Tech Specs, Administrative Procedures, Emergency Plans, etc. However, the frequency of change in these documents (often relatively minor in nature) causes an entire document to become outdated. To provide easier student access, to help keep the documentation up to date, and to do so at a minimum cost, we have started using the text editor on our computer. On the whole the experiment has been very well received; some of the more important pros and cons will be discussed. (author)

  16. Documents on Disarmament.

    Science.gov (United States)

    Arms Control and Disarmament Agency, Washington, DC.

    This publication, latest in a series of volumes issued annually since 1960, contains primary source documents on arms control and disarmament developments during 1969. The main chronological arrangement is supplemented by both chronological and topical lists of contents. Other reference aids include a subject/author index, and lists of…

  17. ROOT Reference Documentation

    CERN Document Server

    Fuakye, Eric Gyabeng

    2017-01-01

A ROOT Reference Documentation has been implemented to generate the lists of libraries needed for each ROOT class. Doxygen has no option to generate or add the lists of libraries for each ROOT class. Therefore shell scripting and a basic C++ program were employed to import the lists of libraries needed by each ROOT class.

  18. Client Oriented Management Documents.

    Science.gov (United States)

    Limaye, Mohan R.; Hightower, Rick

    Noting that accounting reports, including management advisory service (MAS) studies, reports on internal control, and tax memoranda, often appear rather dense and heavy in style--partly because of the legal environment's demand for careful expression and partly because such documents convey very complex information--this paper presents four…

  19. Using Primary Source Documents.

    Science.gov (United States)

    Mintz, Steven

    2003-01-01

    Explores the use of primary sources when teaching about U.S. slavery. Includes primary sources from the Gilder Lehrman Documents Collection (New York Historical Society) to teach about the role of slaves in the Revolutionary War, such as a proclamation from Lord Dunmore offering freedom to slaves who joined his army. (CMK)

  20. QA programme documentation

    International Nuclear Information System (INIS)

    Scheibelt, L.

    1980-01-01

The present paper deals with the following topics: the need for a documented Q.A. program; establishing a Q.A. program; Q.A. activities; fundamental policies; Q.A. policies; quality objectives; the Q.A. manual. (orig./RW)

  1. Student Problems with Documentation.

    Science.gov (United States)

    Freimer, Gloria R.; Perry, Margaret M.

    1986-01-01

    Interviews with faculty, a survey of 20 students, and examination of style manuals revealed that students are confused by inconsistencies in and multiplicity of styles when confronted with writing and documenting a research paper. Librarians are urged to teach various citation formats and work for adoption of standardization. (17 references) (EJS)

  2. Documentation of spectrom-41

    International Nuclear Information System (INIS)

    Svalstad, D.K.

    1989-01-01

SPECTROM-41 is a finite element heat transfer computer program developed to analyze thermal problems related to nuclear waste disposal. The code is part of the SPECTROM (Special Purpose Engineering Codes for Thermal/Rock Mechanics) series of special purpose finite element programs that are continually being developed by RE/SPEC Inc. (RSI) to address the many unique problems related to the disposal of nuclear waste in geologic formations. This document presents the theoretical basis for the mathematical model, the finite element formulation of the program, and a description of the input data for the program, along with details about program support and continuing documentation. The documentation is intended to satisfy the requirements and guidelines outlined in NUREG-0856. The principal model used in the program is based on Fourier's law of heat conduction. Numerous program options provide the capability of considering various boundary conditions, material stratification and anisotropy, and time-dependent heat generation that are characteristic of problems involving the disposal of nuclear waste in geologic formations. Numerous verification problems are included in the documentation in addition to highlights of past and ongoing verification and validation efforts. A typical repository problem is solved using SPECTROM-41 to demonstrate the use of the program in addressing problems related to the disposal of nuclear waste.
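Fourier-law conduction of the kind SPECTROM-41 solves can be fixed in the mind with a much simpler scheme; the explicit 1-D finite-difference step below is an illustration only, far cruder than the program's finite elements:

```python
def heat_step(T, alpha, dx, dt):
    """Advance 1-D conduction one explicit time step.

    T: list of node temperatures (first and last nodes are held fixed
    as Dirichlet boundaries); alpha: thermal diffusivity; dx, dt: grid
    spacing and time step. Stability requires alpha*dt/dx**2 <= 0.5.
    """
    r = alpha * dt / dx**2
    return ([T[0]] +
            [T[i] + r * (T[i-1] - 2*T[i] + T[i+1])
             for i in range(1, len(T) - 1)] +
            [T[-1]])
```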

  3. Course documentation report

    DEFF Research Database (Denmark)

    Buus, Lillian; Bygholm, Ann; Walther, Tina Dyngby Lyng

A documentation report on the three pedagogical courses developed during the MVU project period. The report describes the three processes, taking its point of departure in the structure and material available in the virtual learning environment. It also describes the way two of the courses developed…

  4. Extremely secure identification documents

    International Nuclear Information System (INIS)

    Tolk, K.M.; Bell, M.

    1997-09-01

    The technology developed in this project uses biometric information printed on the document and public key cryptography to ensure that an adversary cannot issue identification documents to unauthorized individuals or alter existing documents to allow their use by unauthorized individuals. This process can be used to produce many types of identification documents with much higher security than any currently in use. The system is demonstrated using a security badge as an example. This project focused on the technologies requiring development in order to make the approach viable with existing badge printing and laminating technologies. By far the most difficult was the image processing required to verify that the picture on the badge had not been altered. Another area that required considerable work was the high density printed data storage required to get sufficient data on the badge for verification of the picture. The image processing process was successfully tested, and recommendations are included to refine the badge system to ensure high reliability. A two dimensional data array suitable for printing the required data on the badge was proposed, but testing of the readability of the array had to be abandoned due to reallocation of the budgeted funds by the LDRD office

  5. Document clustering methods, document cluster label disambiguation methods, document clustering apparatuses, and articles of manufacture

    Science.gov (United States)

    Sanfilippo, Antonio [Richland, WA; Calapristi, Augustin J [West Richland, WA; Crow, Vernon L [Richland, WA; Hetzler, Elizabeth G [Kennewick, WA; Turner, Alan E [Kennewick, WA

    2009-12-22

    Document clustering methods, document cluster label disambiguation methods, document clustering apparatuses, and articles of manufacture are described. In one aspect, a document clustering method includes providing a document set comprising a plurality of documents, providing a cluster comprising a subset of the documents of the document set, using a plurality of terms of the documents, providing a cluster label indicative of subject matter content of the documents of the cluster, wherein the cluster label comprises a plurality of word senses, and selecting one of the word senses of the cluster label.
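The label-disambiguation idea can be illustrated with a Lesk-style gloss overlap; the patent abstract does not give its actual scoring, so everything below (the overlap criterion, the sense inventory format) is an assumption for illustration:

```python
from collections import Counter

def label_cluster(docs, senses):
    """Pick a cluster label and disambiguate its word sense.

    docs: list of token lists belonging to one cluster.
    senses: {term: {sense_name: gloss_tokens}} -- a toy sense inventory.
    The label is the cluster's most frequent term; the sense whose gloss
    overlaps the cluster vocabulary most is selected.
    """
    counts = Counter(t for d in docs for t in d)
    label = counts.most_common(1)[0][0]
    context = set(counts)
    best_sense, _ = max(senses.get(label, {"unknown": []}).items(),
                        key=lambda kv: len(context & set(kv[1])))
    return label, best_sense
```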

  6. Domain-independent information extraction in unstructured text

    Energy Technology Data Exchange (ETDEWEB)

    Irwin, N.H. [Sandia National Labs., Albuquerque, NM (United States). Software Surety Dept.

    1996-09-01

Extracting information from unstructured text has become an important research area in recent years due to the large amount of text now electronically available. This status report describes the findings and work done during the second year of a two-year Laboratory Directed Research and Development project. Building on the first year's work of identifying important entities, this report details techniques used to group words into semantic categories and to output templates containing selective document content. Using word profiles and category clustering derived during a training run, the time-consuming knowledge-building task can be avoided. Though the output still lacks completeness when compared to systems with domain-specific knowledge bases, the results do look promising. The two approaches are compatible and could complement each other within the same system. Domain-independent approaches retain appeal, as a system that adapts and learns will soon outpace a system with any amount of a priori knowledge.
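Grouping words into semantic categories from distributional evidence can be sketched in miniature; the report's word profiles and clustering are far richer, so the shared-neighbor grouping below is purely illustrative:

```python
from collections import defaultdict

def context_groups(sentences):
    """Group words whose immediate-neighbor sets coincide.

    sentences: list of token lists. Each word's profile is the set of
    words adjacent to it anywhere in the corpus; words with identical
    profiles are placed in the same category. Real systems would use
    similarity rather than exact equality of profiles.
    """
    neighbors = defaultdict(set)
    for s in sentences:
        for i, w in enumerate(s):
            neighbors[w].update(s[max(0, i - 1):i] + s[i + 1:i + 2])
    groups = defaultdict(list)
    for w, ctx in neighbors.items():
        groups[frozenset(ctx)].append(w)
    return [sorted(g) for g in groups.values() if len(g) > 1]
```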

  7. Technical approach document

    International Nuclear Information System (INIS)

    1989-12-01

The Uranium Mill Tailings Radiation Control Act (UMTRCA) of 1978, Public Law 95-604 (PL95-604), grants the Secretary of Energy the authority and responsibility to perform such actions as are necessary to minimize radiation health hazards and other environmental hazards caused by inactive uranium mill sites. This Technical Approach Document (TAD) describes the general technical approaches and design criteria adopted by the US Department of Energy (DOE) in order to implement remedial action plans (RAPs) and final designs that comply with EPA standards. It does not address the technical approaches necessary for aquifer restoration at processing sites; a guidance document, currently in preparation, will describe aquifer restoration concerns and technical protocols. This document is a second revision to the original document issued in May 1986; the revision has been made in response to changes to the groundwater standards of 40 CFR 192, Subparts A--C, proposed by EPA as draft standards. New sections were added to define the design approaches and designs necessary to comply with the groundwater standards. These new sections are in addition to changes made throughout the document to reflect current procedures, especially in cover design, water resources protection, and alternate site selection; only minor revisions were made to some of the sections. Section 3.0 is a new section defining the approach taken in the design of disposal cells; Section 4.0 has been revised to include design of vegetated covers; Section 8.0 discusses design approaches necessary for compliance with the groundwater standards; and Section 9.0 is a new section dealing with nonradiological hazardous constituents. 203 refs., 18 figs., 26 tabs

  8. Text Generation: The State of the Art and the Literature.

    Science.gov (United States)

    Mann, William C.; And Others

    This report comprises two documents which describe the state of the art of computer generation of natural language text. Both were prepared by a panel of individuals who are active in research on text generation. The first document assesses the techniques now available for use in systems design, covering all of the technical methods by which…

  9. Building Background Knowledge through Reading: Rethinking Text Sets

    Science.gov (United States)

    Lupo, Sarah M.; Strong, John Z.; Lewis, William; Walpole, Sharon; McKenna, Michael C.

    2018-01-01

    To increase reading volume and help students access challenging texts, the authors propose a four-dimensional framework for text sets. The quad text set framework is designed around a target text: a challenging content area text, such as a canonical literary work, research article, or historical primary source document. The three remaining…

  10. Features based approach for indexation and representation of unstructured Arabic documents

    Directory of Open Access Journals (Sweden)

    Mohamed Salim El Bazzi

    2017-06-01

Full Text Available The increase of textual information published in Arabic on the internet, in public libraries and in administrations requires effective techniques for extracting the relevant information contained in large corpora of texts. The purpose of indexing is to create a document representation that makes it easy to find and identify the relevant information in a set of documents. However, mining textual data is becoming a complicated task, especially when semantics are taken into consideration. In this paper, we present an indexation system based on a contextual representation that takes advantage of the semantic links given in a document. Our approach is based on the extraction of keyphrases: each document is represented by its relevant keyphrases instead of its simple keywords. The experimental results confirm the effectiveness of our approach.
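As an illustrative sketch only (the abstract does not specify the paper's keyphrase extractor), frequency-scored bigrams convey the flavor of representing a document by keyphrases rather than single keywords:

```python
from collections import Counter

def keyphrases(tokens, stopwords, k=3):
    """Return the k most frequent stopword-free bigrams as keyphrases.

    tokens: list of word tokens from one document; stopwords: set of
    words that may not appear in a keyphrase. Real extractors also use
    position, linguistic filters and semantic links between phrases.
    """
    bigrams = [(a, b) for a, b in zip(tokens, tokens[1:])
               if a not in stopwords and b not in stopwords]
    return [" ".join(p) for p, _ in Counter(bigrams).most_common(k)]
```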

  11. Text

    International Nuclear Information System (INIS)

    Anon.

    2009-01-01

The purpose of this act is to safeguard against the dangers and harmful effects of radioactive waste and to contribute to public safety and environmental protection by laying down requirements for the safe and efficient management of radioactive waste. The text covers definitions, interrelation with other legislation, responsibilities of the state and local governments, responsibilities of radioactive waste management companies and generators, formulation of the basic plan for the control of radioactive waste, radioactive waste management (with public information, financing and part of spent fuel management), the Korea Radioactive Waste Management Corporation (business activities, budget), the establishment of a radioactive waste fund in order to secure the financial resources required for radioactive waste management, and penalties in case of improper operation of radioactive waste management. (N.C.)

  12. Computer-Assisted Search Of Large Textual Data Bases

    Science.gov (United States)

    Driscoll, James R.

    1995-01-01

    "QA" denotes high-speed computer system for searching diverse collections of documents including (but not limited to) technical reference manuals, legal documents, medical documents, news releases, and patents. Incorporates previously available and emerging information-retrieval technology to help user intelligently and rapidly locate information found in large textual data bases. Technology includes provision for inquiries in natural language; statistical ranking of retrieved information; artificial-intelligence implementation of semantics, in which "surface level" knowledge found in text used to improve ranking of retrieved information; and relevance feedback, in which user's judgements of relevance of some retrieved documents used automatically to modify search for further information.

  13. Documenting the Earliest Chinese Journals

    Directory of Open Access Journals (Sweden)

    Jian-zhong (Joe Zhou

    2001-10-01

Full Text Available According to various authoritative sources, the English word "journal" was first used in the 16th century, but the existence of the journal in its original meaning as a daily record can be traced back to the Acta Diurna (Daily Events) of ancient Roman cities as early as 59 B.C. This article documents the first appearance of Chinese daily records, which came much earlier than 59 B.C.

The evidence of the earlier Chinese daily records came from some important archaeological discoveries in the 1970s, but they were also documented by Sima Qian (145 B.C. - 85 B.C.), the grand historian of the Han Dynasty imperial court. Sima's lifetime contribution was the publication of Shi Ji (史記) (The Grand Scribe's Records, hereafter the Records). The Records is a book of history of grand scope. It encompasses all Chinese history from the 30th century B.C. through the end of the second century B.C. in 130 chapters and over 525,000 Chinese…

  14. SEMANTIC METADATA FOR HETEROGENEOUS SPATIAL PLANNING DOCUMENTS

    Directory of Open Access Journals (Sweden)

    A. Iwaniak

    2016-09-01

Full Text Available Spatial planning documents contain information about the principles and rights of land use in different zones of a local authority. They are the basis for administrative decision making in support of sustainable development. In Poland these documents are published on the Web according to a prescribed non-extendable XML schema, designed for optimum presentation to humans in HTML web pages. There is no document standard, and limited functionality exists for adding references to external resources. The text in these documents is discoverable and searchable by general-purpose web search engines, but the semantics of the content cannot be discovered or queried. The spatial information in these documents is geographically referenced but not machine-readable. Major manual efforts are required to integrate such heterogeneous spatial planning documents from various local authorities for analysis, scenario planning and decision support. This article presents results of an implementation using machine-readable semantic metadata to identify relationships among regulations in the text, spatial objects in the drawings and links to external resources. A spatial planning ontology was used to annotate different sections of spatial planning documents with semantic metadata in the Resource Description Framework in Attributes (RDFa). The semantic interpretation of the content, links between document elements and links to external resources were embedded in XHTML pages. An example and use case from the spatial planning domain in Poland is presented to evaluate its efficiency and applicability. The solution enables the automated integration of spatial planning documents from multiple local authorities to assist decision makers with understanding and interpreting spatial planning information. The approach is equally applicable to legal documents from other countries and domains, such as cultural heritage and environmental management.

  15. Teaching Text Structure: Examining the Affordances of Children's Informational Texts

    Science.gov (United States)

    Jones, Cindy D.; Clark, Sarah K.; Reutzel, D. Ray

    2016-01-01

    This study investigated the affordances of informational texts to serve as model texts for teaching text structure to elementary school children. Content analysis of a random sampling of children's informational texts from top publishers was conducted on text structure organization and on the inclusion of text features as signals of text…

  16. Documentation of Concurrent programs.

    Science.gov (United States)

    1983-07-01

preparing the documentation formats, and Tom McDonald for preparing the supplemental materials and statistical analyses.

  17. SGHWR - quality assurance documentation

    International Nuclear Information System (INIS)

    Garrard, R.S.; Caulfield, J.

    1976-01-01

The quality assurance program for a modern power station such as an SGHWR type reactor plant must include a record of quality achievement. The case history record, which is evidence of the actual quality of the plant and is a data bank of design, manufacture, and inspection and test results, is described. Documentation distribution, which keeps all key areas informed of plant item quality status, and the retrieval and storage of information are briefly discussed. (U.K.)

  18. AUDIT plan documenting method

    International Nuclear Information System (INIS)

    Cornecsu, M.

    1995-01-01

The work describes a method of documenting the AUDIT plan on the basis of two quantitative elements: the degree of implementation of the quality assurance program (QAP) system functions, as established from the latest AUDIT performed, and the weight of each system function in the QAP, appraised by taking into account their significance for the activities that are to be performed in the period for which the AUDITs are planned. (Author) 3 Figs., 2 Refs

  19. AREVA - 2013 Reference document

    International Nuclear Information System (INIS)

    2014-01-01

    This Reference Document contains information on the AREVA group's objectives, prospects and development strategies, as well as estimates of the markets, market shares and competitive position of the AREVA group. Content: 1 - Person responsible for the Reference Document; 2 - Statutory auditors; 3 - Selected financial information; 4 - Description of major risks confronting the company; 5 - Information about the issuer; 6 - Business overview; 7 - Organizational structure; 8 - Property, plant and equipment; 9 - Situation and activities of the company and its subsidiaries; 10 - Capital resources; 11 - Research and development programs, patents and licenses; 12 - Trend information; 13 - Profit forecasts or estimates; 14 - Management and supervisory bodies; 15 - Compensation and benefits; 16 - Functioning of the management and supervisory bodies; 17 - Human resources information; 18 - Principal shareholders; 19 - Transactions with related parties; 20 - Financial information concerning assets, financial positions and financial performance; 21 - Additional information; 22 - Major contracts; 23 - Third party information, statements by experts and declarations of interest; 24 - Documents on display; 25 - Information on holdings; Appendix 1: report of the supervisory board chairman on the preparation and organization of the board's activities and internal control procedures; Appendix 2: statutory auditors' reports; Appendix 3: environmental report; Appendix 4: non-financial reporting methodology and independent third-party report on social, environmental and societal data; Appendix 5: ordinary and extraordinary general shareholders' meeting; Appendix 6: values charter; Appendix 7: table of concordance of the management report; glossaries

  20. Content Documents Management

    Science.gov (United States)

    Muniz, R.; Hochstadt, J.; Boelke J.; Dalton, A.

    2011-01-01

    The Content Documents are created and managed by the System Software group within the Launch Control System (LCS) project. The System Software product group is led by the NASA Engineering Control and Data Systems branch (NEC3) at Kennedy Space Center. The team is working on creating Operating System Images (OSI) for different platforms (i.e. AIX, Linux, Solaris and Windows). Before an OSI can be created, the team must create a Content Document, which provides the information for a workstation or server: the list of all the software to be installed on it and the set to which the hardware belongs, for example the LDS, the ADS or the FR-l. The objective of this project is to create a user-interface Web application that can manage the information in the Content Documents, with all the correct validations and filters for administrator purposes. For this project we used one of the best tools in agile Web development, Ruby on Rails, which helps pragmatic programmers develop Web applications with the Rails framework and the Ruby programming language. It is amazing to see how a student can learn about OOP features with the Ruby language, manage the user interface with HTML and CSS, create associations and queries with gems, manage databases and run a MySQL server, run shell commands from the command prompt and create Web frameworks with Rails. All of this in a real-world project and in just fifteen weeks!

  1. Toward Documentation of Program Evolution

    DEFF Research Database (Denmark)

    Vestdam, Thomas; Nørmark, Kurt

    2005-01-01

    The documentation of a program often falls behind the evolution of the program source files. When this happens it may be attractive to shift the documentation mode from updating the documentation to documenting the evolution of the program. This paper describes tools that support the documentation ... It is concluded that our approach can help revitalize older documentation, and that discovery of the fine-grained program evolution steps helps the programmer in documenting the evolution of the program.

  2. Robust keyword retrieval method for OCRed text

    Science.gov (United States)

    Fujii, Yusaku; Takebe, Hiroaki; Tanaka, Hiroshi; Hotta, Yoshinobu

    2011-01-01

    Document management systems have become important because of the growing popularity of electronic filing of documents and of scanning books, magazines, manuals, etc. through a scanner or a digital camera for storage or reading on a PC or an electronic book. Text information acquired by optical character recognition (OCR) is usually added to the electronic documents to enable document retrieval. Since texts generated by OCR generally include character recognition errors, robust retrieval methods have been introduced to overcome this problem. In this paper, we propose a retrieval method that is robust against both character segmentation and recognition errors. In the proposed method, allowing insertion of noise characters into, and dropping of characters from, the keyword provides robustness against character segmentation errors, while allowing each keyword character to be substituted by a recognition candidate for the corresponding character in the OCR output, or by any other character, provides robustness against character recognition errors. The recall rate of the proposed method was 15% higher than that of the conventional method; however, the precision rate was 64% lower.
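    The edit operations the abstract describes (inserting noise characters, dropping characters, substituting recognition candidates) amount to approximate string matching. A minimal sketch of that idea, not the authors' implementation, using plain Levenshtein matching over sliding windows of the OCR output:

```python
def edit_distance(a: str, b: str) -> int:
    # Classic Levenshtein DP: substitutions model recognition errors,
    # insertions/deletions model segmentation errors (split/merged glyphs).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # delete from a
                           cur[j - 1] + 1,               # insert into a
                           prev[j - 1] + (ca != cb)))    # substitute
        prev = cur
    return prev[-1]

def fuzzy_find(text: str, keyword: str, max_dist: int = 1):
    """Return (position, window, distance) hits where a window of the OCR
    text is within max_dist edits of the keyword."""
    k = len(keyword)
    hits = []
    # Windows slightly shorter/longer than the keyword absorb dropped or
    # inserted characters produced by bad segmentation.
    for w in range(max(1, k - max_dist), k + max_dist + 1):
        for i in range(len(text) - w + 1):
            window = text[i:i + w]
            d = edit_distance(window, keyword)
            if d <= max_dist:
                hits.append((i, window, d))
    return hits

# "docurnent" is a typical OCR confusion of "document" (m read as rn).
print(fuzzy_find("docurnent retrieval", "document", 2))
```

This brute-force sketch trades the precision/recall balance exactly as the abstract reports: looser `max_dist` raises recall while admitting spurious matches.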

  3. Let Documents Talk to Each Other: A Computer Model for Connection of Short Documents.

    Science.gov (United States)

    Chen, Z.

    1993-01-01

    Discusses the integration of scientific texts through the connection of documents and describes a computer model that can connect short documents. Information retrieval and artificial intelligence are discussed; a prototype system of the model is explained; and the model is compared to other computer models. (17 references) (LRW)

  4. Semantic Metadata for Heterogeneous Spatial Planning Documents

    Science.gov (United States)

    Iwaniak, A.; Kaczmarek, I.; Łukowicz, J.; Strzelecki, M.; Coetzee, S.; Paluszyński, W.

    2016-09-01

    Spatial planning documents contain information about the principles and rights of land use in different zones of a local authority. They are the basis for administrative decision making in support of sustainable development. In Poland these documents are published on the Web according to a prescribed non-extendable XML schema, designed for optimum presentation to humans in HTML web pages. There is no document standard, and limited functionality exists for adding references to external resources. The text in these documents is discoverable and searchable by general-purpose web search engines, but the semantics of the content cannot be discovered or queried. The spatial information in these documents is geographically referenced but not machine-readable. Major manual efforts are required to integrate such heterogeneous spatial planning documents from various local authorities for analysis, scenario planning and decision support. This article presents results of an implementation using machine-readable semantic metadata to identify relationships among regulations in the text, spatial objects in the drawings and links to external resources. A spatial planning ontology was used to annotate different sections of spatial planning documents with semantic metadata in the Resource Description Framework in Attributes (RDFa). The semantic interpretation of the content, links between document elements and links to external resources were embedded in XHTML pages. An example and use case from the spatial planning domain in Poland is presented to evaluate its efficiency and applicability. The solution enables the automated integration of spatial planning documents from multiple local authorities to assist decision makers with understanding and interpreting spatial planning information. The approach is equally applicable to legal documents from other countries and domains, such as cultural heritage and environmental management.

  5. Text Character Extraction Implementation from Captured Handwritten Image to Text Conversionusing Template Matching Technique

    Directory of Open Access Journals (Sweden)

    Barate Seema

    2016-01-01

    Full Text Available Images contain various types of useful information that should be extracted whenever required. Various algorithms and methods have been proposed to extract text from a given image, so that the user can access the text in any image. Variations in text arise from differences in the size, style, orientation and alignment of the text, while low image contrast and composite backgrounds complicate extraction. If we develop an application that extracts and recognizes such text accurately in real time, it can be applied to many important tasks, such as document analysis, vehicle license plate extraction and text-based image indexing, and many such applications have become realities in recent years. To overcome the above problems we develop an application that converts an image into text using algorithms such as bounding box, the HSV model, blob analysis, template matching and template generation.
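    Of the techniques the abstract lists, template matching is the simplest to sketch. A hypothetical illustration, not the paper's code, matching a binary glyph template against a binary image by sum of squared differences (SSD):

```python
def match_template(image, template):
    """Slide template over image (2-D lists of 0/1 pixels) and return the
    top-left offset with the smallest sum of squared differences."""
    H, W = len(image), len(image[0])
    h, w = len(template), len(template[0])
    best, best_pos = None, None
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            ssd = sum((image[y + i][x + j] - template[i][j]) ** 2
                      for i in range(h) for j in range(w))
            if best is None or ssd < best:
                best, best_pos = ssd, (y, x)
    return best_pos, best

# A 2x2 glyph embedded at row 1, column 2 of a 4x4 page.
page = [[0, 0, 0, 0],
        [0, 0, 1, 1],
        [0, 0, 1, 0],
        [0, 0, 0, 0]]
glyph = [[1, 1],
         [1, 0]]
print(match_template(page, glyph))  # ((1, 2), 0): exact match, zero error
```

Production systems typically use normalized cross-correlation rather than raw SSD so that matching is robust to brightness changes, but the sliding-window structure is the same.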

  6. Important Text Characteristics for Early-Grades Text Complexity

    Science.gov (United States)

    Fitzgerald, Jill; Elmore, Jeff; Koons, Heather; Hiebert, Elfrieda H.; Bowen, Kimberly; Sanford-Moore, Eleanor E.; Stenner, A. Jackson

    2015-01-01

    The Common Core set a standard for all children to read increasingly complex texts throughout schooling. The purpose of the present study was to explore text characteristics specifically in relation to early-grades text complexity. Three hundred fifty primary-grades texts were selected and digitized. Twenty-two text characteristics were identified…

  7. AREVA 2009 reference document

    International Nuclear Information System (INIS)

    2009-01-01

    This Reference Document contains information on the AREVA group's objectives, prospects and development strategies. It contains information on the markets, market shares and competitive position of the AREVA group. This information provides an adequate picture of the size of these markets and of the AREVA group's competitive position. Content: 1 - Person responsible for the Reference Document and Attestation by the person responsible for the Reference Document; 2 - Statutory and Deputy Auditors; 3 - Selected financial information; 4 - Risks: Risk management and coverage, Legal risk, Industrial and environmental risk, Operating risk, Risk related to major projects, Liquidity and market risk, Other risk; 5 - Information about the issuer: History and development, Investments; 6 - Business overview: Markets for nuclear power and renewable energies, AREVA customers and suppliers, Overview and strategy of the group, Business divisions, Discontinued operations: AREVA Transmission and Distribution; 7 - Organizational structure; 8 - Property, plant and equipment: Principal sites of the AREVA group, Environmental issues that may affect the issuer's; 9 - Analysis of and comments on the group's financial position and performance: Overview, Financial position, Cash flow, Statement of financial position, Events subsequent to year-end closing for 2009; 10 - Capital Resources; 11 - Research and development programs, patents and licenses; 12 - Trend information: Current situation, Financial objectives; 13 - Profit forecasts or estimates; 14 - Administrative, management and supervisory bodies and senior management; 15 - Compensation and benefits; 16 - Functioning of corporate bodies; 17 - Employees; 18 - Principal shareholders; 19 - Transactions with related parties: French state, CEA, EDF group; 20 - Financial information concerning assets, financial positions and financial performance; 21 - Additional information: Share capital, Certificate of incorporation and by-laws; 22 - Major ...

  8. Viviendo el documental

    OpenAIRE

    Álvarez Moreno, Víctor

    2017-01-01

    The following work documents the process of producing a 360-degree documentary about the Valladolid cathedral, under the title Reconstruyendo la catedral. The work combines virtual reality with journalistic narrative. Virtual reality is a tool that turns the viewer into a witness of the story. In this case, it shows what the Valladolid cathedral could have been, whose goal was to become the largest cathedral in Europe. ...

  9. VisualUrText: A Text Analytics Tool for Unstructured Textual Data

    Science.gov (United States)

    Zainol, Zuraini; Jaymes, Mohd T. H.; Nohuddin, Puteri N. E.

    2018-05-01

    The amount of unstructured text on the Internet is growing tremendously. Text repositories come from Web 2.0, business intelligence and social networking applications. It is also believed that 80-90% of future data growth will be in the form of unstructured text databases that may potentially contain interesting patterns and trends. Text mining is a well-known technique for discovering interesting patterns and trends, i.e. non-trivial knowledge, in massive unstructured text data. Text mining covers multidisciplinary fields involving information retrieval (IR), text analysis, natural language processing (NLP), data mining, machine learning, statistics and computational linguistics. This paper discusses the development of a text analytics tool that is proficient in extracting, processing and analyzing unstructured text data and in visualizing the cleaned text data in multiple forms such as a Document Term Matrix (DTM), frequency graph, network analysis graph, word cloud and dendrogram. This tool, VisualUrText, is developed to assist students and researchers in extracting interesting patterns and trends in document analyses.
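    The Document Term Matrix that tools of this kind visualize can be built in a few lines; this is a generic illustration, not VisualUrText's code:

```python
from collections import Counter

def document_term_matrix(docs):
    """Build a Document Term Matrix: one row per document, one column per
    term in the shared vocabulary, cell = raw term frequency."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    index = {t: i for i, t in enumerate(vocab)}
    dtm = []
    for doc in tokenized:
        row = [0] * len(vocab)
        for term, count in Counter(doc).items():
            row[index[term]] = count
        dtm.append(row)
    return vocab, dtm

docs = ["text mining finds patterns", "text analytics cleans text"]
vocab, dtm = document_term_matrix(docs)
print(vocab)  # alphabetical vocabulary shared by all documents
print(dtm)    # one frequency row per document
```

The same matrix feeds the other views the abstract mentions: column sums give the frequency graph, and row-vector distances drive clustering for the dendrogram.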

  10. Empirical Studies On Machine Learning Based Text Classification Algorithms

    OpenAIRE

    Shweta C. Dharmadhikari; Maya Ingle; Parag Kulkarni

    2011-01-01

    Automatic classification of text documents has become an important research issue nowadays. Proper classification of text documents requires information retrieval, machine learning and natural language processing (NLP) techniques. Our aim is to focus on important approaches to automatic text classification based on machine learning techniques, viz. supervised, unsupervised and semi-supervised. In this paper we present a review of various text classification approaches under the machine learning paradig...

  11. AREVA - 2012 Reference document

    International Nuclear Information System (INIS)

    2013-03-01

    After a presentation of the person responsible for this Reference Document, of statutory auditors, and of a summary of financial information, this report address the different risk factors: risk management and coverage, legal risk, industrial and environmental risk, operational risk, risk related to major projects, liquidity and market risk, and other risks (related to political and economic conditions, to Group's structure, and to human resources). The next parts propose information about the issuer, a business overview (markets for nuclear power and renewable energies, customers and suppliers, group's strategy, operations), a brief presentation of the organizational structure, a presentation of properties, plants and equipment (principal sites, environmental issues which may affect these items), analysis and comments on the group's financial position and performance, a presentation of capital resources, a presentation of research and development activities (programs, patents and licenses), a brief description of financial objectives and profit forecasts or estimates, a presentation of administration, management and supervision bodies, a description of the operation of corporate bodies, an overview of personnel, of principal shareholders, and of transactions with related parties, a more detailed presentation of financial information concerning assets, financial positions and financial performance. Addition information regarding share capital is given, as well as an indication of major contracts, third party information, available documents, and information on holdings

  12. AREVA 2010 Reference document

    International Nuclear Information System (INIS)

    2010-01-01

    After a presentation of the person responsible for this document, and of statutory auditors, this report proposes some selected financial information. Then, it addresses, presents and comments the different risk factors: risk management and coverage, legal risk, industrial and environmental risk, operational risk, risks related to major projects, liquidity and market risk, and other risk. Then, after a presentation of the issuer, it proposes a business overview (markets for nuclear and renewable energies, AREVA customers and suppliers, strategy, activities), a presentation of the organizational structure, a presentation of AREVA properties, plants and equipment (sites, environmental issues), an analysis and comment of the group's financial position and performance, a presentation of its capital resources, an overview of its research and development activities, programs, patents and licenses. It indicates profit forecast and estimates, presents the administrative, management and supervisory bodies, and compensation and benefits amounts, reports of the functioning of corporate bodies. It describes the human resource company policy, indicates the main shareholders and transactions with related parties. It proposes financial information concerning assets, financial positions and financial performance. This document contains its French and its English versions

  13. ExactPack Documentation

    Energy Technology Data Exchange (ETDEWEB)

    Singleton, Robert Jr. [Los Alamos National Laboratory; Israel, Daniel M. [Los Alamos National Laboratory; Doebling, Scott William [Los Alamos National Laboratory; Woods, Charles Nathan [Los Alamos National Laboratory; Kaul, Ann [Los Alamos National Laboratory; Walter, John William Jr [Los Alamos National Laboratory; Rogers, Michael Lloyd [Los Alamos National Laboratory

    2016-05-09

    For code verification, one compares the code output against known exact solutions. There are many standard test problems used in this capacity, such as the Noh and Sedov problems. ExactPack is a utility that integrates many of these exact solution codes into a common API (application program interface), and can be used as a stand-alone code or as a python package. ExactPack consists of python driver scripts that access a library of exact solutions written in Fortran or Python. The spatial profiles of the relevant physical quantities, such as the density, fluid velocity, sound speed, or internal energy, are returned at a time specified by the user. The solution profiles can be viewed and examined by a command line interface or a graphical user interface, and a number of analysis tools and unit tests are also provided. We have documented the physics of each problem in the solution library, and provided complete documentation on how to extend the library to include additional exact solutions. ExactPack’s code architecture makes it easy to extend the solution-code library to include additional exact solutions in a robust, reliable, and maintainable manner.

  14. Regulatory guidance document

    International Nuclear Information System (INIS)

    1994-05-01

    The Office of Civilian Radioactive Waste Management (OCRWM) Program Management System Manual requires preparation of the OCRWM Regulatory Guidance Document (RGD) that addresses licensing, environmental compliance, and safety and health compliance. The document provides: regulatory compliance policy; guidance to OCRWM organizational elements to ensure a consistent approach when complying with regulatory requirements; strategies to achieve policy objectives; organizational responsibilities for regulatory compliance; guidance with regard to Program compliance oversight; and guidance on the contents of a project-level Regulatory Compliance Plan. The scope of the RGD includes site suitability evaluation, licensing, environmental compliance, and safety and health compliance, in accordance with the direction provided by Section 4.6.3 of the PMS Manual. Site suitability evaluation and regulatory compliance during site characterization are significant activities, particularly with regard to the YW MSA. OCRWM's evaluation of whether the Yucca Mountain site is suitable for repository development must precede its submittal of a license application to the Nuclear Regulatory Commission (NRC). Accordingly, site suitability evaluation is discussed in Chapter 4, and the general statements of policy regarding site suitability evaluation are discussed in Section 2.1. Although much of the data and analyses may initially be similar, the licensing process is discussed separately in Chapter 5. Environmental compliance is discussed in Chapter 6. Safety and Health compliance is discussed in Chapter 7

  15. Document Type Profiles in Nature, Science, and PNAS: Journal and Country Level

    Directory of Open Access Journals (Sweden)

    Jielan Ding

    2016-09-01

    Full Text Available Purpose: In this contribution, we want to detect the document type profiles of the three prestigious journals Nature, Science, and Proceedings of the National Academy of Sciences of the United States of America (PNAS) with regard to two levels: journal and country. Design/methodology/approach: Using relative values based on fractional counting, we investigate the distribution of publications across document types at both the journal and country level, and we use (cosine) document type profile similarity values to compare pairs of publication years within countries. Findings: Nature and Science mainly publish Editorial Material, Article, News Item and Letter, whereas the publications of PNAS are heavily concentrated on Article. The shares of Article for Nature and Science are decreasing slightly from 1999 to 2014, while the corresponding shares of Editorial Material are increasing. Most studied countries focus on Article and Letter in Nature, but on Letter in Science and PNAS. The document type profiles of some of the studied countries change to a relatively large extent over publication years. Research limitations: The main limitation of this research concerns the Web of Science classification of publications into document types. Since the analysis of the paper is based on the document types of Web of Science, the classification in question is not free from errors, and the accuracy of the analysis might be affected. Practical implications: Results show that Nature and Science are quite diversified with regard to document types. In bibliometric assessments where publications in Nature and Science play a role, document types other than Article and Review might therefore be taken into account. Originality/value: Results highlight the importance of document types other than Article and Review in Nature and Science. Large differences are also found when comparing the country document type profiles of the three journals with the corresponding profiles in all Web of ...
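    The cosine similarity between document type profiles used in the study can be illustrated generically; the profiles below are made-up shares over four document types, not the paper's data:

```python
import math

def cosine(p, q):
    """Cosine similarity between two document type profiles
    (vectors of per-type publication shares)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = (math.sqrt(sum(a * a for a in p)) *
            math.sqrt(sum(b * b for b in q)))
    return dot / norm if norm else 0.0

# Hypothetical shares over (Article, Editorial Material, Letter, News Item).
diversified  = [0.40, 0.30, 0.20, 0.10]   # Nature/Science-like mix
concentrated = [0.95, 0.03, 0.02, 0.00]   # PNAS-like, Article-dominated
print(cosine(diversified, concentrated))  # well below 1.0: distinct profiles
```

Comparing a country's profile across two publication years the same way yields the year-to-year profile similarity values the study reports.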

  16. Undergraduates' Text Messaging Language and Literacy Skills

    Science.gov (United States)

    Grace, Abbie; Kemp, Nenagh; Martin, Frances Heritage; Parrila, Rauno

    2014-01-01

    Research investigating whether people's literacy skill is being affected by the use of text messaging language has produced largely positive results for children, but mixed results for adults. We asked 150 undergraduate university students in Western Canada and 86 in South Eastern Australia to supply naturalistic text messages and to complete…

  17. Relating interesting quantitative time series patterns with text events and text features

    Science.gov (United States)

    Wanner, Franz; Schreck, Tobias; Jentner, Wolfgang; Sharalieva, Lyubka; Keim, Daniel A.

    2013-12-01

    In many application areas, the key to successful data analysis is the integrated analysis of heterogeneous data. One example is the financial domain, where time-dependent and highly frequent quantitative data (e.g., trading volume and price information) and textual data (e.g., economic and political news reports) need to be considered jointly. Data analysis tools need to support an integrated analysis, which allows studying the relationships between textual news documents and quantitative properties of the stock market price series. In this paper, we describe a workflow and tool that allows a flexible formation of hypotheses about text features and their combinations, which reflect quantitative phenomena observed in stock data. To support such an analysis, we combine the analysis steps of frequent quantitative and text-oriented data using an existing a-priori method. First, based on heuristics we extract interesting intervals and patterns in large time series data. The visual analysis supports the analyst in exploring parameter combinations and their results. The identified time series patterns are then input for the second analysis step, in which all identified intervals of interest are analyzed for frequent patterns co-occurring with financial news. An a-priori method supports the discovery of such sequential temporal patterns. Then, various text features like the degree of sentence nesting, noun phrase complexity, the vocabulary richness, etc. are extracted from the news to obtain meta patterns. Meta patterns are defined by a specific combination of text features which significantly differ from the text features of the remaining news data. Our approach combines a portfolio of visualization and analysis techniques, including time-, cluster- and sequence visualization and analysis functionality. We provide two case studies, showing the effectiveness of our combined quantitative and textual analysis work flow. The workflow can also be generalized to other

  18. Basic Test Framework for the Evaluation of Text Line Segmentation and Text Parameter Extraction

    Directory of Open Access Journals (Sweden)

    Darko Brodić

    2010-05-01

    Full Text Available Text line segmentation is an essential stage in off-line optical character recognition (OCR) systems. It is key because inaccurately segmented text lines will lead to OCR failure. Text line segmentation of handwritten documents is a complex and diverse problem, complicated by the nature of handwriting. Hence, text line segmentation is a leading challenge in handwritten document image processing. Due to inconsistencies in the measurement and evaluation of text segmentation algorithm quality, some basic set of measurement methods is required. Currently, there is no commonly accepted one, and all algorithm evaluation is custom oriented. In this paper, a basic test framework for the evaluation of text feature extraction algorithms is proposed. This test framework consists of a few experiments primarily linked to text line segmentation, skew rate and reference text line evaluation. Although they are mutually independent, the results obtained are strongly cross-linked. In the end, its suitability for different types of letters and languages, as well as its adaptability, are its main advantages. Thus, the paper presents an efficient evaluation method for text analysis algorithms.
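    As background to what such a framework evaluates, the classic projection-profile approach to text line segmentation can be sketched as follows. This is a generic illustration, not the paper's method, and it is deliberately too simple for the skewed, touching lines of real handwriting:

```python
def segment_lines(image):
    """Segment a binary page image (list of pixel rows, 1 = ink) into text
    lines via the horizontal projection profile: a line is a maximal run
    of rows containing at least one ink pixel."""
    profile = [sum(row) for row in image]   # ink count per row
    lines, start = [], None
    for y, ink in enumerate(profile):
        if ink and start is None:           # entering an inked band
            start = y
        elif not ink and start is not None:  # leaving an inked band
            lines.append((start, y - 1))
            start = None
    if start is not None:                   # band touching the page bottom
        lines.append((start, len(profile) - 1))
    return lines

# Two "lines" of ink separated by a blank row.
page = [[0, 0], [1, 0], [0, 1], [0, 0], [1, 1]]
print(segment_lines(page))  # [(1, 2), (4, 4)]
```

Evaluation frameworks like the one proposed compare such detected (start, end) bands against reference text lines, which is where skew rate and reference line accuracy enter.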

  19. Text analysis methods, text analysis apparatuses, and articles of manufacture

    Science.gov (United States)

    Whitney, Paul D; Willse, Alan R; Lopresti, Charles A; White, Amanda M

    2014-10-28

    Text analysis methods, text analysis apparatuses, and articles of manufacture are described according to some aspects. In one aspect, a text analysis method includes accessing information indicative of data content of a collection of text comprising a plurality of different topics, using a computing device, analyzing the information indicative of the data content, and using results of the analysis, identifying a presence of a new topic in the collection of text.

  20. Classroom Texting in College Students

    Science.gov (United States)

    Pettijohn, Terry F.; Frazier, Erik; Rieser, Elizabeth; Vaughn, Nicholas; Hupp-Wilds, Bobbi

    2015-01-01

    A 21-item survey on texting in the classroom was given to 235 college students. Overall, 99.6% of students owned a cellphone and 98% texted daily. Of the 138 students who texted in the classroom, most texted friends or significant others, and indicated that the reason for classroom texting was boredom or work. Students who texted sent a mean of 12.21…

  1. Areva - 2016 Reference document

    International Nuclear Information System (INIS)

    2017-01-01

    Areva supplies high added-value products and services to support the operation of the global nuclear fleet. The company is present throughout the entire nuclear cycle, from uranium mining to used fuel recycling, including nuclear reactor design and operating services. Areva is recognized by utilities around the world for its expertise, its skills in cutting-edge technologies and its dedication to the highest level of safety. Areva's 36,000 employees are helping build tomorrow's energy model: supplying ever safer, cleaner and more economical energy to the greatest number of people. This Reference Document contains information on Areva's objectives, prospects and development strategies. It contains estimates of the markets, market shares and competitive position of Areva

  2. Working document dispersion models

    International Nuclear Information System (INIS)

    Dop, H. van

    1988-01-01

    This report summarizes the most important results, as of June 1985, of the collaboration between the RIVM (Dutch National Institute for Public Health and Environmental Hygiene) and the KNMI (Royal Dutch Meteorological Institute) in the domain of dispersion models. It contains a short description of the current SOx/NOx model. Furthermore, it contains recommendations for modifications of some numerical-mathematical aspects and an impulse toward a more complete description of chemical processes in the atmosphere and of the (wet) deposition process. A separate chapter is devoted to the preparation of meteorological data relevant for dispersion as well as for atmospheric chemistry and deposition. This report serves as a working document for the final formulation of an acidification and oxidant model. (H.W.). 69 refs.; 51 figs.; 13 tabs.; 3 schemes

  3. Integrated criteria document mercury

    International Nuclear Information System (INIS)

    Sloof, W.; Beelan, P. van; Annema, J.A.; Janus, J.A.

    1995-01-01

    The document contains a systematic review and a critical evaluation of the most relevant data on the priority substance mercury for the purpose of effect-oriented environmental policy. Chapter headings are: properties and existing standards; production, application, sources and emissions (natural sources, industry, energy, households, agriculture, dental use, waste); distribution and transformation (cinnabar; Hg 2+ , Hg 2 2+ , elemental mercury, methylmercury, behavior in soil, water, air, biota); concentrations and fluxes in the environment and exposure levels (sampling and measuring methods, occurrence in soil, water, air etc.); effects (toxicity to humans and aquatic and terrestrial systems); emissions reduction (from industrial sources, energy, waste processing etc.); and evaluation (risks, standards, emission reduction objectives, measuring strategies). 395 refs

  4. Wind system documentation

    Energy Technology Data Exchange (ETDEWEB)

    Froggatt, J.R.; Tatum, C.P.

    1993-01-15

    Atmospheric transport and diffusion models have been developed by the Environmental Technology Section (ETS) of the Savannah River Technology Center to calculate the location and concentration of toxic or radioactive materials during an accidental release at the Savannah River Site (SRS). The output from these models has been used to support initial on-site and off-site emergency response activities such as protective action decision making and field monitoring coordination. These atmospheric transport and diffusion models have been incorporated into an automated computer-based system called the WIND (Weather Information and Display) System and linked to real-time meteorological and radiological monitoring instruments to provide timely information for these emergency response activities (Hunter, 1990). This report documents various aspects of the WIND system.

  5. ICRS Recommendation Document

    DEFF Research Database (Denmark)

    Roos, Ewa M.; Engelhart, Luella; Ranstam, Jonas

    2011-01-01

    Objective: The purpose of this article is to describe and recommend patient-reported outcome instruments for use in patients with articular cartilage lesions undergoing cartilage repair interventions. Methods: Nonsystematic literature search identifying measures addressing pain and function, evaluated for validity and psychometric properties in patients with articular cartilage lesions. Results: The knee-specific instruments, titled the International Knee Documentation Committee Subjective Knee Form and the Knee injury and Osteoarthritis Outcome Score, both fulfill the basic constructs at all levels according to the International Classification of Functioning. Conclusions: Because there is no obvious superiority of either instrument at this time, both outcome measures are recommended for use in cartilage repair. Rescaling of the Lysholm Scoring Scale has been suggested.

  6. Extractive Summarisation of Medical Documents

    Directory of Open Access Journals (Sweden)

    Abeed Sarker

    2012-09-01

    Full Text Available Background: Evidence Based Medicine (EBM) practice requires practitioners to extract evidence from published medical research when answering clinical queries. Due to the time-consuming nature of this practice, there is a strong motivation for systems that can automatically summarise medical documents and help practitioners find relevant information. Aim: The aim of this work is to propose an automatic query-focused, extractive summarisation approach that selects informative sentences from medical documents. Method: We use a corpus that is specifically designed for summarisation in the EBM domain. We use approximately half the corpus for deriving important statistics associated with the best possible extractive summaries. We take into account factors such as sentence position, length, sentence content, and the type of the query posed. Using the statistics from the first set, we evaluate our approach on a separate set. Evaluation of the quality of the generated summaries is performed automatically using ROUGE, which is a popular tool for evaluating automatic summaries. Results: Our summarisation approach outperforms all baselines (best baseline score: 0.1594; our score: 0.1653). Further improvements are achieved when query types are taken into account. Conclusion: The quality of extractive summarisation in the medical domain can be significantly improved by incorporating domain knowledge and statistics derived from a specialised corpus. Such techniques can therefore be applied for content selection in end-to-end summarisation systems.
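    The sentence-selection factors the abstract mentions (position, length, content overlap with the query) can be combined into a toy scoring function. The weights below are arbitrary illustrations, not the paper's corpus-derived statistics:

```python
def score_sentence(sentence, position, total, query):
    """Toy query-focused extractive score combining three of the factors
    the abstract mentions: query overlap, sentence position and length."""
    words = set(sentence.lower().split())
    qwords = set(query.lower().split())
    overlap = len(words & qwords) / len(qwords) if qwords else 0.0
    position_score = 1.0 - position / max(total, 1)  # earlier sentences favoured
    length_score = min(len(words) / 20.0, 1.0)       # cap very long sentences
    return 0.6 * overlap + 0.25 * position_score + 0.15 * length_score

def summarise(sentences, query, k=2):
    """Pick the k highest-scoring sentences, then restore document order."""
    scored = sorted(
        ((score_sentence(s, i, len(sentences), query), i, s)
         for i, s in enumerate(sentences)),
        reverse=True)
    picked = sorted((i, s) for _, i, s in scored[:k])
    return [s for _, s in picked]

sentences = ["Aspirin reduces fever in adults.",
             "The trial enrolled 40 patients.",
             "Fever outcomes improved with aspirin."]
print(summarise(sentences, "aspirin fever", k=2))
```

In an EBM pipeline the extracted sentences would then be scored against reference summaries with ROUGE, as in the evaluation the abstract describes.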

  7. Areva, reference document 2006

    International Nuclear Information System (INIS)

    2006-01-01

    This reference document contains information on the AREVA group's objectives, prospects and development strategies, particularly in Chapters 4 and 7. It contains information on the markets, market shares and competitive position of the AREVA group. Content: - 1 Person responsible for the reference document and persons responsible for auditing the financial statements; - 2 Information pertaining to the transaction (Not applicable); - 3 General information on the company and its share capital: Information on AREVA, on share capital and voting rights, Investment certificate trading, Dividends, Organization chart of AREVA group companies, Equity interests, Shareholders' agreements; - 4 Information on company operations, new developments and future prospects: Overview and strategy of the AREVA group, The Nuclear Power and Transmission and Distribution markets, The energy businesses of the AREVA group, Front End division, Reactors and Services division, Back End division, Transmission and Distribution division, Major contracts, The principal sites of the AREVA group, AREVA's customers and suppliers, Sustainable Development and Continuous Improvement, Capital spending programs, Research and development programs, intellectual property and trademarks, Risk and insurance; - 5 Assets - Financial position - Financial performance: Analysis of and comments on the group's financial position and performance, 2006 Human Resources Report, Environmental Report, Consolidated financial statements, Notes to the consolidated financial statements, AREVA SA financial statements, Notes to the corporate financial statements; 6 - Corporate Governance: Composition and functioning of corporate bodies, Executive compensation, Profit-sharing plans, AREVA Values Charter, Annual Combined General Meeting of Shareholders of May 3, 2007; 7 - Recent developments and future prospects: Events subsequent to year-end closing for 2006, Outlook; 8 - Glossary; 9 - Table of concordance

  8. Areva - 2014 Reference document

    International Nuclear Information System (INIS)

    2015-01-01

    Areva supplies high added-value products and services to support the operation of the global nuclear fleet. The company is present throughout the entire nuclear cycle, from uranium mining to used fuel recycling, including nuclear reactor design and operating services. Areva is recognized by utilities around the world for its expertise, its skills in cutting-edge technologies and its dedication to the highest level of safety. Areva's 44,000 employees are helping build tomorrow's energy model: supplying ever safer, cleaner and more economical energy to the greatest number of people. This Reference Document contains information on Areva's objectives, prospects and development strategies. It contains estimates of the markets, market shares and competitive position of Areva. Contents: 1 - Person responsible; 2 - Statutory auditors; 3 - Selected financial information; 4 - Risk factors; 5 - Information about the issuer; 6 - Business overview; 7 - Organizational structure; 8 - Property, plant and equipment; 9 - Analysis of and comments on the group's financial position and performance; 10 - Capital resources; 11 - Research and development programs, patents and licenses; 12 - Trend information; 13 - Profit forecasts; 14 - Administrative, management and supervisory bodies and senior management; 15 - Compensation and benefits; 16 - Functioning of administrative, management and supervisory bodies and senior management; 17 - Employees; 18 - Principal shareholders; 19 - Transactions with related parties; 20 - Financial information concerning assets, financial positions and financial performance; 21 - Additional information; 22 - Major contracts; 23 - Third party information, statements by experts and declarations of interest; 24 - Documents on display; 25 - information on holdings; appendix: Report of the Chairman of the Board of Directors on governance, internal control procedures and risk management, Statutory Auditors' report, Corporate social

  9. Areva reference document 2007

    International Nuclear Information System (INIS)

    2008-01-01

    This reference document contains information on the AREVA group's objectives, prospects and development strategies, particularly in Chapters 4 and 7. It also contains information on the markets, market shares and competitive position of the AREVA group. Content: 1 - Person responsible for the reference document and persons responsible for auditing the financial statements; 2 - Information pertaining to the transaction (not applicable); 3 - General information on the company and its share capital: Information on Areva, Information on share capital and voting rights, Investment certificate trading, Dividends, Organization chart of AREVA group companies, Equity interests, Shareholders' agreements; 4 - Information on company operations, new developments and future prospects: Overview and strategy of the AREVA group, The Nuclear Power and Transmission and Distribution markets, The energy businesses of the AREVA group, Front End division, Reactors and Services division, Back End division, Transmission and Distribution division, Major contracts, Principal sites of the AREVA group, AREVA's customers and suppliers, Sustainable Development and Continuous Improvement, Capital spending programs, Research and Development programs, Intellectual Property and Trademarks, Risk and insurance; 5 - Assets - Financial position - Financial performance: Analysis of and comments on the group's financial position and performance, Human Resources report, Environmental report, Consolidated financial statements 2007, Notes to the consolidated financial statements, Annual financial statements 2007, Notes to the corporate financial statements; 6 - Corporate governance: Composition and functioning of corporate bodies, Executive compensation, Profit-sharing plans, AREVA Values Charter, Annual Ordinary General Meeting of Shareholders of April 17, 2008; 7 - Recent developments and future prospects: Events subsequent to year-end closing for 2007, Outlook; Glossary; Table of concordance

  10. Transcription of Spanish Historical Handwritten Documents with Deep Neural Networks

    Directory of Open Access Journals (Sweden)

    Emilio Granell

    2018-01-01

    Full Text Available The digitization of historical handwritten document images is important for the preservation of cultural heritage. Moreover, the transcription of text images obtained from digitization is necessary to provide efficient information access to the content of these documents. Handwritten Text Recognition (HTR) has become an important research topic in the areas of image and computational language processing that allows us to obtain transcriptions from text images. State-of-the-art HTR systems are, however, far from perfect. One difficulty is that they have to cope with image noise and handwriting variability. Another difficulty is the presence of a large amount of Out-Of-Vocabulary (OOV) words in ancient historical texts. A solution to this problem is to use external lexical resources, but such resources might be scarce or unavailable given the nature and the age of such documents. This work proposes a solution to avoid this limitation. It consists of associating a powerful optical recognition system that will cope with image noise and variability, with a language model based on sub-lexical units that will model OOV words. Such a language modeling approach reduces the size of the lexicon while increasing the lexicon coverage. Experiments are first conducted on the publicly available Rodrigo dataset, which contains the digitization of an ancient Spanish manuscript, with a recognizer based on Hidden Markov Models (HMMs). They show that sub-lexical units outperform word units in terms of Word Error Rate (WER), Character Error Rate (CER) and OOV word accuracy rate. This approach is then applied to deep net classifiers, namely Bi-directional Long Short-Term Memory networks (BLSTMs) and Convolutional Recurrent Neural Nets (CRNNs). Results show that CRNNs outperform HMMs and BLSTMs, reaching the lowest WER and CER for this image dataset and significantly improving OOV recognition.
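
    The sub-lexical idea, shrinking the lexicon while keeping full coverage, can be sketched with a greedy longest-match segmenter. The unit inventory below is a toy assumption; real systems derive their units from training data:

```python
def segment(word, units):
    """Greedy longest-match segmentation of a word into sub-lexical units.
    Falls back to single characters, so no word is ever out-of-vocabulary
    as long as the character inventory is covered."""
    out, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in units or j == i + 1:   # single char is the fallback unit
                out.append(piece)
                i = j
                break
    return out

units = {"con", "quista", "dor", "es", "rey"}   # toy unit inventory
print(segment("conquistadores", units))          # → ['con', 'quista', 'dor', 'es']
```

    A language model over such units can assign probability to OOV words by composing known pieces, which is what makes the approach attractive for ancient texts with no external lexicon.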

  11. An Integrated Multimedia Approach to Cultural Heritage e-Documents

    NARCIS (Netherlands)

    Smeulders, A.W.M.; Hardman, H.L.; Schreiber, G.; Geusebroek, J.M.

    2002-01-01

    We discuss access to e-documents from three different perspectives beyond the plain keyword web-search of the entire document. The first one is the situation-dependent delivery of multimedia documents, adapting the preferred form (picture, text, speech) to the available information capacity or need.

  12. Observation of [Formula: see text] and [Formula: see text] decays.

    Science.gov (United States)

    Aaij, R; Adeva, B; Adinolfi, M; Ajaltouni, Z; Akar, S; Albrecht, J; Alessio, F; Alexander, M; Ali, S; Alkhazov, G; Alvarez Cartelle, P; Alves, A A; Amato, S; Amerio, S; Amhis, Y; An, L; Anderlini, L; Andreassi, G; Andreotti, M; Andrews, J E; Appleby, R B; Archilli, F; d'Argent, P; Arnau Romeu, J; Artamonov, A; Artuso, M; Aslanides, E; Auriemma, G; Baalouch, M; Babuschkin, I; Bachmann, S; Back, J J; Badalov, A; Baesso, C; Baker, S; Baldini, W; Barlow, R J; Barschel, C; Barsuk, S; Barter, W; Baszczyk, M; Batozskaya, V; Batsukh, B; Battista, V; Bay, A; Beaucourt, L; Beddow, J; Bedeschi, F; Bediaga, I; Bel, L J; Bellee, V; Belloli, N; Belous, K; Belyaev, I; Ben-Haim, E; Bencivenni, G; Benson, S; Benton, J; Berezhnoy, A; Bernet, R; Bertolin, A; Betancourt, C; Betti, F; Bettler, M-O; van Beuzekom, M; Bezshyiko, Ia; Bifani, S; Billoir, P; Bird, T; Birnkraut, A; Bitadze, A; Bizzeti, A; Blake, T; Blanc, F; Blouw, J; Blusk, S; Bocci, V; Boettcher, T; Bondar, A; Bondar, N; Bonivento, W; Bordyuzhin, I; Borgheresi, A; Borghi, S; Borisyak, M; Borsato, M; Bossu, F; Boubdir, M; Bowcock, T J V; Bowen, E; Bozzi, C; Braun, S; Britsch, M; Britton, T; Brodzicka, J; Buchanan, E; Burr, C; Bursche, A; Buytaert, J; Cadeddu, S; Calabrese, R; Calvi, M; Calvo Gomez, M; Camboni, A; Campana, P; Campora Perez, D H; Capriotti, L; Carbone, A; Carboni, G; Cardinale, R; Cardini, A; Carniti, P; Carson, L; Carvalho Akiba, K; Casse, G; Cassina, L; Castillo Garcia, L; Cattaneo, M; Cauet, Ch; Cavallero, G; Cenci, R; Charles, M; Charpentier, Ph; Chatzikonstantinidis, G; Chefdeville, M; Chen, S; Cheung, S-F; Chobanova, V; Chrzaszcz, M; Cid Vidal, X; Ciezarek, G; Clarke, P E L; Clemencic, M; Cliff, H V; Closier, J; Coco, V; Cogan, J; Cogneras, E; Cogoni, V; Cojocariu, L; Collazuol, G; Collins, P; Comerma-Montells, A; Contu, A; Cook, A; Coombs, G; Coquereau, S; Corti, G; Corvo, M; Costa Sobral, C M; Couturier, B; Cowan, G A; Craik, D C; Crocombe, A; Cruz Torres, M; Cunliffe, S; Currie, R; D'Ambrosio, C; 
Da Cunha Marinho, F; Dall'Occo, E; Dalseno, J; David, P N Y; Davis, A; De Aguiar Francisco, O; De Bruyn, K; De Capua, S; De Cian, M; De Miranda, J M; De Paula, L; De Serio, M; De Simone, P; Dean, C-T; Decamp, D; Deckenhoff, M; Del Buono, L; Demmer, M; Dendek, A; Derkach, D; Deschamps, O; Dettori, F; Dey, B; Di Canto, A; Dijkstra, H; Dordei, F; Dorigo, M; Dosil Suárez, A; Dovbnya, A; Dreimanis, K; Dufour, L; Dujany, G; Dungs, K; Durante, P; Dzhelyadin, R; Dziurda, A; Dzyuba, A; Déléage, N; Easo, S; Ebert, M; Egede, U; Egorychev, V; Eidelman, S; Eisenhardt, S; Eitschberger, U; Ekelhof, R; Eklund, L; Ely, S; Esen, S; Evans, H M; Evans, T; Falabella, A; Farley, N; Farry, S; Fay, R; Fazzini, D; Ferguson, D; Fernandez Prieto, A; Ferrari, F; Ferreira Rodrigues, F; Ferro-Luzzi, M; Filippov, S; Fini, R A; Fiore, M; Fiorini, M; Firlej, M; Fitzpatrick, C; Fiutowski, T; Fleuret, F; Fohl, K; Fontana, M; Fontanelli, F; Forshaw, D C; Forty, R; Franco Lima, V; Frank, M; Frei, C; Fu, J; Furfaro, E; Färber, C; Gallas Torreira, A; Galli, D; Gallorini, S; Gambetta, S; Gandelman, M; Gandini, P; Gao, Y; Garcia Martin, L M; García Pardiñas, J; Garra Tico, J; Garrido, L; Garsed, P J; Gascon, D; Gaspar, C; Gavardi, L; Gazzoni, G; Gerick, D; Gersabeck, E; Gersabeck, M; Gershon, T; Ghez, Ph; Gianì, S; Gibson, V; Girard, O G; Giubega, L; Gizdov, K; Gligorov, V V; Golubkov, D; Golutvin, A; Gomes, A; Gorelov, I V; Gotti, C; Govorkova, E; Grabalosa Gándara, M; Graciani Diaz, R; Granado Cardoso, L A; Graugés, E; Graverini, E; Graziani, G; Grecu, A; Griffith, P; Grillo, L; Gruberg Cazon, B R; Grünberg, O; Gushchin, E; Guz, Yu; Gys, T; Göbel, C; Hadavizadeh, T; Hadjivasiliou, C; Haefeli, G; Haen, C; Haines, S C; Hall, S; Hamilton, B; Han, X; Hansmann-Menzemer, S; Harnew, N; Harnew, S T; Harrison, J; Hatch, M; He, J; Head, T; Heister, A; Hennessy, K; Henrard, P; Henry, L; Hernando Morata, J A; van Herwijnen, E; Heß, M; Hicheur, A; Hill, D; Hombach, C; Hopchev, H; Hulsbergen, W; Humair, T; Hushchyn, 
M; Hussain, N; Hutchcroft, D; Idzik, M; Ilten, P; Jacobsson, R; Jaeger, A; Jalocha, J; Jans, E; Jawahery, A; Jiang, F; John, M; Johnson, D; Jones, C R; Joram, C; Jost, B; Jurik, N; Kandybei, S; Kanso, W; Karacson, M; Kariuki, J M; Karodia, S; Kecke, M; Kelsey, M; Kenyon, I R; Kenzie, M; Ketel, T; Khairullin, E; Khanji, B; Khurewathanakul, C; Kirn, T; Klaver, S; Klimaszewski, K; Koliiev, S; Kolpin, M; Komarov, I; Koopman, R F; Koppenburg, P; Kosmyntseva, A; Kozachuk, A; Kozeiha, M; Kravchuk, L; Kreplin, K; Kreps, M; Krokovny, P; Kruse, F; Krzemien, W; Kucewicz, W; Kucharczyk, M; Kudryavtsev, V; Kuonen, A K; Kurek, K; Kvaratskheliya, T; Lacarrere, D; Lafferty, G; Lai, A; Lanfranchi, G; Langenbruch, C; Latham, T; Lazzeroni, C; Le Gac, R; van Leerdam, J; Lees, J-P; Leflat, A; Lefrançois, J; Lefèvre, R; Lemaitre, F; Lemos Cid, E; Leroy, O; Lesiak, T; Leverington, B; Li, Y; Likhomanenko, T; Lindner, R; Linn, C; Lionetto, F; Liu, B; Liu, X; Loh, D; Longstaff, I; Lopes, J H; Lucchesi, D; Lucio Martinez, M; Luo, H; Lupato, A; Luppi, E; Lupton, O; Lusiani, A; Lyu, X; Machefert, F; Maciuc, F; Maev, O; Maguire, K; Malde, S; Malinin, A; Maltsev, T; Manca, G; Mancinelli, G; Manning, P; Maratas, J; Marchand, J F; Marconi, U; Marin Benito, C; Marino, P; Marks, J; Martellotti, G; Martin, M; Martinelli, M; Martinez Santos, D; Martinez Vidal, F; Martins Tostes, D; Massacrier, L M; Massafferri, A; Matev, R; Mathad, A; Mathe, Z; Matteuzzi, C; Mauri, A; Maurin, B; Mazurov, A; McCann, M; McCarthy, J; McNab, A; McNulty, R; Meadows, B; Meier, F; Meissner, M; Melnychuk, D; Merk, M; Merli, A; Michielin, E; Milanes, D A; Minard, M-N; Mitzel, D S; Mogini, A; Molina Rodriguez, J; Monroy, I A; Monteil, S; Morandin, M; Morawski, P; Mordà, A; Morello, M J; Moron, J; Morris, A B; Mountain, R; Muheim, F; Mulder, M; Mussini, M; Müller, D; Müller, J; Müller, K; Müller, V; Naik, P; Nakada, T; Nandakumar, R; Nandi, A; Nasteva, I; Needham, M; Neri, N; Neubert, S; Neufeld, N; Neuner, M; Nguyen, A D; 
Nguyen, T D; Nguyen-Mau, C; Nieswand, S; Niet, R; Nikitin, N; Nikodem, T; Novoselov, A; O'Hanlon, D P; Oblakowska-Mucha, A; Obraztsov, V; Ogilvy, S; Oldeman, R; Onderwater, C J G; Otalora Goicochea, J M; Otto, A; Owen, P; Oyanguren, A; Pais, P R; Palano, A; Palombo, F; Palutan, M; Panman, J; Papanestis, A; Pappagallo, M; Pappalardo, L L; Parker, W; Parkes, C; Passaleva, G; Pastore, A; Patel, G D; Patel, M; Patrignani, C; Pearce, A; Pellegrino, A; Penso, G; Pepe Altarelli, M; Perazzini, S; Perret, P; Pescatore, L; Petridis, K; Petrolini, A; Petrov, A; Petruzzo, M; Picatoste Olloqui, E; Pietrzyk, B; Pikies, M; Pinci, D; Pistone, A; Piucci, A; Playfer, S; Plo Casasus, M; Poikela, T; Polci, F; Poluektov, A; Polyakov, I; Polycarpo, E; Pomery, G J; Popov, A; Popov, D; Popovici, B; Poslavskii, S; Potterat, C; Price, E; Price, J D; Prisciandaro, J; Pritchard, A; Prouve, C; Pugatch, V; Puig Navarro, A; Punzi, G; Qian, W; Quagliani, R; Rachwal, B; Rademacker, J H; Rama, M; Ramos Pernas, M; Rangel, M S; Raniuk, I; Ratnikov, F; Raven, G; Redi, F; Reichert, S; Dos Reis, A C; Remon Alepuz, C; Renaudin, V; Ricciardi, S; Richards, S; Rihl, M; Rinnert, K; Rives Molina, V; Robbe, P; Rodrigues, A B; Rodrigues, E; Rodriguez Lopez, J A; Rodriguez Perez, P; Rogozhnikov, A; Roiser, S; Rollings, A; Romanovskiy, V; Romero Vidal, A; Ronayne, J W; Rotondo, M; Rudolph, M S; Ruf, T; Ruiz Valls, P; Saborido Silva, J J; Sadykhov, E; Sagidova, N; Saitta, B; Salustino Guimaraes, V; Sanchez Mayordomo, C; Sanmartin Sedes, B; Santacesaria, R; Santamarina Rios, C; Santimaria, M; Santovetti, E; Sarti, A; Satriano, C; Satta, A; Saunders, D M; Savrina, D; Schael, S; Schellenberg, M; Schiller, M; Schindler, H; Schlupp, M; Schmelling, M; Schmelzer, T; Schmidt, B; Schneider, O; Schopper, A; Schubert, K; Schubiger, M; Schune, M-H; Schwemmer, R; Sciascia, B; Sciubba, A; Semennikov, A; Sergi, A; Serra, N; Serrano, J; Sestini, L; Seyfert, P; Shapkin, M; Shapoval, I; Shcheglov, Y; Shears, T; Shekhtman, L; 
Shevchenko, V; Siddi, B G; Silva Coutinho, R; Silva de Oliveira, L; Simi, G; Simone, S; Sirendi, M; Skidmore, N; Skwarnicki, T; Smith, E; Smith, I T; Smith, J; Smith, M; Snoek, H; Sokoloff, M D; Soler, F J P; Souza De Paula, B; Spaan, B; Spradlin, P; Sridharan, S; Stagni, F; Stahl, M; Stahl, S; Stefko, P; Stefkova, S; Steinkamp, O; Stemmle, S; Stenyakin, O; Stevenson, S; Stoica, S; Stone, S; Storaci, B; Stracka, S; Straticiuc, M; Straumann, U; Sun, L; Sutcliffe, W; Swientek, K; Syropoulos, V; Szczekowski, M; Szumlak, T; T'Jampens, S; Tayduganov, A; Tekampe, T; Tellarini, G; Teubert, F; Thomas, E; van Tilburg, J; Tilley, M J; Tisserand, V; Tobin, M; Tolk, S; Tomassetti, L; Tonelli, D; Topp-Joergensen, S; Toriello, F; Tournefier, E; Tourneur, S; Trabelsi, K; Traill, M; Tran, M T; Tresch, M; Trisovic, A; Tsaregorodtsev, A; Tsopelas, P; Tully, A; Tuning, N; Ukleja, A; Ustyuzhanin, A; Uwer, U; Vacca, C; Vagnoni, V; Valassi, A; Valat, S; Valenti, G; Vallier, A; Vazquez Gomez, R; Vazquez Regueiro, P; Vecchi, S; van Veghel, M; Velthuis, J J; Veltri, M; Veneziano, G; Venkateswaran, A; Vernet, M; Vesterinen, M; Viaud, B; Vieira, D; Vieites Diaz, M; Viemann, H; Vilasis-Cardona, X; Vitti, M; Volkov, V; Vollhardt, A; Voneki, B; Vorobyev, A; Vorobyev, V; Voß, C; de Vries, J A; Vázquez Sierra, C; Waldi, R; Wallace, C; Wallace, R; Walsh, J; Wang, J; Ward, D R; Wark, H M; Watson, N K; Websdale, D; Weiden, A; Whitehead, M; Wicht, J; Wilkinson, G; Wilkinson, M; Williams, M; Williams, M P; Williams, M; Williams, T; Wilson, F F; Wimberley, J; Wishahi, J; Wislicki, W; Witek, M; Wormser, G; Wotton, S A; Wraight, K; Wyllie, K; Xie, Y; Xing, Z; Xu, Z; Yang, Z; Yin, H; Yu, J; Yuan, X; Yushchenko, O; Zarebski, K A; Zavertyaev, M; Zhang, L; Zhang, Y; Zhang, Y; Zhelezov, A; Zheng, Y; Zhokhov, A; Zhu, X; Zhukov, V; Zucchelli, S

    2017-01-01

    The decays [Formula: see text] and [Formula: see text] are observed for the first time using a data sample corresponding to an integrated luminosity of 3.0 fb[Formula: see text], collected by the LHCb experiment in proton-proton collisions at the centre-of-mass energies of 7 and 8[Formula: see text]. The branching fractions relative to that of [Formula: see text] are measured to be [Formula: see text], where the first uncertainties are statistical and the second are systematic.

  13. Text-mining analysis of mHealth research

    Science.gov (United States)

    Zengul, Ferhat; Oner, Nurettin; Delen, Dursun

    2017-01-01

    In recent years, because of the advancements in communication and networking technologies, mobile technologies have been developing at an unprecedented rate. mHealth, the use of mobile technologies in medicine, and the related research have also surged in parallel with these technological advancements. Although there have been several attempts to review mHealth research through manual processes such as systematic reviews, the sheer magnitude of the number of studies published in recent years makes this task very challenging. The most recent developments in machine learning and text mining offer some potential solutions to address this challenge by allowing analyses of large volumes of texts through semi-automated processes. The objective of this study is to analyze the evolution of mHealth research by utilizing text-mining and natural language processing (NLP) analyses. The study sample included abstracts of 5,644 mHealth research articles, which were gathered from five academic search engines by using search terms such as mobile health and mHealth. The analysis used the Text Explorer module of JMP Pro 13 and an iterative semi-automated process involving tokenizing, phrasing, and terming. After developing the document term matrix (DTM), analyses such as singular value decomposition (SVD), topic, and hierarchical document clustering were performed, along with the topic-informed document clustering approach. The results were presented in the form of word-clouds and trend analyses. There were several major findings regarding research clusters and trends. First, our results confirmed the time-dependent nature of terminology use in mHealth research. For example, in earlier versus recent years the use of terminology changed from “mobile phone” to “smartphone” and from “applications” to “apps”. Second, ten clusters for mHealth research were identified including (I) Clinical Research on Lifestyle Management, (II) Community Health, (III) Literature Review, (IV) Medical

  14. Text-mining analysis of mHealth research.

    Science.gov (United States)

    Ozaydin, Bunyamin; Zengul, Ferhat; Oner, Nurettin; Delen, Dursun

    2017-01-01

    In recent years, because of the advancements in communication and networking technologies, mobile technologies have been developing at an unprecedented rate. mHealth, the use of mobile technologies in medicine, and the related research have also surged in parallel with these technological advancements. Although there have been several attempts to review mHealth research through manual processes such as systematic reviews, the sheer magnitude of the number of studies published in recent years makes this task very challenging. The most recent developments in machine learning and text mining offer some potential solutions to address this challenge by allowing analyses of large volumes of texts through semi-automated processes. The objective of this study is to analyze the evolution of mHealth research by utilizing text-mining and natural language processing (NLP) analyses. The study sample included abstracts of 5,644 mHealth research articles, which were gathered from five academic search engines by using search terms such as mobile health and mHealth. The analysis used the Text Explorer module of JMP Pro 13 and an iterative semi-automated process involving tokenizing, phrasing, and terming. After developing the document term matrix (DTM), analyses such as singular value decomposition (SVD), topic, and hierarchical document clustering were performed, along with the topic-informed document clustering approach. The results were presented in the form of word-clouds and trend analyses. There were several major findings regarding research clusters and trends. First, our results confirmed the time-dependent nature of terminology use in mHealth research. For example, in earlier versus recent years the use of terminology changed from "mobile phone" to "smartphone" and from "applications" to "apps". Second, ten clusters for mHealth research were identified including (I) Clinical Research on Lifestyle Management, (II) Community Health, (III) Literature Review, (IV) Medical Interventions
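
    The document-term-matrix step at the heart of such pipelines can be sketched in pure Python. This is a hedged illustration of the general technique (a bag-of-words matrix plus cosine similarity), not the JMP Pro workflow used in the study, and the toy abstracts are invented:

```python
import math
from collections import Counter

def dtm(docs):
    """Bag-of-words document-term matrix, one Counter per document."""
    return [Counter(doc.lower().split()) for doc in docs]

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def nearest_neighbour(matrix, i):
    """Index of the document most similar to document i."""
    sims = [(cosine(matrix[i], row), j) for j, row in enumerate(matrix) if j != i]
    return max(sims)[1]

docs = [
    "mobile phone health intervention",
    "smartphone app health intervention",
    "hierarchical document clustering topic model",
]
m = dtm(docs)
print(nearest_neighbour(m, 0))   # doc 1 shares the most vocabulary with doc 0
```

    A full analysis would weight the matrix (e.g. TF-IDF), reduce it with SVD, and then cluster the reduced vectors, which is the pipeline the abstract describes.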

  15. Digital watermarks in electronic document circulation

    Directory of Open Access Journals (Sweden)

    Vitaliy Grigorievich Ivanenko

    2017-07-01

    Full Text Available This paper reviews different protection methods for electronic documents, along with their strengths and weaknesses. Common attacks on electronic documents are analyzed. The digital signature and ways of eliminating its flaws are studied. Different digital watermark embedding methods are described and divided into two types. The proposed solution to the protection of electronic documents is based on embedding digital watermarks. A comparative analysis of these methods is given, and as a result the most convenient method is suggested: reversible data hiding. It is noted that this technique excels at securing the integrity of the container and its digital watermark. A digital watermark embedding system should prevent illegal access to the digital watermark and its container. Digital watermark requirements for electronic document protection are formulated. The legal aspect of copyright protection is reviewed. Advantages of embedding digital watermarks in electronic documents are outlined. Modern reversible data hiding techniques are studied. Distinctive features of digital watermark use in Russia are highlighted. A digital watermark serves as an additional layer of defense that is, in most cases, unknown to the violator. With an embedded digital watermark, it is impossible to misappropriate the authorship of the document, even if the intruder signs his name on it. Therefore, digital watermarks can act as an effective additional tool to protect electronic documents.
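
    Reversible data hiding, the method the paper recommends, can be illustrated with a classic histogram-shifting sketch. This toy version (hand-picked peak and zero bins, a flat list standing in for pixel data) illustrates the general idea only and is not the authors' scheme:

```python
def embed(pixels, bits, peak, zero):
    """Histogram-shifting reversible embedding. Assumes peak < zero and that
    no pixel equals `zero`; capacity is the number of peak-valued pixels."""
    assert peak < zero and zero not in pixels
    out, it = [], iter(bits)
    for v in pixels:
        if peak < v < zero:
            out.append(v + 1)            # shift to free the bin peak+1
        elif v == peak:
            out.append(peak + next(it))  # carry one payload bit (0 or 1)
        else:
            out.append(v)
    return out

def extract(pixels, peak, zero):
    """Recover the payload bits and restore the original cover exactly."""
    bits = [v - peak for v in pixels if v in (peak, peak + 1)]
    cover = [v - 1 if peak < v <= zero else v for v in pixels]
    return bits, cover
```

    Because extraction restores the cover bit-for-bit, the scheme preserves the integrity of the container, the property the paper highlights as the advantage of reversible data hiding.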

  16. The Medline/full-text research project.

    Science.gov (United States)

    McKinin, E J; Sievert, M; Johnson, E D; Mitchell, J A

    1991-05-01

    This project was designed to test the relative efficacy of index terms and full-text for the retrieval of documents in those MEDLINE journals for which full-text searching was also available. The full-text files used were MEDIS from Mead Data Central and CCML from BRS Information Technologies. One hundred clinical medical topics were searched in these two files as well as the MEDLINE file to accumulate the necessary data. It was found that full-text identified significantly more relevant articles than did the indexed file, MEDLINE. The full-text searches, however, lacked the precision of searches done in the indexed file. Most relevant items missed in the full-text files, but identified in MEDLINE, were missed because the searcher failed to account for some aspect of natural language, used a logical or positional operator that was too restrictive, or included a concept which was implied, but not expressed in the natural language. Very few of the unique relevant full-text citations would have been retrieved by title or abstract alone. Finally, as of July 1990, the more current issue of a journal was just as likely to appear in MEDLINE as in one of the full-text files.
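
    The trade-off reported here, full-text retrieving more of the relevant articles at the cost of precision, can be made concrete with a small sketch; the result sets below are hypothetical, not the study's data:

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical document IDs: full-text search finds all relevant articles
# (recall 1.0) but with more noise; the indexed search is more precise
# but misses relevant items.
full_text = precision_recall({1, 2, 3, 4, 5, 6, 7, 8}, {1, 2, 3, 4, 5})
indexed   = precision_recall({1, 2, 9},                {1, 2, 3, 4, 5})
print(full_text, indexed)
```
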

  17. Securing Document Warehouses against Brute Force Query Attacks

    Directory of Open Access Journals (Sweden)

    Sergey Vladimirovich Zapechnikov

    2017-04-01

    Full Text Available The paper presents a scheme of data management and protocols for securing a document collection against adversary users who try to abuse their access rights to find out the full content of confidential documents. The configuration of a secure document retrieval system is described, and a suite of protocols among the clients, warehouse server, audit server and database management server is specified. The scheme makes it infeasible for clients to establish correspondence between the documents relevant to different search queries until a moderator grants access to these documents. The proposed solution ensures a higher security level for document warehouses.

  18. Tank waste remediation system functions and requirements document

    International Nuclear Information System (INIS)

    Carpenter, K.E.

    1996-01-01

    This is the Tank Waste Remediation System (TWRS) Functions and Requirements Document derived from the TWRS Technical Baseline. The document consists of several text sections that provide the purpose, scope, background information, and an explanation of how this document assists the application of Systems Engineering to the TWRS. The primary TWRS functions are identified in Figure 4.1 (Section 4.0). Currently, this document is part of the overall effort to develop the TWRS Functional Requirements Baseline, and contains the functions and requirements needed to properly define the top three TWRS function levels. TWRS Technical Baseline information (RDD-100 database) included in the appendices of the attached document contains the TWRS functions, requirements, and architecture necessary to define the TWRS Functional Requirements Baseline. Document organization and user directions are provided in the introductory text. This document will continue to be modified during the TWRS life-cycle.

  19. Tank waste remediation system functions and requirements document

    Energy Technology Data Exchange (ETDEWEB)

    Carpenter, K.E.

    1996-10-03

    This is the Tank Waste Remediation System (TWRS) Functions and Requirements Document derived from the TWRS Technical Baseline. The document consists of several text sections that provide the purpose, scope, background information, and an explanation of how this document assists the application of Systems Engineering to the TWRS. The primary TWRS functions are identified in Figure 4.1 (Section 4.0). Currently, this document is part of the overall effort to develop the TWRS Functional Requirements Baseline, and contains the functions and requirements needed to properly define the top three TWRS function levels. TWRS Technical Baseline information (RDD-100 database) included in the appendices of the attached document contains the TWRS functions, requirements, and architecture necessary to define the TWRS Functional Requirements Baseline. Document organization and user directions are provided in the introductory text. This document will continue to be modified during the TWRS life-cycle.

  20. Does pedagogical documentation support maternal reminiscing conversations?

    Directory of Open Access Journals (Sweden)

    Bethany Fleck

    2015-12-01

    Full Text Available When parents talk with their children about lessons learned in school, they are participating in reminiscing of an unshared event. This study sought to understand if pedagogical documentation, from the Reggio Approach to early childhood education, would support and enhance the conversation. Mother–child dyads reminisced two separate times about preschool lessons, one time with documentation available to them and one time without. Transcripts were coded extracting variables indicative of high and low maternal reminiscing styles. Results indicate that mother and child conversation characteristics were more highly elaborative when documentation was present than when it was not. In addition, children added more information to the conversation supporting the notion that such conversations enhanced memory for lessons. Documentation could be used as a support tool for conversations and children’s memory about lessons learned in school.

  1. From Text to Political Positions: Text analysis across disciplines

    NARCIS (Netherlands)

    Kaal, A.R.; Maks, I.; van Elfrinkhof, A.M.E.

    2014-01-01

    From Text to Political Positions addresses cross-disciplinary innovation in political text analysis for party positioning. Drawing on political science, computational methods and discourse analysis, it presents a diverse collection of analytical models including pure quantitative and

  2. OVERLAPPING VIRTUAL CADASTRAL DOCUMENTATION

    Directory of Open Access Journals (Sweden)

    Madalina - Cristina Marian

    2013-12-01

    Full Text Available Two cadastral plans of buildings can overlap virtually. The overlap becomes apparent upon digital reception. According to Law no. 7/1996, as amended and supplemented, such problems are resolved by updating the graphical database through repositioning. This paper addresses the issue of virtual cadastral overlap over the period 1999-2012.

  3. The Texts of the Agency's Relationship Agreements with Specialized Agencies

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1962-04-10

    The text of the relationship agreement which the Agency has concluded with the Inter-Governmental Maritime Consultative Organization, together with the protocol authenticating it, is reproduced in this document for the information of all Members of the Agency.

  4. The Texts of the Agency's Relationship Agreements with Specialized Agencies

    International Nuclear Information System (INIS)

    1962-01-01

    The text of the relationship agreement which the Agency has concluded with the Inter-Governmental Maritime Consultative Organization, together with the protocol authenticating it, is reproduced in this document for the information of all Members of the Agency

  5. Survey Report – State Of The Art In Digital Steganography Focusing ASCII Text Documents

    OpenAIRE

    Khan Farhan Rafat; Muhammad Sher

    2010-01-01

Digitization of analogue signals has opened up new avenues for information hiding, and recent advancements in the telecommunication field have taken this desire even further. From copper wire to fiber optics, technology has evolved, and so have the ways of covert channel communication. By “Covert” we mean “anything not meant for the purpose for which it is being used”. Investigation and detection of the existence of such covert channel communication has always remained a serious concern of informat...

  6. State Of The Art In Digital Steganography Focusing ASCII Text Documents

    OpenAIRE

    Rafat, Khan Farhan; Sher, Muhammad

    2010-01-01

Digitization of analogue signals has opened up new avenues for information hiding, and recent advancements in the telecommunication field have taken this desire even further. From copper wire to fiber optics, technology has evolved, and so have the ways of covert channel communication. By "Covert" we mean "anything not meant for the purpose for which it is being used". Investigation and detection of the existence of such covert channel communication has always remained a serious concern of informat...

  7. Annotating image ROIs with text descriptions for multimodal biomedical document retrieval

    Science.gov (United States)

    You, Daekeun; Simpson, Matthew; Antani, Sameer; Demner-Fushman, Dina; Thoma, George R.

    2013-01-01

Regions of interest (ROIs) that are pointed to by overlaid markers (arrows, asterisks, etc.) in biomedical images are expected to contain more important and relevant information than other regions for biomedical article indexing and retrieval. We have developed several algorithms that localize and extract the ROIs by recognizing markers on images. Cropped ROIs then need to be annotated with content that best describes them. In most cases accurate textual descriptions of the ROIs can be found in figure captions, and these need to be combined with image ROIs for annotation. The annotated ROIs can then be used to, for example, train classifiers that separate ROIs into known categories (medical concepts), or to build visual ontologies, for indexing and retrieval of biomedical articles. We propose an algorithm that pairs visual and textual ROIs that are extracted from images and figure captions, respectively. This algorithm, based on dynamic time warping (DTW), clusters recognized pointers into groups, each of which contains pointers with identical visual properties (shape, size, color, etc.). Then a rule-based matching algorithm finds the best matching group for each textual ROI mention. Our method yields a precision and recall of 96% and 79%, respectively, when ground truth textual ROI data is used.
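    The DTW clustering step described in this record can be illustrated with a minimal sketch. The one-dimensional pointer-property sequences, the greedy grouping strategy, and the `dtw_distance`/`cluster_pointers` names are all illustrative assumptions; the paper's actual visual features and matching rules are richer.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

def cluster_pointers(pointers, threshold=1.0):
    """Greedy clustering: each pointer joins the first group whose
    representative is within the DTW threshold, else starts a new group."""
    groups = []
    for p in pointers:
        for g in groups:
            if dtw_distance(p, g[0]) <= threshold:
                g.append(p)
                break
        else:
            groups.append([p])
    return groups
```

    In practice each pointer would be described by several property sequences (shape contour, color, size), not a single numeric series.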

  8. A novel technique for estimation of skew in binary text document ...

    Indian Academy of Sciences (India)


    Gatos et al (1997) have proposed a new skew detection method based on the information ..... different books, magazines and journals. ..... Duda R O, Hart P E 1973 Pattern classification and scene analysis (New York: Wiley-Interscience).

  9. On-line access to the full-texts of non periodical documents

    International Nuclear Information System (INIS)

    Svrsek, L.

    2004-01-01

This article describes several options for how electronic books (technical handbooks, scientific books, reference works, etc.) are published and made available on-line on the Internet. There is a short description of some of the major services provided by worldwide publishers. The presentation includes a live demonstration of selected services and working examples of the most interesting systems. (author)

  10. On the structure of Bayesian network for Indonesian text document paraphrase identification

    Science.gov (United States)

    Prayogo, Ario Harry; Syahrul Mubarok, Mohamad; Adiwijaya

    2018-03-01

Paraphrase identification is an important process within natural language processing. The idea is to automatically recognize phrases that have different forms but carry the same meaning. For example, given the query “causing fire hazard”, the computer has to recognize that this query has the same meaning as “the cause of fire hazard”. Paraphrasing is an activity that reveals the meaning of an expression, writing, or speech using different words or forms, especially to achieve greater clarity. In this research we focus on classifying whether two Indonesian sentences are paraphrases of each other or not. There are four steps in this research: first preprocessing, second feature extraction, third classifier building, and last performance evaluation. Preprocessing consists of tokenization, non-alphanumeric character removal, and stemming. After preprocessing we conduct feature extraction in order to build new features from the given dataset. There are two kinds of features in this research: syntactic features and semantic features. Syntactic features consist of a normalized Levenshtein distance feature, a term-frequency-based cosine similarity feature, and an LCS (Longest Common Subsequence) feature. Semantic features consist of a Wu and Palmer feature and a Shortest Path feature. We use Bayesian networks as the method of training the classifier, with MAP (Maximum A Posteriori) parameter estimation. For structure learning of the Bayesian network DAG (Directed Acyclic Graph), we use the BDeu (Bayesian Dirichlet equivalent uniform) scoring function, and to find the DAG with the best BDeu score we use the K2 algorithm. In the evaluation step we perform cross-validation. The average results from testing the classifier are as follows: precision 75.2%, recall 76.5%, F1-measure 75.8% and accuracy 75.6%.
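    Two of the syntactic features named in this record, normalized Levenshtein distance and term-frequency-based cosine similarity, can be sketched as follows. This is a simplified illustration, not the authors' implementation; whitespace tokenization is an assumption.

```python
import math
from collections import Counter

def levenshtein(s, t):
    """Standard edit distance via dynamic programming (two-row version)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (cs != ct))) # substitution
        prev = cur
    return prev[-1]

def normalized_levenshtein(s, t):
    """1.0 for identical strings, approaching 0.0 as they diverge."""
    if not s and not t:
        return 1.0
    return 1.0 - levenshtein(s, t) / max(len(s), len(t))

def tf_cosine(s, t):
    """Cosine similarity of term-frequency vectors over whitespace tokens."""
    a, b = Counter(s.split()), Counter(t.split())
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```

    In the study these scores would be computed after stemming and would feed into the Bayesian network classifier as features.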

  11. The Analysis of Heterogeneous Text Documents with the Help of the Computer Program NUD*IST

    OpenAIRE

    Plaß, Christine; Schetsche, Michael

    2000-01-01

Using a current research project as an example, the article discusses the usefulness of the computer program NUD*IST for qualitative document analysis. Our project examines the societal evaluation of spectacular crimes. To this end, documents from the entire 20th century were collected, digitized, and analyzed with the help of NUD*IST. Since both public and specialist discourses are examined, the project's data material is extraordinarily heterogeneous: specialist scientific publicatio...

  12. Electronic books. On-line access to the full-texts of non periodical documents

    International Nuclear Information System (INIS)

    Svrsek, L.

    2004-01-01

This presentation describes several options for how electronic books (technical handbooks, scientific books, reference works, etc.) are published and made available on-line on the Internet. There is a short description of some of the major services provided by worldwide publishers. The presentation includes a live demonstration of selected services and working examples of the most interesting systems. (author)

  13. Document management in engineering construction

    International Nuclear Information System (INIS)

    Liao Bing

    2008-01-01

    Document management is one important part of systematic quality management, which is one of the key factors to ensure the construction quality. In the engineering construction, quality management and document management shall interwork all the time, to ensure the construction quality. Quality management ensures that the document is correctly generated and adopted, and thus the completeness, accuracy and systematicness of the document satisfy the filing requirements. Document management ensures that the document is correctly transferred during the construction, and various testimonies such as files and records are kept for the engineering construction and its quality management. This paper addresses the document management in the engineering construction based on the interwork of the quality management and document management. (author)

  14. Recommended HSE-7 documents hierarchy

    International Nuclear Information System (INIS)

    Klein, R.B.; Jennrich, E.A.; Lund, D.M.; Danna, J.G.; Davis, K.D.; Rutz, A.C.

    1990-01-01

This report recommends a hierarchy of waste management documents at Los Alamos National Laboratory (LANL or "Laboratory"). The hierarchy addresses documents that are required to plan, implement, and document waste management programs at Los Alamos. These documents will enable the waste management group and the six sections contained within that group to satisfy requirements that are imposed upon them by the US Department of Energy (DOE), DOE Albuquerque Operations, the US Environmental Protection Agency, various State of New Mexico agencies, and Laboratory management

  15. Cultural diversity: blind spot in medical curriculum documents, a document analysis.

    Science.gov (United States)

    Paternotte, Emma; Fokkema, Joanne P I; van Loon, Karsten A; van Dulmen, Sandra; Scheele, Fedde

    2014-08-22

Cultural diversity among patients presents specific challenges to physicians. Therefore, cultural diversity training is needed in medical education. In cases where strategic curriculum documents form the basis of medical training, it is expected that the topic of cultural diversity is included in these documents, especially if these have been recently updated. The aim of this study was to assess the current formal status of cultural diversity training in the Netherlands, which is a multi-ethnic country with recently updated medical curriculum documents. In February and March 2013, a document analysis was performed of strategic curriculum documents for undergraduate and postgraduate medical education in the Netherlands. All text phrases that referred to cultural diversity were extracted from these documents. Subsequently, these phrases were sorted into objectives, training methods or evaluation tools to assess how they contributed to adequate curriculum design. Of a total of 52 documents, 33 contained phrases with information about cultural diversity training. Cultural diversity aspects were more prominently described in the curriculum documents for undergraduate education than in those for postgraduate education. The most specific information about cultural diversity was found in the blueprint for undergraduate medical education. In the postgraduate curriculum documents, attention to cultural diversity differed among specialties and was mainly superficial. Cultural diversity is an underrepresented topic in the Dutch documents that form the basis for actual medical training, even though the documents have been updated recently. More attention to the topic is thus warranted. The current situation does not meet the demand of a multi-ethnic society for doctors with cultural diversity competences. Multi-ethnic countries should be critical of the content of the bases for their medical educational curricula.

  16. Overview of Historical Earthquake Document Database in Japan and Future Development

    Science.gov (United States)

    Nishiyama, A.; Satake, K.

    2014-12-01

In Japan, damage and disasters from large historical earthquakes have been documented and preserved. Compilation of historical earthquake documents started in the early 20th century, and 33 volumes of historical document source books (about 27,000 pages) have been published. However, these source books are not used effectively by researchers, due to contamination by low-reliability historical records and the difficulty of searching by keywords and dates. To overcome these problems and to promote historical earthquake studies in Japan, construction of text databases started in the 21st century. For historical earthquakes from the beginning of the 7th century to the early 17th century, the "Online Database of Historical Documents in Japanese Earthquakes and Eruptions in the Ancient and Medieval Ages" (Ishibashi, 2009) has already been constructed. Its authors investigated the source books or original texts of the historical literature, emended the descriptions, and assigned a reliability to each historical document on the basis of its written age. Another project compiled the historical documents for seven damaging earthquakes that occurred along the Sea of Japan coast of Honshu, central Japan, in the Edo period (from the beginning of the 17th century to the middle of the 19th century) and constructed a text database and a seismic intensity database. These are now published on the web (in Japanese only). However, only about 9% of the earthquake source books have been digitized so far. We therefore plan to digitize all of the remaining historical documents under a research program that started in 2014. The specification of this database will be similar to the previous ones. We also plan to combine this database with a liquefaction-traces database, to be constructed by another research program, by adding the location information described in historical documents. The constructed database would be used to estimate the distributions of seismic intensities and tsunami

  17. Informational Text and the CCSS

    Science.gov (United States)

    Aspen Institute, 2012

    2012-01-01

    What constitutes an informational text covers a broad swath of different types of texts. Biographies & memoirs, speeches, opinion pieces & argumentative essays, and historical, scientific or technical accounts of a non-narrative nature are all included in what the Common Core State Standards (CCSS) envisions as informational text. Also included…

  18. Cuneiform Documents from Various Dutch Collections

    NARCIS (Netherlands)

    Boer, de R.; Dercksen, J.G.; Krispijn, Th.J.H.; J.G., Dercksen e.a.

    2013-01-01

    Publication of Sumerian and Akkadian cuneiform texts in private collections from various periods: * Presargonic: letter-unknown provenance in Northern Babylonia * Ur III: administrative document-Umma) * Old Assyrian: letters-Kaneš (Anatolia) * Old Babylonian: lexical series Ugumu-unknown

  19. The Only Safe SMS Texting Is No SMS Texting.

    Science.gov (United States)

    Toth, Cheryl; Sacopulos, Michael J

    2015-01-01

    Many physicians and practice staff use short messaging service (SMS) text messaging to communicate with patients. But SMS text messaging is unencrypted, insecure, and does not meet HIPAA requirements. In addition, the short and abbreviated nature of text messages creates opportunities for misinterpretation, and can negatively impact patient safety and care. Until recently, asking patients to sign a statement that they understand and accept these risks--as well as having policies, device encryption, and cyber insurance in place--would have been enough to mitigate the risk of using SMS text in a medical practice. But new trends and policies have made SMS text messaging unsafe under any circumstance. This article explains these trends and policies, as well as why only secure texting or secure messaging should be used for physician-patient communication.

  20. Relevant documents initiating the EDA

    International Nuclear Information System (INIS)

    1993-01-01

    In December 1990, the four ITER Parties successfully concluded the Conceptual Design Activities for ITER. In January, 1991, each of the Parties had decided to enter negotiations on co-operation in the ITER EDA, which are to be conducted under the auspices of the IAEA; and each Party was prepared to receive a letter of invitation from the Director General of the IAEA to participate in those negotiations. Four negotiating meetings were held in 1991, the first being in Vienna, the second in Tokyo, the third in Reston near Washington, and the fourth in Moscow. After completion of the negotiations, each of the Parties proceeded domestically to reach its decision to sign the ITER EDA Agreement and its Protocol 1. All formalities were concluded during the first half of 1992, and the EDA documents were signed in Washington on July 21, 1992. Following the signing, each of the Parties provided the Director General with the names of its two ITER Council members. With the formation of the Council, the EDA had begun. This volume contains the papers developed before the start of the EDA. It begins with the Director General's invitation to participate in the negotiations and ends with the Parties' designations of the ITER Council members. While the evolving text of the Agreement and its Protocol 1 is referred to in some of these papers as an attachment, it is only the final, signed text that is reproduced in this volume

  1. Intelligent Bar Chart Plagiarism Detection in Documents

    Directory of Open Access Journals (Sweden)

    Mohammed Mumtaz Al-Dabbagh

    2014-01-01

Full Text Available This paper presents a novel feature-mining approach for documents that cannot be mined via optical character recognition (OCR). By identifying the intimate relationship between the text and graphical components, the proposed technique extracts the Start, End, and Exact values for each bar. Furthermore, word 2-gram and Euclidean distance methods are used to accurately detect and determine plagiarism in bar charts.

  2. Predicting Prosody from Text for Text-to-Speech Synthesis

    CERN Document Server

    Rao, K Sreenivasa

    2012-01-01

    Predicting Prosody from Text for Text-to-Speech Synthesis covers the specific aspects of prosody, mainly focusing on how to predict the prosodic information from linguistic text, and then how to exploit the predicted prosodic knowledge for various speech applications. Author K. Sreenivasa Rao discusses proposed methods along with state-of-the-art techniques for the acquisition and incorporation of prosodic knowledge for developing speech systems. Positional, contextual and phonological features are proposed for representing the linguistic and production constraints of the sound units present in the text. This book is intended for graduate students and researchers working in the area of speech processing.

  3. Monitoring interaction and collective text production through text mining

    Directory of Open Access Journals (Sweden)

    Macedo, Alexandra Lorandi

    2014-04-01

    Full Text Available This article presents the Concepts Network tool, developed using text mining technology. The main objective of this tool is to extract and relate terms of greatest incidence from a text and exhibit the results in the form of a graph. The Network was implemented in the Collective Text Editor (CTE which is an online tool that allows the production of texts in synchronized or non-synchronized forms. This article describes the application of the Network both in texts produced collectively and texts produced in a forum. The purpose of the tool is to offer support to the teacher in managing the high volume of data generated in the process of interaction amongst students and in the construction of the text. Specifically, the aim is to facilitate the teacher’s job by allowing him/her to process data in a shorter time than is currently demanded. The results suggest that the Concepts Network can aid the teacher, as it provides indicators of the quality of the text produced. Moreover, messages posted in forums can be analyzed without their content necessarily having to be pre-read.
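    The core idea of the Concepts Network described in this record, extracting the terms of greatest incidence from a text and relating them in a graph, can be sketched roughly as follows. Whitespace tokenization, sentence-level co-occurrence, and the `concept_graph` helper are simplifying assumptions, not the tool's actual implementation.

```python
from collections import Counter
from itertools import combinations

def concept_graph(sentences, top_n=5):
    """Build a simple term co-occurrence graph: nodes are the most
    frequent terms; edge weights count how often two terms share a
    sentence. Returns (set of terms, Counter of edges)."""
    tokenized = [s.lower().split() for s in sentences]
    freq = Counter(w for toks in tokenized for w in toks)
    terms = {w for w, _ in freq.most_common(top_n)}
    edges = Counter()
    for toks in tokenized:
        present = sorted(terms.intersection(toks))
        for pair in combinations(present, 2):
            edges[pair] += 1
    return terms, edges
```

    A real implementation would add stop-word removal and stemming before counting, and render the resulting edges as a graph for the teacher.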

  4. Methods for Mining and Summarizing Text Conversations

    CERN Document Server

    Carenini, Giuseppe; Murray, Gabriel

    2011-01-01

    Due to the Internet Revolution, human conversational data -- in written forms -- are accumulating at a phenomenal rate. At the same time, improvements in speech technology enable many spoken conversations to be transcribed. Individuals and organizations engage in email exchanges, face-to-face meetings, blogging, texting and other social media activities. The advances in natural language processing provide ample opportunities for these "informal documents" to be analyzed and mined, thus creating numerous new and valuable applications. This book presents a set of computational methods

  5. Terminology extraction from medical texts in Polish.

    Science.gov (United States)

    Marciniak, Małgorzata; Mykowiecka, Agnieszka

    2014-01-01

Hospital documents contain free text describing the most important facts relating to patients and their illnesses. These documents are written in a specific language containing medical terminology related to hospital treatment. Their automatic processing can help in verifying the consistency of hospital documentation and in obtaining statistical data. To perform this task we need information on the phrases we are looking for. At the moment, clinical Polish resources are sparse. The existing terminologies, such as the Polish Medical Subject Headings (MeSH), do not provide sufficient coverage for clinical tasks. It would therefore be helpful if it were possible to automatically prepare, on the basis of a data sample, an initial set of terms which, after manual verification, could be used for the purpose of information extraction. Using a combination of linguistic and statistical methods for processing over 1,200 children's hospital discharge records, we obtained a list of single- and multi-word terms used in hospital discharge documents written in Polish. The phrases are ordered according to their presumed importance in domain texts, measured by the frequency of use of a phrase and the variety of its contexts. The evaluation showed that the automatically identified phrases cover about 84% of the terms in domain texts. At the top of the ranked list, only 4% of 400 terms were incorrect, while of the final 200, 20% of expressions were either not domain related or syntactically incorrect. We also observed that 70% of the obtained terms are not included in the Polish MeSH. Automatic terminology extraction can give results of a quality high enough to be taken as a starting point for building domain-related terminological dictionaries or ontologies. This approach can be useful for preparing terminological resources for very specific subdomains for which no relevant terminologies already exist. The evaluation performed showed that none of the tested ranking procedures were
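    The ranking criterion this record describes, frequency of use combined with the variety of a phrase's contexts, might be sketched as below. The `rank_terms` helper and its scoring formula are illustrative assumptions, not the authors' exact statistical method; candidate phrases are given as tuples of tokens.

```python
from collections import Counter, defaultdict

def rank_terms(docs, candidates):
    """Rank candidate phrases by frequency weighted by context variety,
    where context variety is the number of distinct words seen
    immediately before or after the phrase."""
    freq = Counter()
    contexts = defaultdict(set)
    for doc in docs:
        toks = doc.lower().split()
        for i in range(len(toks)):
            for cand in candidates:
                n = len(cand)
                if tuple(toks[i:i + n]) == cand:
                    freq[cand] += 1
                    if i > 0:
                        contexts[cand].add(toks[i - 1])
                    if i + n < len(toks):
                        contexts[cand].add(toks[i + n])
    scores = {c: freq[c] * (1 + len(contexts[c])) for c in candidates}
    return sorted(candidates, key=lambda c: scores[c], reverse=True)
```

    Phrases that appear often and in many different contexts rise to the top, mirroring the intuition that genuine domain terms are both frequent and freely combinable.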

  6. Text recycling: acceptable or misconduct?

    Science.gov (United States)

    Harriman, Stephanie; Patel, Jigisha

    2014-08-16

    Text recycling, also referred to as self-plagiarism, is the reproduction of an author's own text from a previous publication in a new publication. Opinions on the acceptability of this practice vary, with some viewing it as acceptable and efficient, and others as misleading and unacceptable. In light of the lack of consensus, journal editors often have difficulty deciding how to act upon the discovery of text recycling. In response to these difficulties, we have created a set of guidelines for journal editors on how to deal with text recycling. In this editorial, we discuss some of the challenges of developing these guidelines, and how authors can avoid undisclosed text recycling.

  7. Document representations for classification of short web-page descriptions

    Directory of Open Access Journals (Sweden)

    Radovanović Miloš

    2008-01-01

Full Text Available Motivated by applying Text Categorization to classification of Web search results, this paper describes an extensive experimental study of the impact of bag-of-words document representations on the performance of five major classifiers - Naïve Bayes, SVM, Voted Perceptron, kNN and C4.5. The texts, representing short Web-page descriptions sorted into a large hierarchy of topics, are taken from the dmoz Open Directory Web-page ontology, and classifiers are trained to automatically determine the topics which may be relevant to a previously unseen Web-page. Different transformations of input data: stemming, normalization, logtf and idf, together with dimensionality reduction, are found to have a statistically significant improving or degrading effect on classification performance measured by classical metrics - accuracy, precision, recall, F1 and F2. The emphasis of the study is not on determining the best document representation which corresponds to each classifier, but rather on describing the effects of every individual transformation on classification, together with their mutual relationships.

  8. TEXT DEIXIS IN NARRATIVE SEQUENCES

    Directory of Open Access Journals (Sweden)

    Josep Rivera

    2007-06-01

Full Text Available This study looks at demonstrative descriptions, regarding them as text-deictic procedures which contribute to weaving discourse reference. Text deixis is thought of as a metaphorical referential device which maps the ground of utterance onto the text itself. Demonstrative expressions with textual antecedent-triggers, considered the most important text-deictic units, are identified in a narrative corpus consisting of J. M. Barrie’s Peter Pan and its translation into Catalan. Some linguistic and discourse variables related to DemNPs are analysed to adequately characterise text deixis. It is shown that this referential device is usually combined with abstract nouns, thus categorising and encapsulating (non-nominal) complex discourse entities as nouns, while performing a referential cohesive function by means of the text deixis + general noun type of lexical cohesion.

  9. Documenting a Contested Memory

    DEFF Research Database (Denmark)

    Awad, Sarah H.

    2017-01-01

    This article looks at how symbols in the urban environment are intentionally produced and modified to regulate a community’s collective memory. Our urban environment is filled with symbols in the form of images, text, and structures that embody certain narratives about the past. Once those symbols...... to preserve the memory of the revolution through graffiti murals and the utilization of public space, and from the other, the authority’s efforts to replace those initiatives with its own official narrative. Building on the concept of collective memory, as well as Bartlett’s studies of serial reproductions...

  10. A Hybrid Feature Selection Approach for Arabic Documents Classification

    NARCIS (Netherlands)

    Habib, Mena Badieh; Sarhan, Ahmed A. E.; Salem, Abdel-Badeeh M.; Fayed, Zaki T.; Gharib, Tarek F.

    Text Categorization (classification) is the process of classifying documents into a predefined set of categories based on their content. Text categorization algorithms usually represent documents as bags of words and consequently have to deal with huge number of features. Feature selection tries to

  11. Designing Documents for People to Use

    Directory of Open Access Journals (Sweden)

    David Sless

    Full Text Available This article reports on the work of Communication Research Institute (CRI, an international research center specializing in communication and information design. With the support of government, regulators, industry bodies, and business—and with the participation of people and their advocates—CRI has worked on over 200 public document design projects since it began as a small unit in 1985. CRI investigates practical methods and achievable standards for designing digital and paper public documents, including forms; workplace procedural notices; bills, letters, and emails sent by organizations; labels and instructions that accompany products and services; and legal and financial documents and contracts. CRI has written model grammars for the document types it designs, and the cumulative data from CRI projects has led to a set of systematic methods for designing public-use documents to a high standard. Through research, design, publishing, and advocacy, CRI works to measurably improve the ordinary documents we all have to use. Keywords: Information design, Design methods, Design standards, Communication design, Design diagnostic testing, Design research

  12. Text against Text: Counterbalancing the Hegemony of Assessment.

    Science.gov (United States)

    Cosgrove, Cornelius

    A study examined whether composition specialists can counterbalance the potential privileging of the assessment perspective, or of self-appointed interpreters of that perspective, through the study of assessment discourse as text. Fourteen assessment texts were examined, most of them journal articles and most of them featuring the common…

  13. Knowledge Representation in Travelling Texts

    DEFF Research Database (Denmark)

    Mousten, Birthe; Locmele, Gunta

    2014-01-01

    Today, information travels fast. Texts travel, too. In a corporate context, the question is how to manage which knowledge elements should travel to a new language area or market and in which form? The decision to let knowledge elements travel or not travel highly depends on the limitation...... and the purpose of the text in a new context as well as on predefined parameters for text travel. For texts used in marketing and in technology, the question is whether culture-bound knowledge representation should be domesticated or kept as foreign elements, or should be mirrored or moulded—or should not travel...... at all! When should semantic and pragmatic elements in a text be replaced and by which other elements? The empirical basis of our work is marketing and technical texts in English, which travel into the Latvian and Danish markets, respectively....

  14. Texting while driving: is speech-based text entry less risky than handheld text entry?

    Science.gov (United States)

    He, J; Chaparro, A; Nguyen, B; Burge, R J; Crandall, J; Chaparro, B; Ni, R; Cao, S

    2014-11-01

    Research indicates that using a cell phone to talk or text while maneuvering a vehicle impairs driving performance. However, few published studies directly compare the distracting effects of texting using a hands-free (i.e., speech-based interface) versus handheld cell phone, which is an important issue for legislation, automotive interface design and driving safety training. This study compared the effect of speech-based versus handheld text entries on simulated driving performance by asking participants to perform a car following task while controlling the duration of a secondary text-entry task. Results showed that both speech-based and handheld text entries impaired driving performance relative to the drive-only condition by causing more variation in speed and lane position. Handheld text entry also increased the brake response time and increased variation in headway distance. Text entry using a speech-based cell phone was less detrimental to driving performance than handheld text entry. Nevertheless, the speech-based text entry task still significantly impaired driving compared to the drive-only condition. These results suggest that speech-based text entry disrupts driving, but reduces the level of performance interference compared to text entry with a handheld device. In addition, the difference in the distraction effect caused by speech-based and handheld text entry is not simply due to the difference in task duration. Copyright © 2014 Elsevier Ltd. All rights reserved.

  15. Active Learning for Text Classification

    OpenAIRE

    Hu, Rong

    2011-01-01

    Text classification approaches are used extensively to solve real-world challenges. The success or failure of text classification systems hangs on the datasets used to train them, without a good dataset it is impossible to build a quality system. This thesis examines the applicability of active learning in text classification for the rapid and economical creation of labelled training data. Four main contributions are made in this thesis. First, we present two novel selection strategies to cho...

  16. Text Mining Applications and Theory

    CERN Document Server

    Berry, Michael W

    2010-01-01

    Text Mining: Applications and Theory presents the state-of-the-art algorithms for text mining from both the academic and industrial perspectives.  The contributors span several countries and scientific domains: universities, industrial corporations, and government laboratories, and demonstrate the use of techniques from machine learning, knowledge discovery, natural language processing and information retrieval to design computational models for automated text analysis and mining. This volume demonstrates how advancements in the fields of applied mathematics, computer science, machine learning

  17. Massively Scalable Near Duplicate Detection in Streams of Documents using MDSH

    Energy Technology Data Exchange (ETDEWEB)

    Bogen, Paul Logasa [ORNL; Symons, Christopher T [ORNL; McKenzie, Amber T [ORNL; Patton, Robert M [ORNL; Gillen, Rob [ORNL

    2013-01-01

    In a world where large-scale text collections are not only becoming ubiquitous but also are growing at increasing rates, near duplicate documents are becoming a growing concern that has the potential to hinder many different information filtering tasks. While others have tried to address this problem, prior techniques have only been used on limited collection sizes and static cases. We will briefly describe the problem in the context of Open Source Intelligence (OSINT) along with our additional constraints for performance. In this work we propose two variations on Multi-dimensional Spectral Hash (MDSH) tailored for working on extremely large, growing sets of text documents. We analyze the memory and runtime characteristics of our techniques and provide an informal analysis of the quality of the near-duplicate clusters produced by our techniques.
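    The MDSH variants themselves are not spelled out in this record. As a rough illustration of the underlying idea of hashing documents so that near-duplicates collide, here is a minimal MinHash sketch, a related but distinct technique; the example texts are invented:

```python
import hashlib

def shingles(text, k=3):
    """Return the set of k-word shingles of a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def minhash_signature(shingle_set, num_hashes=64):
    """MinHash signature: for each seed, keep the minimum hash over all shingles."""
    sig = []
    for seed in range(num_hashes):
        sig.append(min(
            int.from_bytes(hashlib.md5(f"{seed}:{s}".encode()).digest()[:8], "big")
            for s in shingle_set))
    return sig

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots estimates the Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

a = "the quick brown fox jumps over the lazy dog near the river bank"
b = "the quick brown fox jumps over the lazy dog near the river bend"
c = "completely unrelated passage about spectral hashing of documents"

sa, sb, sc = (minhash_signature(shingles(t)) for t in (a, b, c))
print(estimated_jaccard(sa, sb) > estimated_jaccard(sa, sc))  # near-duplicates score higher
```

    Because signatures are fixed-length, they can be compared (or bucketed) without re-reading the documents, which is what makes such schemes attractive for large, growing collections.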

  18. Testing System Encryption-Decryption Method to RSA Security Documents

    International Nuclear Information System (INIS)

    Supriyono

    2008-01-01

    A model of document protection, particularly for text documents, was tested. The principle of the protection was that the system should secure both the storage and the transfer of documents. First, the text document was encrypted so that it could not be read, the text having been transformed into random letters. The randomized text was then recovered by decryption so that the document owner could read it. The method adopted in this research was the RSA method, which relies on complex mathematical calculation and is equipped with a protection key pair (a private key and a public key), making it more difficult to attack. The system was developed using Borland Delphi 7. The results indicated that the system was able to store and transfer the document, both via internet and intranet, in encrypted form and to restore it to its initial form by decryption. The research also tested the encryption and decryption processes for documents of various sizes. (author)
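    The record does not give the implementation, but the RSA scheme it describes can be sketched in a few lines. This is a toy illustration with deliberately small primes and character-by-character encryption; real deployments use large keys and padding schemes such as OAEP:

```python
# Toy RSA, illustrative only: real systems use large primes and padding.
p, q = 61, 53
n = p * q                   # public modulus
phi = (p - 1) * (q - 1)     # Euler's totient of n
e = 17                      # public exponent, coprime with phi
d = pow(e, -1, phi)         # private exponent: modular inverse of e (Python 3.8+)

def encrypt(text):
    """Encrypt each character code with the public key (e, n)."""
    return [pow(ord(ch), e, n) for ch in text]

def decrypt(cipher):
    """Recover the text with the private key (d, n)."""
    return "".join(chr(pow(c, d, n)) for c in cipher)

cipher = encrypt("secret")
print(decrypt(cipher))  # → secret
```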

  19. Aiding the Interpretation of Ancient Documents

    DEFF Research Database (Denmark)

    Roued-Cunliffe, Henriette

    How can Decision Support System (DSS) software aid the interpretation process involved in the reading of ancient documents? This paper discusses the development of a DSS prototype for the reading of ancient texts. In this context the term ‘ancient documents’ is used to describe mainly Greek...... tool it is important first to comprehend the interpretation process involved in reading ancient documents. This is not a linear process but rather a recursive process where the scholar moves between different levels of reading, such as ‘understanding the meaning of a character’ or ‘understanding...

  20. The Digital Administrative Document: an approximate path

    Directory of Open Access Journals (Sweden)

    Francesca Delneri

    2017-09-01

    Full Text Available While the road towards a progressive dematerialization of the administrative document is clearly marked, the legislator does not always proceed in a coherent, clear or complete way. The reasoning needs to focus on the management of the documentary heritage, its creation and preservation, rather than on technological issues, actively involving local administrations and taking responsibility for decisions, including the relationship between costs and benefits. The way is hard due to the lack of debate and of practical directions; preservation should not be treated as a mere formal obligation, but as an occasion to face complex and critical issues.

  1. Technical document characterization by data analysis

    International Nuclear Information System (INIS)

    Mauget, A.

    1993-05-01

    Nuclear power plants possess documents analyzing all the plant systems, which represent a vast quantity of paper. Analysis of textual data can enable a document to be classified by grouping the texts containing the same words. These methods are used on system manuals for feasibility studies. The system manual is then analyzed by LEXTER and the terms it has selected are examined. We first classify according to style (sentences containing general words, technical sentences, etc.), and then according to terms. However, it will not be possible to continue in this fashion for the 100 existing system manuals, for lack of sufficient storage capacity. Another solution is being developed. (author)

  2. Stamp Detection in Color Document Images

    DEFF Research Database (Denmark)

    Micenkova, Barbora; van Beusekom, Joost

    2011-01-01

    , moreover, it can be imprinted with a variable quality and rotation. Previous methods were restricted to detection of stamps of particular shapes or colors. The method presented in the paper includes segmentation of the image by color clustering and subsequent classification of candidate solutions...... by geometrical and color-related features. The approach allows for differentiation of stamps from other color objects in the document such as logos or texts. For the purpose of evaluation, a data set of 400 document images has been collected, annotated and made public. With the proposed method, recall of 83...

  3. Text-Filled Stacked Area Graphs

    DEFF Research Database (Denmark)

    Kraus, Martin

    2011-01-01

    -filled stacked area graphs; i.e., graphs that feature stacked areas that are filled with small-typed text. Since these graphs allow for computing the text layout automatically, it is possible to include large amounts of textual detail with very little effort. We discuss the most important challenges and some...... solutions for the design of text-filled stacked area graphs with the help of an exemplary visualization of the genres, publication years, and titles of a database of several thousand PC games....

  4. Text and ideology: text-oriented discourse analysis

    Directory of Open Access Journals (Sweden)

    Maria Eduarda Gonçalves Peixoto

    2018-04-01

    Full Text Available The article aims to contribute to the understanding of the connection between text and ideology articulated by the text-oriented analysis of discourse (ADTO). Based on the reflections of Fairclough (1989, 2001, 2003) and Fairclough and Chouliaraki (1999), the debate presents the social ontology that ADTO uses to base its conception of social life as an open system and textually mediated; the article then explains the chronological-narrative development of the main critical theories of ideology, by virtue of which ADTO organizes the assumptions that underpin the particular use it makes of the term. Finally, the discussion presents the main aspects of the connection between text and ideology, offering a conceptual framework that can contribute to the domain of the theme according to a critical discourse analysis approach.

  5. Patent documentation - comparison of two MT strategies

    DEFF Research Database (Denmark)

    Offersgaard, Lene; Povlsen, Claus

    2007-01-01

    This paper focuses on two matters: A comparison of how two different MT strategies manage translating the text type of patent documentation and a survey of what is needed to transform a MT research prototype system to a translation application for patent texts. The two MT strategies is represented....... The distinctive text type of patents pose special demands for machine translation and these aspects are discussed based on linguistic observations with focus on the users point of view. Two main demands are automatic pre processing of the documents and implementation of a module which in a flexible and user......-friendly manner offers the opportunity to extend the lexical coverage of the system. These demands and the comparison of the two MT strategies are discussed on the basis of proofread patents....

  6. Chemical-text hybrid search engines.

    Science.gov (United States)

    Zhou, Yingyao; Zhou, Bin; Jiang, Shumei; King, Frederick J

    2010-01-01

    As the amount of chemical literature increases, it is critical that researchers be able to accurately locate documents related to a particular aspect of a given compound. Existing solutions, based on text and chemical search engines alone, suffer from the inclusion of "false negative" and "false positive" results, and cannot accommodate the diverse repertoire of formats currently available for chemical documents. To address these concerns, we developed an approach called Entity-Canonical Keyword Indexing (ECKI), which converts a chemical entity embedded in a data source into its canonical keyword representation prior to being indexed by text search engines. We implemented ECKI using Microsoft Office SharePoint Server Search, and the resultant hybrid search engine not only supported complex mixed chemical and keyword queries but also was applied to both intranet and Internet environments. We envision that the adoption of ECKI will empower researchers to pose more complex search questions that were not readily attainable previously and to obtain answers at much improved speed and accuracy.
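    The ECKI internals are not detailed in this record. The following sketch, with a purely hypothetical synonym table, illustrates the general idea of rewriting chemical names to one canonical keyword before a text engine indexes the document:

```python
# Hypothetical synonym table; a real system would resolve names with a chemistry toolkit.
CANONICAL = {
    "acetylsalicylic acid": "CHEM:ACETYLSALICYLIC_ACID",
    "aspirin": "CHEM:ACETYLSALICYLIC_ACID",
    "asa": "CHEM:ACETYLSALICYLIC_ACID",
}

def canonicalize(text):
    """Replace known chemical synonyms with a single canonical keyword."""
    out = text.lower()
    # Replace longer names first so multi-word synonyms win over substrings.
    for name, key in sorted(CANONICAL.items(), key=lambda kv: -len(kv[0])):
        out = out.replace(name, key.lower())
    return out

doc1 = canonicalize("Aspirin reduces fever")
doc2 = canonicalize("Acetylsalicylic acid reduces fever")
print(doc1 == doc2)  # both documents now index under the same canonical keyword
```

    A query for any synonym can be canonicalized the same way, so documents using different names for the same compound match the same index term.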

  7. CONAN : Text Mining in the Biomedical Domain

    NARCIS (Netherlands)

    Malik, R.

    2006-01-01

    This thesis is about Text Mining. Extracting important information from literature. In the last years, the number of biomedical articles and journals is growing exponentially. Scientists might not find the information they want because of the large number of publications. Therefore a system was

  8. ERRORS AND DIFFICULTIES IN TRANSLATING LEGAL TEXTS

    Directory of Open Access Journals (Sweden)

    Camelia, CHIRILA

    2014-11-01

    Full Text Available Nowadays the accurate translation of legal texts has become highly important as the mistranslation of a passage in a contract, for example, could lead to lawsuits and loss of money. Consequently, the translation of legal texts to other languages faces many difficulties and only professional translators specialised in legal translation should deal with the translation of legal documents and scholarly writings. The purpose of this paper is to analyze translation from three perspectives: translation quality, errors and difficulties encountered in translating legal texts and consequences of such errors in professional translation. First of all, the paper points out the importance of performing a good and correct translation, which is one of the most important elements to be considered when discussing translation. Furthermore, the paper presents an overview of the errors and difficulties in translating texts and of the consequences of errors in professional translation, with applications to the field of law. The paper is also an approach to the differences between languages (English and Romanian) that can hinder comprehension for those who have embarked upon the difficult task of translation. The research method that I have used to achieve the objectives of the paper was the content analysis of various Romanian and foreign authors' works.

  9. Text Genres in Information Organization

    Science.gov (United States)

    Nahotko, Marek

    2016-01-01

    Introduction: Text genres used by so-called information organizers in the processes of information organization in information systems were explored in this research. Method: The research employed text genre socio-functional analysis. Five genre groups in information organization were distinguished. Every genre group used in information…

  10. Strategies for Translating Vocative Texts

    Directory of Open Access Journals (Sweden)

    Olga COJOCARU

    2014-12-01

    Full Text Available The paper deals with the linguistic and cultural elements of vocative texts and the techniques used in translating them by giving some examples of texts that are typically vocative (i.e. advertisements and instructions for use). Semantic and communicative strategies are popular in translation studies and each of them has its own advantages and disadvantages in translating vocative texts. The advantage of semantic translation is that it takes more account of the aesthetic value of the SL text, while communicative translation attempts to render the exact contextual meaning of the original text in such a way that both content and language are readily acceptable and comprehensible to the readership. Focus is laid on the strategies used in translating vocative texts, strategies that highlight and introduce a cultural context to the target audience, in order to achieve their overall purpose, that is to sell or persuade the reader to behave in a certain way. Thus, in order to do that, a number of advertisements from the field of cosmetics industry and electronic gadgets were selected for analysis. The aim is to gather insights into vocative text translation and to create new perspectives on this field of research, now considered a process of innovation and diversion, especially in areas as important as economy and marketing.

  11. Systematic characterizations of text similarity in full text biomedical publications.

    Science.gov (United States)

    Sun, Zhaohui; Errami, Mounir; Long, Tara; Renard, Chris; Choradia, Nishant; Garner, Harold

    2010-09-15

    Computational methods have been used to find duplicate biomedical publications in MEDLINE. Full text articles are becoming increasingly available, yet the similarities among them have not been systematically studied. Here, we quantitatively investigated the full text similarity of biomedical publications in PubMed Central. 72,011 full text articles from PubMed Central (PMC) were parsed to generate three different datasets: full texts, sections, and paragraphs. Text similarity comparisons were performed on these datasets using the text similarity algorithm eTBLAST. We measured the frequency of similar text pairs and compared it among different datasets. We found that high abstract similarity can be used to predict high full text similarity with a specificity of 20.1% (95% CI [17.3%, 23.1%]) and sensitivity of 99.999%. Abstract similarity and full text similarity have a moderate correlation (Pearson correlation coefficient: -0.423) when the similarity ratio is above 0.4. Among pairs of articles in PMC, method sections are found to be the most repetitive (frequency of similar pairs, methods: 0.029, introduction: 0.0076, results: 0.0043). In contrast, among a set of manually verified duplicate articles, results are the most repetitive sections (frequency of similar pairs, results: 0.94, methods: 0.89, introduction: 0.82). Repetition of introduction and methods sections is more likely to be committed by the same authors (odds of a highly similar pair having at least one shared author, introduction: 2.31, methods: 1.83, results: 1.03). There is also significantly more similarity in pairs of review articles than in pairs containing one review and one nonreview paper (frequency of similar pairs: 0.0167 and 0.0023, respectively). While quantifying abstract similarity is an effective approach for finding duplicate citations, a comprehensive full text analysis is necessary to uncover all potential duplicate citations in the scientific literature and is helpful when
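    The eTBLAST algorithm itself is not shown in this record. As a minimal stand-in for pairwise text similarity scoring, a bag-of-words cosine similarity can be sketched as follows; the example abstracts are invented:

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words term-frequency vectors."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

abstract_a = "we study duplicate publications in medline using text similarity"
abstract_b = "text similarity reveals duplicate publications in medline"
unrelated  = "stamp detection in color document images"

# Similar abstracts score higher than unrelated ones.
print(cosine_similarity(abstract_a, abstract_b) > cosine_similarity(abstract_a, unrelated))
```

    Production systems refine this basic idea with weighting (e.g. TF-IDF), sentence alignment, and thresholds tuned to separate legitimate reuse from duplication.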

  12. EDF Group - 2010 Reference Document

    International Nuclear Information System (INIS)

    2011-04-01

    Besides EDF's accounts for 2008 and 2009, this voluminous document presents the persons in charge, the statutory account auditors, and how risks are managed within the company. It gives an overview of EDF's activities, organization and assets. It presents and discusses its financial situation and results, indicates the main contracts, and provides other documents concerning the company. Many documents and reports are provided in the appendix

  13. Document understanding for a broad class of documents

    NARCIS (Netherlands)

    Aiello, Marco; Monz, Christof; Todoran, Leon; Worring, Marcel

    2002-01-01

    We present a document analysis system able to assign logical labels and extract the reading order in a broad set of documents. All information sources, from geometric features and spatial relations to the textual features and content are employed in the analysis. To deal effectively with these

  14. Organising Documentation in Knowledge Evolution and Communication

    Directory of Open Access Journals (Sweden)

    Cristina De Castro

    2007-06-01

    Full Text Available The knowledge of a subject evolves in time due to many factors, such as better understanding, study of additional issues within the same subject, study of related work from other themes, etc. This can be achieved by individual work, direct cooperation with other people and, in general, knowledge sharing. In this context, and in the broader context of knowledge communication, the appropriate organisation of documentation plays a fundamental role, but is often very difficult to achieve. A layered architecture is here proposed for the development of a structured repository of documentation, here called knowledge-bibliography (KB). The process of knowledge acquisition, evolution and communication is firstly considered; then the distributed nature of today's knowledge and the ways it is shared and transferred are taken into account. On the basis of the above considerations, a possible clustering of documentation collected by many people is defined. An LDAP-based architecture for the implementation of this structure is also discussed.

  15. Data mining of text as a tool in authorship attribution

    Science.gov (United States)

    Visa, Ari J. E.; Toivonen, Jarmo; Autio, Sami; Maekinen, Jarno; Back, Barbro; Vanharanta, Hannu

    2001-03-01

    It is common that text documents are characterized and classified by keywords that the authors use to give them. Visa et al. have developed a new methodology based on prototype matching. The prototype is an interesting document or a part of an extracted, interesting text. This prototype is matched with the document database of the monitored document flow. The new methodology is capable of extracting the meaning of the document to a certain degree. Our claim is that the new methodology is also capable of authenticating authorship. To verify this claim, two tests were designed. The test hypothesis was that the words and the word order in the sentences could authenticate the author. In the first test three authors were selected: William Shakespeare, Edgar Allan Poe, and George Bernard Shaw. Three texts from each author were examined. Each text was in turn used as a prototype. The two nearest matches with the prototype were noted. The second test uses the Reuters-21578 financial news database. A group of 25 short financial news reports from five different authors is examined. Our new methodology and the interesting results from the two tests are reported in this paper. In the first test, for Shakespeare and for Poe all cases were successful. For Shaw, one text was confused with Poe. In the second test, the Reuters-21578 financial news items were attributed to their authors relatively well. The conclusion is that our text mining methodology seems to be capable of authorship attribution.
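    The exact prototype-matching method of Visa et al. is not given in this record. A minimal nearest-prototype sketch under a simple bag-of-words assumption conveys the idea; the prototypes below are toy stand-ins, not the real corpora:

```python
import math
from collections import Counter

def vector(text):
    """Bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# One tiny prototype text per author (illustrative snippets only).
prototypes = {
    "shakespeare": "thou art more lovely and more temperate",
    "poe": "once upon a midnight dreary while i pondered weak and weary",
}

def attribute(text):
    """Return the author whose prototype is nearest in cosine similarity."""
    v = vector(text)
    return max(prototypes, key=lambda author: cosine(v, vector(prototypes[author])))

print(attribute("thou art lovely and temperate"))  # → shakespeare
```

    The real methodology matches much richer prototypes against a document flow; the nearest-match principle, however, is the same.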

  16. Stemming Malay Text and Its Application in Automatic Text Categorization

    Science.gov (United States)

    Yasukawa, Michiko; Lim, Hui Tian; Yokoo, Hidetoshi

    In the Malay language there are no conjugations or declensions, and affixes have important grammatical functions. In Malay, the same word may function as a noun, an adjective, an adverb, or a verb, depending on its position in the sentence. Although simple root words are used extensively in informal conversation, it is essential to use the precise words in formal speech or written texts. In Malay, to make sentences clear, derivative words are used. Derivation is achieved mainly by the use of affixes. There are approximately a hundred possible derivative forms of a root word in the written language of educated Malay speakers. Therefore, the composition of Malay words may be complicated. Although there are several types of stemming algorithms available for text processing in English and some other languages, they cannot be used to overcome the difficulties of Malay word stemming. Stemming is the process of reducing various words to their root forms in order to improve the effectiveness of text processing in information systems. It is essential to avoid both over-stemming and under-stemming errors. We have developed a new Malay stemmer (stemming algorithm) for removing inflectional and derivational affixes. Our stemmer uses a set of affix rules and two types of dictionaries: a root-word dictionary and a derivative-word dictionary. The use of the set of rules is aimed at reducing the occurrence of under-stemming errors, while that of the dictionaries is believed to reduce the occurrence of over-stemming errors. We performed an experiment to evaluate the application of our stemmer in text mining software. For the experiment, the text data used were actual web pages collected from the World Wide Web to demonstrate the effectiveness of our Malay stemming algorithm. The experimental results showed that our stemmer can effectively increase the precision of the extracted Boolean expressions for text categorization.
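    The actual rule set and dictionaries of the stemmer are not reproduced in this record. A heavily simplified sketch, with a handful of illustrative Malay affixes and a tiny root-word dictionary, shows how dictionary lookup guards against over-stemming:

```python
# Illustrative affix lists; the real stemmer uses a far richer rule set
# plus both a root-word and a derivative-word dictionary.
PREFIXES = ["meng", "men", "mem", "me", "ber", "di", "ke", "pe"]
SUFFIXES = ["kan", "an", "i"]
ROOTS = {"ajar", "baca", "main"}  # root-word dictionary

def stem(word):
    """Strip one prefix and/or one suffix, accepting a result only if it is a known root."""
    candidates = [word]
    for p in PREFIXES:
        if word.startswith(p):
            candidates.append(word[len(p):])
    with_suffix_removed = []
    for c in candidates:
        for s in SUFFIXES:
            if c.endswith(s):
                with_suffix_removed.append(c[:-len(s)])
    for c in candidates + with_suffix_removed:
        if c in ROOTS:
            return c
    return word  # under-stemming is preferred to returning a wrong root

print(stem("mengajar"))    # meng + ajar → ajar
print(stem("membacakan"))  # mem + baca + kan → baca
print(stem("makan"))       # left intact: "mak" is not a known root
```

    Without the dictionary check, "makan" would be wrongly reduced by the suffix rule; the lookup is what keeps over-stemming in check.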

  17. Anomaly Detection with Text Mining

    Data.gov (United States)

    National Aeronautics and Space Administration — Many existing complex space systems have a significant amount of historical maintenance and problem data bases that are stored in unstructured text forms. The...

  18. Social Studies: Texts and Supplements.

    Science.gov (United States)

    Curriculum Review, 1979

    1979-01-01

    This review of selected social studies texts, series, and supplements, mainly for the secondary level, includes a special section examining eight titles on warfare and terrorism for grades 4-12. (SJL)

  19. Text Mining in Organizational Research.

    Science.gov (United States)

    Kobayashi, Vladimer B; Mol, Stefan T; Berkers, Hannah A; Kismihók, Gábor; Den Hartog, Deanne N

    2018-07-01

    Despite the ubiquity of textual data, so far few researchers have applied text mining to answer organizational research questions. Text mining, which essentially entails a quantitative approach to the analysis of (usually) voluminous textual data, helps accelerate knowledge discovery by radically increasing the amount of data that can be analyzed. This article aims to acquaint organizational researchers with the fundamental logic underpinning text mining, the analytical stages involved, and contemporary techniques that may be used to achieve different types of objectives. The specific analytical techniques reviewed are (a) dimensionality reduction, (b) distance and similarity computing, (c) clustering, (d) topic modeling, and (e) classification. We describe how text mining may extend contemporary organizational research by allowing the testing of existing or new research questions with data that are likely to be rich, contextualized, and ecologically valid. After an exploration of how evidence for the validity of text mining output may be generated, we conclude the article by illustrating the text mining process in a job analysis setting using a dataset composed of job vacancies.
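    Of the five techniques listed, classification is perhaps the easiest to illustrate. Here is a minimal nearest-centroid classifier over bag-of-words vectors, using invented job-vacancy snippets in the spirit of the article's job analysis example:

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u)
    n = (math.sqrt(sum(c * c for c in u.values()))
         * math.sqrt(sum(c * c for c in v.values())))
    return dot / n if n else 0.0

# Tiny labelled corpus of job-vacancy snippets (invented for illustration).
train = [
    ("engineering", "develop software systems in python"),
    ("engineering", "maintain backend services and systems"),
    ("sales", "negotiate contracts with enterprise clients"),
    ("sales", "grow revenue through client relationships"),
]

def centroid(label):
    """Sum the term vectors of all training texts carrying this label."""
    c = Counter()
    for lab, text in train:
        if lab == label:
            c.update(bow(text))
    return c

centroids = {lab: centroid(lab) for lab in {lab for lab, _ in train}}

def classify(text):
    v = bow(text)
    return max(centroids, key=lambda lab: cosine(v, centroids[lab]))

print(classify("build python services"))    # → engineering
print(classify("manage client contracts"))  # → sales
```

    Real studies would substitute TF-IDF weighting, proper train/test splits, and stronger classifiers, but the pipeline shape (vectorize, compare, assign) is the same.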

  20. Conservation Documentation and the Implications of Digitisation

    Directory of Open Access Journals (Sweden)

    Michelle Moore

    2001-11-01

    Full Text Available Conservation documentation can be defined as the textual and visual records collected during the care and treatment of an object. It can include records of the object's condition, any treatment done to the object, any observations or conclusions made by the conservator as well as details on the object's past and present environment. The form of documentation is not universally agreed upon nor has it always been considered an important aspect of the conservation profession. Good documentation tells the complete story of an object thus far and should provide as much information as possible for the future researcher, curator, or conservator. The conservation profession will benefit from digitising its documentation using software such as databases and hardware like digital cameras and scanners. Digital technology will make conservation documentation more easily accessible, cost/time efficient, and will increase consistency and accuracy of the recorded data, and reduce physical storage space requirements. The major drawback to digitising conservation records is maintaining access to the information for the future; the notorious pace of technological change has serious implications for retrieving data from any machine-readable medium.

  1. A New Binarization Algorithm for Historical Documents

    Directory of Open Access Journals (Sweden)

    Marcos Almeida

    2018-01-01

    Full Text Available Monochromatic documents demand far less network bandwidth for transmission and storage space than their color or even grayscale equivalents. The binarization of historical documents is far more complex than that of recent ones, as paper aging, color, texture, translucency, stains, back-to-front interference, the kind and color of ink used in handwriting, the printing process, the digitization process, etc. are some of the factors that affect binarization. This article presents a new binarization algorithm for historical documents. The new global filter proposed is performed in four steps: filtering the image using a bilateral filter, splitting the image into its RGB components, decision-making for each RGB channel based on an adaptive binarization method inspired by Otsu’s method with a choice of the threshold level, and classification of the binarized images to decide which of the RGB components best preserved the document information in the foreground. The quantitative and qualitative assessment made with 23 binarization algorithms on three sets of “real world” documents showed very good results.
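    The paper's per-channel decision step is described as inspired by Otsu's method. The classic Otsu threshold, which maximizes between-class variance over the intensity histogram, can be sketched as follows with synthetic pixel data:

```python
def otsu_threshold(pixels):
    """Otsu's method: pick the threshold maximizing between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(i * hist[i] for i in range(256))
    best_t, best_var, w0, sum0 = 0, -1.0, 0, 0.0
    for t in range(256):
        w0 += hist[t]                 # pixels in the dark class (<= t)
        if w0 == 0:
            continue
        w1 = total - w0               # pixels in the light class (> t)
        if w1 == 0:
            break
        sum0 += t * hist[t]
        m0 = sum0 / w0                # mean of the dark class
        m1 = (total_sum - sum0) / w1  # mean of the light class
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Synthetic "scan": dark ink pixels around 40, light paper pixels around 200.
pixels = [38, 40, 42, 45, 41] * 20 + [198, 200, 205, 202, 199] * 80
t = otsu_threshold(pixels)
print(40 < t < 198)  # the threshold falls between the two pixel populations
```

    The paper applies a choice like this per RGB channel and then selects the channel whose binarization best preserves the foreground; that selection step is not reproduced here.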

  2. Multi-font printed Mongolian document recognition system

    Science.gov (United States)

    Peng, Liangrui; Liu, Changsong; Ding, Xiaoqing; Wang, Hua; Jin, Jianming

    2009-01-01

    Mongolian is one of the major ethnic languages in China. Large numbers of printed Mongolian documents need to be digitized for digital libraries and various other applications. Traditional Mongolian script has a unique writing style and multi-font-type variations, which bring challenges to Mongolian OCR research. Because traditional Mongolian script has particular characteristics, for example that one character may be part of another character, we define the character set for recognition according to the segmented components, and the components are combined into characters by a rule-based post-processing module. For character recognition, a method based on visual directional features and multi-level classifiers is presented. For character segmentation, a scheme is used to find the segmentation points by analyzing the properties of projections and connected components. As Mongolian has different font-types, which are categorized into two major groups, the segmentation parameters are adjusted for each group. A font-type classification method for the two font-type groups is introduced. For recognition of Mongolian text mixed with Chinese and English, language identification and the relevant character recognition kernels are integrated. Experiments show that the presented methods are effective. The text recognition rate is 96.9% on test samples from practical documents with multiple font-types and mixed scripts.
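    The segmentation scheme analyzes projection profiles and connected components; the projection part alone can be illustrated with a toy binary image, where ink-free columns are candidate cut points between glyphs:

```python
def column_projection(image):
    """Sum of ink pixels per column of a binary image (1 = ink, 0 = background)."""
    return [sum(row[c] for row in image) for c in range(len(image[0]))]

def segmentation_points(image):
    """Columns containing no ink are candidate cut points between glyphs."""
    return [c for c, ink in enumerate(column_projection(image)) if ink == 0]

# Toy 4x7 binary image: two 3-column "glyphs" separated by blank column 3.
image = [
    [1, 1, 1, 0, 1, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0, 1],
    [1, 1, 1, 0, 1, 1, 1],
]
print(segmentation_points(image))  # → [3]
```

    Traditional Mongolian is written vertically, so the real system projects along the writing direction and combines this with connected-component analysis; the toy image above only shows the projection principle.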

  3. Using Literary Texts to Teach Grammar in Foreign Language Classroom

    Science.gov (United States)

    Atmaca, Hasan; Günday, Rifat

    2016-01-01

    Today, it is argued that the use of literary texts as course material in the foreign language classroom is not obligatory but necessary, due to the close relationship between language and literature. Although literary texts are accepted as authentic documents and do not have language teaching as their purpose, they are indispensable sources to be…

  4. A SURVEY ON DOCUMENT CLUSTERING APPROACH FOR COMPUTER FORENSIC ANALYSIS

    OpenAIRE

    Monika Raghuvanshi*, Rahul Patel

    2016-01-01

    In a forensic analysis, large numbers of files are examined. Much of the information is in unstructured form, which makes the analysis a difficult task for the computer forensic examiner. Performing a forensic analysis of documents within a limited period of time therefore requires a special approach such as document clustering. This paper reviews different document clustering algorithms and methodologies, for example K-means, K-medoid, single link, complete link, and average link, in accordance...

  5. Investigating the statistical properties of user-generated documents

    OpenAIRE

    Inches, Giacomo; Carman, Mark J.; Crestani, Fabio

    2011-01-01

    The importance of the Internet as a communication medium is reflected in the large amount of documents being generated every day by users of the different services that take place online. In this work we aim at analyzing the properties of these online user-generated documents for some of the established services over the Internet (Kongregate, Twitter, Myspace and Slashdot) and comparing them with a consolidated collection of standard information retrieval documents (from the Wall Street...

  7. Probing the topological properties of complex networks modeling short written texts.

    Directory of Open Access Journals (Sweden)

    Diego R Amancio

    Full Text Available In recent years, graph theory has been widely employed to probe several language properties. More specifically, the so-called word adjacency model has proven useful for tackling several practical problems, especially those relying on textual stylistic analysis. The most common approach to treating texts as networks has simply considered either large pieces of texts or entire books. This approach has certainly worked well, and many informative discoveries have been made this way, but it raises an uncomfortable question: could there be important topological patterns in small pieces of texts? To address this problem, the topological properties of subtexts sampled from entire books were probed. Statistical analyses performed on a dataset comprising 50 novels revealed that most of the traditional topological measurements are stable for short subtexts. When the performance of the authorship recognition task was analyzed, it was found that a proper sampling yields a discriminability similar to the one found with full texts. Surprisingly, the support vector machine classification based on the characterization of short texts outperformed the one performed with entire books. These findings suggest that a local topological analysis of large documents might improve their global characterization. Most importantly, it was verified, as a proof of principle, that short texts can be analyzed with the methods and concepts of complex networks. As a consequence, the techniques described here can be extended in a straightforward fashion to analyze texts as time-varying complex networks.
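    As a toy illustration of the word adjacency model mentioned above, the sketch below builds the network from raw text and reports the highest-degree word; the topological measures used in the paper are more elaborate:

```python
from collections import defaultdict

def adjacency_network(text):
    """Undirected word-adjacency network: link each word to its successor."""
    words = [w.strip(".,;").lower() for w in text.split()]
    edges = defaultdict(set)
    for a, b in zip(words, words[1:]):
        if a != b:  # ignore self-loops
            edges[a].add(b)
            edges[b].add(a)
    return edges

text = "the cat sat on the mat and the dog sat on the rug"
net = adjacency_network(text)
degrees = {word: len(neigh) for word, neigh in net.items()}
print(max(degrees, key=degrees.get))  # → the (the most connected word)
```

    On real novels such networks exhibit the degree heterogeneity and clustering patterns that the paper measures; here the function-word "the" already emerges as the hub.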

  8. Layout-aware text extraction from full-text PDF of scientific articles

    Directory of Open Access Journals (Sweden)

    Ramakrishnan Cartic

    2012-05-01

    Full Text Available Abstract Background The Portable Document Format (PDF) is the most commonly used file format for online scientific publications. The absence of effective means to extract text from these PDF files in a layout-aware manner presents a significant challenge for developers of biomedical text mining or biocuration informatics systems that use published literature as an information source. In this paper we introduce the ‘Layout-Aware PDF Text Extraction’ (LA-PDFText) system to facilitate accurate extraction of text from PDF files of research articles for use in text mining applications. Results Our paper describes the construction and performance of an open source system that extracts text blocks from PDF-formatted full-text research articles and classifies them into logical units based on rules that characterize specific sections. The LA-PDFText system focuses only on the textual content of the research articles and is meant as a baseline for further experiments into more advanced extraction methods that handle multi-modal content, such as images and graphs. The system works in a three-stage process: (1) Detecting contiguous text blocks using spatial layout processing to locate and identify blocks of contiguous text, (2) Classifying text blocks into rhetorical categories using a rule-based method and (3) Stitching classified text blocks together in the correct order, resulting in the extraction of text from section-wise grouped blocks. We show that our system can identify text blocks and classify them into rhetorical categories with Precision = 0.96, Recall = 0.89 and F1 = 0.91. We also present an evaluation of the accuracy of the block detection algorithm used in step 2. Additionally, we have compared the accuracy of the text extracted by LA-PDFText to the text from the Open Access subset of PubMed Central. We then compared this accuracy with that of the text extracted by the PDF2Text system, commonly used to extract text from PDF

  9. Document Organization Using Kohonen's Algorithm.

    Science.gov (United States)

    Guerrero Bote, Vicente P.; Moya Anegon, Felix de; Herrero Solana, Victor

    2002-01-01

    Discussion of the classification of documents from bibliographic databases focuses on a method of vectorizing reference documents from LISA (Library and Information Science Abstracts) which permits their topological organization using Kohonen's algorithm. Analyzes possibilities of this type of neural network with respect to the development of…

  10. SRS ecology: Environmental information document

    International Nuclear Information System (INIS)

    Wike, L.D.; Shipley, R.W.; Bowers, J.A.

    1993-09-01

    The purpose of this document is to provide a source of ecological information based on the existing knowledge gained from research conducted at the Savannah River Site. This document provides a summary and synthesis of ecological research in the three main ecosystem types found at SRS and information on the threatened and endangered species residing there.

  11. Magnetic fusion program summary document

    International Nuclear Information System (INIS)

    1979-04-01

    This document outlines the current and planned research, development, and commercialization (RD and C) activities of the Office of Fusion Energy under the Assistant Secretary for Energy Technology, US Department of Energy (DOE). The purpose of this document is to explain the Office of Fusion Energy's activities to Congress and its committees and to interested members of the public.

  12. Documenting the Engineering Design Process

    Science.gov (United States)

    Hollers, Brent

    2017-01-01

    Documentation of ideas and the engineering design process is a critical, daily component of a professional engineer's job. While patent protection is often cited as the primary rationale for documentation, it can also benefit the engineer, the team, the company, and stakeholders by creating a more rigorously designed and purposeful solution.…

  13. SRS ecology: Environmental information document

    Energy Technology Data Exchange (ETDEWEB)

    Wike, L.D.; Shipley, R.W.; Bowers, J.A. [and others

    1993-09-01

    The purpose of this document is to provide a source of ecological information based on the existing knowledge gained from research conducted at the Savannah River Site. This document provides a summary and synthesis of ecological research in the three main ecosystem types found at SRS and information on the threatened and endangered species residing there.

  14. ITK optical links backup document

    CERN Document Server

    Huffman, B T; The ATLAS collaboration; Flick, T; Ye, J

    2013-01-01

    This document describes the proposed optical links to be used for the ITK in the phase II upgrade. The current R&D for optical links pursued in the Versatile Link group is reviewed. In particular the results demonstrating the radiation tolerance of all the on-detector components are documented. The bandwidth requirements and the resulting numerology are given.

  15. Contextualizing Data Warehouses with Documents

    DEFF Research Database (Denmark)

    Perez, Juan Manuel; Berlanga, Rafael; Aramburu, Maria Jose

    2008-01-01

    warehouse with a document warehouse, resulting in a contextualized warehouse. Thus, the user first selects an analysis context by supplying some keywords. Then, the analysis is performed on a novel type of OLAP cube, called an R-cube, which is materialized by retrieving and ranking the documents...

  16. Inferring Group Processes from Computer-Mediated Affective Text Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Schryver, Jack C [ORNL; Begoli, Edmon [ORNL; Jose, Ajith [Missouri University of Science and Technology; Griffin, Christopher [Pennsylvania State University

    2011-02-01

    Political communications in the form of unstructured text convey rich connotative meaning that can reveal underlying group social processes. Previous research has focused on sentiment analysis at the document level, but we extend this analysis to sub-document levels through a detailed analysis of affective relationships between entities extracted from a document. Instead of pure sentiment analysis, which is just positive or negative, we explore nuances of affective meaning in 22 affect categories. Our affect propagation algorithm automatically calculates and displays extracted affective relationships among entities in graphical form in our prototype (TEAMSTER), starting with seed lists of affect terms. Several useful metrics are defined to infer underlying group processes by aggregating affective relationships discovered in a text. Our approach has been validated with annotated documents from the MPQA corpus, achieving a performance gain of 74% over comparable random guessers.
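
    The idea of extracting affective relationships between entities starting from seed lists of affect terms, as described above, can be illustrated with a crude co-occurrence sketch. The seed terms, their scores and the sentence-level aggregation below are our assumptions, not the TEAMSTER affect propagation algorithm itself:

```python
# Hypothetical seed affect terms with signed strengths (not from the paper).
AFFECT_SEEDS = {"praised": 1.0, "supported": 0.5, "condemned": -1.0, "attacked": -0.8}

def affect_relations(sentences, entities):
    """For every sentence that mentions at least two known entities, add the
    summed seed-term score of that sentence to each co-occurring entity pair."""
    relations = {}
    for sentence in sentences:
        words = [w.strip(".,;:!?").lower() for w in sentence.split()]
        present = [e for e in entities if e.lower() in words]
        score = sum(AFFECT_SEEDS.get(w, 0.0) for w in words)
        for i in range(len(present)):
            for j in range(i + 1, len(present)):
                pair = (present[i], present[j])
                relations[pair] = relations.get(pair, 0.0) + score
    return relations

rels = affect_relations(
    ["Smith praised Jones warmly.", "Brown condemned Jones."],
    ["Smith", "Jones", "Brown"])
```

    The resulting signed scores per entity pair are the kind of edges that can then be aggregated into group-level metrics or displayed graphically.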

  17. Engineering Documentation and Data Control

    Science.gov (United States)

    Matteson, Michael J.; Bramley, Craig; Ciaruffoli, Veronica

    2001-01-01

    Mississippi Space Services (MSS), the facility services contractor for NASA's John C. Stennis Space Center (SSC), is utilizing technology to improve engineering documentation and data control. Two identified improvement areas, labor-intensive documentation research and outdated drafting standards, were targeted as top priority. MSS selected AutoManager(R) WorkFlow from Cyco software to manage engineering documentation. The software is currently installed on over 150 desktops. The outdated SSC drafting standard was written for pre-CADD drafting methods, in other words, board drafting. Implementation of COTS software solutions to manage engineering documentation and update the drafting standard resulted in significant increases in productivity by reducing the time spent searching for documents.

  18. Changing Landscapes in Documentation Efforts : Civil Society Documentation of Serious Human Rights Violations

    NARCIS (Netherlands)

    Mc Gonigle, B.N.

    2017-01-01

    Wittingly or unwittingly, civil society actors have long been faced with the task of documenting serious human rights violations. Thirty years ago, such efforts were largely organised by grassroots movements, often with little support or funding from international actors. Sharing information and

  19. Layout-aware text extraction from full-text PDF of scientific articles.

    Science.gov (United States)

    Ramakrishnan, Cartic; Patnia, Abhishek; Hovy, Eduard; Burns, Gully Apc

    2012-05-28

    The Portable Document Format (PDF) is the most commonly used file format for online scientific publications. The absence of effective means to extract text from these PDF files in a layout-aware manner presents a significant challenge for developers of biomedical text mining or biocuration informatics systems that use published literature as an information source. In this paper we introduce the 'Layout-Aware PDF Text Extraction' (LA-PDFText) system to facilitate accurate extraction of text from PDF files of research articles for use in text mining applications. Our paper describes the construction and performance of an open source system that extracts text blocks from PDF-formatted full-text research articles and classifies them into logical units based on rules that characterize specific sections. The LA-PDFText system focuses only on the textual content of the research articles and is meant as a baseline for further experiments into more advanced extraction methods that handle multi-modal content, such as images and graphs. The system works in a three-stage process: (1) Detecting contiguous text blocks using spatial layout processing to locate and identify blocks of contiguous text, (2) Classifying text blocks into rhetorical categories using a rule-based method and (3) Stitching classified text blocks together in the correct order, resulting in the extraction of text from section-wise grouped blocks. We show that our system can identify text blocks and classify them into rhetorical categories with Precision = 0.96, Recall = 0.89 and F1 = 0.91. We also present an evaluation of the accuracy of the block detection algorithm used in step 2. Additionally, we have compared the accuracy of the text extracted by LA-PDFText to the text from the Open Access subset of PubMed Central. We then compared this accuracy with that of the text extracted by the PDF2Text system, commonly used to extract text from PDF. Finally, we discuss preliminary error analysis for
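
    The rule-based classification stage described above (stage 2 of the pipeline) can be illustrated with a much-simplified sketch: a few regular-expression rules over a block's first line assign a rhetorical category, and anything unmatched falls through to "body". The specific rules here are hypothetical, not LA-PDFText's actual rule set:

```python
import re

# Hypothetical stand-in for LA-PDFText's rule-based step: each rule maps a
# pattern over a block's first line to a rhetorical category.
RULES = [
    (re.compile(r"^\s*abstract\b", re.I), "abstract"),
    (re.compile(r"^\s*(methods?|materials and methods)\b", re.I), "methods"),
    (re.compile(r"^\s*results?\b", re.I), "results"),
    (re.compile(r"^\s*(discussion|conclusions?)\b", re.I), "discussion"),
    (re.compile(r"^\s*references?\b", re.I), "references"),
]

def classify_block(block):
    """Assign a rhetorical category to a text block by its first line."""
    first_line = block.strip().splitlines()[0]
    for pattern, category in RULES:
        if pattern.match(first_line):
            return category
    return "body"

labels = [classify_block(b) for b in
          ["Abstract\nWe present ...", "Methods\nMice were ...", "Some plain paragraph."]]
```

    A real system would combine such rules with the spatial features produced by stage 1 (font size, position on the page) rather than text alone.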

  20. A Guide Text or Many Texts? "That is the Question”

    Directory of Open Access Journals (Sweden)

    Delgado de Valencia Sonia

    2001-08-01

    Full Text Available The use of supplementary materials in the classroom has always been an essential part of the teaching and learning process. To restrict our teaching to the scope of one single textbook means to stand behind the advances of knowledge, in any area and context. Young learners appreciate any new and varied support that expands their knowledge of the world: diaries, letters, panels, free texts, magazines, short stories, poems or literary excerpts, and articles taken from the Internet are materials that will allow learners to share more and work more collaboratively. In this article we are going to deal with some of these materials, and with the criteria to select, adapt, and create them so that they may be of interest to the learner and may promote reading and writing processes. Since no text can entirely satisfy the needs of students and teachers, the creativity of both parties will be necessary to improve the quality of teaching through the adequate use and adaptation of supplementary materials.

  1. Individual Profiling Using Text Analysis

    Science.gov (United States)

    2016-04-15

    AFRL-AFOSR-UK-TR-2016-0011. Final report for "Individual Profiling Using Text Analysis" (award 140333), Mark Stevenson, University of Sheffield, Department of Psychology; dates covered: 15 Sep 2014 to 14 Sep 2015. The data consisted of collections of tweets for a number of Twitter users whose gender, age and personality scores are known. The task was to construct some system

  2. Finding text in color images

    Science.gov (United States)

    Zhou, Jiangying; Lopresti, Daniel P.; Tasdizen, Tolga

    1998-04-01

    In this paper, we consider the problem of locating and extracting text from WWW images. A previous algorithm based on color clustering and connected components analysis works well as long as the color of each character is relatively uniform and the typography is fairly simple. It breaks down quickly, however, when these assumptions are violated. In this paper, we describe more robust techniques for dealing with this challenging problem. We present an improved color clustering algorithm that measures similarity based on both RGB and spatial proximity. Layout analysis is also incorporated to handle more complex typography. These changes significantly enhance the performance of our text detection procedure.
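
    Measuring similarity by both RGB and spatial proximity, as in the improved clustering above, amounts to a combined distance function over pixels. A hedged sketch; the weights and the linear combination are illustrative choices, not the authors' exact formulation:

```python
def combined_distance(p, q, w_color=1.0, w_space=0.5):
    """Distance between two pixels p and q, each given as (r, g, b, x, y):
    a weighted sum of colour distance and spatial distance, so that a
    clustering step prefers pixels that are both similar in RGB and close
    together on the page."""
    color = sum((a - b) ** 2 for a, b in zip(p[:3], q[:3])) ** 0.5
    space = ((p[3] - q[3]) ** 2 + (p[4] - q[4]) ** 2) ** 0.5
    return w_color * color + w_space * space

# same colour, near vs. far on the page
near_same = combined_distance((200, 30, 30, 10, 10), (205, 28, 32, 12, 10))
far_same  = combined_distance((200, 30, 30, 10, 10), (205, 28, 32, 400, 300))
```

    Pixels of the same character are both colour-similar and spatially close, so they end up in the same cluster even when the page contains other regions of the same colour.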

  3. Documentation of TRU biological transport model (BIOTRAN)

    Energy Technology Data Exchange (ETDEWEB)

    Gallegos, A.F.; Garcia, B.J.; Sutton, C.M.

    1980-01-01

    Inclusive of Appendices, this document describes the purpose, rationale, construction, and operation of a biological transport model (BIOTRAN). This model is used to predict the flow of transuranic elements (TRU) through specified plant and animal environments using biomass as a vector. The appendices are: (A) Flows of moisture, biomass, and TRU; (B) Intermediate variables affecting flows; (C) Mnemonic equivalents (code) for variables; (D) Variable library (code); (E) BIOTRAN code (Fortran); (F) Plants simulated; (G) BIOTRAN code documentation; (H) Operating instructions for BIOTRAN code. The main text is presented with a specific format which uses a minimum of space, yet is adequate for tracking most relationships from their first appearance to their formulation in the code. Because relationships are treated individually in this manner, and rely heavily on Appendix material for understanding, it is advised that the reader familiarize himself with these materials before proceeding with the main text.

  4. Documentation of TRU biological transport model (BIOTRAN)

    International Nuclear Information System (INIS)

    Gallegos, A.F.; Garcia, B.J.; Sutton, C.M.

    1980-01-01

    Inclusive of Appendices, this document describes the purpose, rationale, construction, and operation of a biological transport model (BIOTRAN). This model is used to predict the flow of transuranic elements (TRU) through specified plant and animal environments using biomass as a vector. The appendices are: (A) Flows of moisture, biomass, and TRU; (B) Intermediate variables affecting flows; (C) Mnemonic equivalents (code) for variables; (D) Variable library (code); (E) BIOTRAN code (Fortran); (F) Plants simulated; (G) BIOTRAN code documentation; (H) Operating instructions for BIOTRAN code. The main text is presented with a specific format which uses a minimum of space, yet is adequate for tracking most relationships from their first appearance to their formulation in the code. Because relationships are treated individually in this manner, and rely heavily on Appendix material for understanding, it is advised that the reader familiarize himself with these materials before proceeding with the main text

  5. Multimodal document management in radiotherapy

    International Nuclear Information System (INIS)

    Fahrner, H.; Kirrmann, S.; Roehner, F.; Schmucker, M.; Hall, M.; Heinemann, F.

    2013-01-01

    Background and purpose: After incorporating treatment planning and the organisational model of treatment planning in the operating schedule system (BAS, 'Betriebsablaufsystem'), complete document types were embedded in the digital environment. The aim of this project was to integrate all documents independent of their source (paper-bound or digital) and to make content from the BAS available in a structured manner. As many workflow steps as possible should be automated, e.g. assigning a document to a patient in the BAS. Additionally, it had to be guaranteed that it could be traced at all times who imported documents into the departmental system, and when, how and from which source. Furthermore, work procedures should be changed so that documentation conducted either directly in the departmental system or in external systems could be incorporated digitally and paper documents could be completely avoided (e.g. documents such as treatment certificates, treatment plans or other documentation). It was a further aim, if possible, to automate the removal of paper documents from the departmental workflow, or even to make such paper documents superfluous. In this way, patient letters for follow-up appointments should be generated automatically from the BAS. Similarly, patient record extracts in the form of PDF files should be enabled, e.g. for controlling purposes. Method: The available document types were analysed in detail by a multidisciplinary working group (BAS-AG) and, after this examination and an assessment of how they could be modelled in our departmental workflow system (BAS), transcribed into a flow diagram. The gathered specifications were implemented in a test environment by the clinical and administrative IT group of the department of radiation oncology and, subsequent to a detailed analysis, introduced into clinical routine. Results: The department has succeeded, under the aforementioned criteria, in embedding all relevant documents in the departmental

  6. Automated analysis of instructional text

    Energy Technology Data Exchange (ETDEWEB)

    Norton, L.M.

    1983-05-01

    The development of a capability for automated processing of natural language text is a long-range goal of artificial intelligence. This paper discusses an investigation into the issues involved in the comprehension of descriptive, as opposed to illustrative, textual material. The comprehension process is viewed as the conversion of knowledge from one representation into another. The proposed target representation consists of statements of the Prolog language, which can be interpreted both declaratively and procedurally, much like production rules. A computer program has been written to model in detail some ideas about this process. The program successfully analyzes several heavily edited paragraphs adapted from an elementary textbook on programming, automatically synthesizing as a result of the analysis a working Prolog program which, when executed, can parse and interpret LET commands in the BASIC language. The paper discusses the motivations and philosophy of the project, the many kinds of prerequisite knowledge which are necessary, and the structure of the text analysis program. A sentence-by-sentence account of the analysis of the sample text is presented, describing the syntactic and semantic processing involved. The paper closes with a discussion of lessons learned from the project, possible alternative approaches, and possible extensions for future work. The entire project is presented as illustrative of the nature and complexity of the text analysis process, rather than as providing definitive or optimal solutions to any aspects of the task. 12 references.

  7. Solar Concepts: A Background Text.

    Science.gov (United States)

    Gorham, Jonathan W.

    This text is designed to provide teachers, students, and the general public with an overview of key solar energy concepts. Various energy terms are defined and explained. Basic thermodynamic laws are discussed. Alternative energy production is described in the context of the present energy situation. Described are the principal contemporary solar…

  8. Quality Inspection of Printed Texts

    DEFF Research Database (Denmark)

    Pedersen, Jesper Ballisager; Nasrollahi, Kamal; Moeslund, Thomas B.

    2016-01-01

    twofold: for customers of the printing and verification system, the overall grade is used to verify whether the text is of sufficient quality, while for the printer's manufacturer, the detailed character/symbol grades and quality measurements are used for the improvement and optimization of the printing task. The proposed system...

  9. Shoulder dystocia documentation: an evaluation of a documentation training intervention.

    Science.gov (United States)

    LeRiche, Tammy; Oppenheimer, Lawrence; Caughey, Sharon; Fell, Deshayne; Walker, Mark

    2015-03-01

    To evaluate the quality and content of nurse and physician shoulder dystocia delivery documentation before and after MORE training in shoulder dystocia management skills and documentation. Approximately 384 charts at the Ottawa Hospital General Campus involving a diagnosis of shoulder dystocia between the years of 2000 and 2006 excluding the training year of 2003 were identified. The charts were evaluated for 14 key components derived from a validated instrument. The delivery notes were then scored based on these components by 2 separate investigators who were blinded to delivery note author, date, and patient identification to further quantify delivery record quality. Approximately 346 charts were reviewed for physician and nurse delivery documentation. The average score for physician notes was 6 (maximum possible score of 14) both before and after the training intervention. The nurses' average score was 5 before and after the training intervention. Negligible improvement was observed in the content and quality of shoulder dystocia documentation before and after nurse and physician training.

  10. Intelligent bar chart plagiarism detection in documents.

    Science.gov (United States)

    Al-Dabbagh, Mohammed Mumtaz; Salim, Naomie; Rehman, Amjad; Alkawaz, Mohammed Hazim; Saba, Tanzila; Al-Rodhaan, Mznah; Al-Dhelaan, Abdullah

    2014-01-01

    This paper presents a novel feature mining approach for documents that cannot be mined via optical character recognition (OCR). By identifying the intimate relationship between the text and graphical components, the proposed technique pulls out the Start, End, and Exact values for each bar. Furthermore, the word 2-gram and Euclidean distance methods are used to accurately detect and determine plagiarism in bar charts.
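
    The word 2-gram and Euclidean distance combination mentioned above can be sketched directly: represent each text as a vector of bigram counts and compare the vectors. The sample strings are ours, not the paper's data:

```python
from collections import Counter

def word_bigrams(text):
    """Count word 2-grams (pairs of consecutive words)."""
    words = text.lower().split()
    return Counter(zip(words, words[1:]))

def euclidean(c1, c2):
    """Euclidean distance between two bigram count vectors; a Counter
    returns 0 for missing keys, so the vectors need not share keys."""
    keys = set(c1) | set(c2)
    return sum((c1[k] - c2[k]) ** 2 for k in keys) ** 0.5

a = word_bigrams("sales rose sharply in the third quarter")
b = word_bigrams("sales rose sharply in the third quarter")
c = word_bigrams("profits fell slightly during the first quarter")
```

    Identical texts have distance zero; a small distance between two chart captions or label sets is then evidence of copying.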

  11. Intelligent Bar Chart Plagiarism Detection in Documents

    Science.gov (United States)

    Al-Dabbagh, Mohammed Mumtaz; Salim, Naomie; Alkawaz, Mohammed Hazim; Saba, Tanzila; Al-Rodhaan, Mznah; Al-Dhelaan, Abdullah

    2014-01-01

    This paper presents a novel features mining approach from documents that could not be mined via optical character recognition (OCR). By identifying the intimate relationship between the text and graphical components, the proposed technique pulls out the Start, End, and Exact values for each bar. Furthermore, the word 2-gram and Euclidean distance methods are used to accurately detect and determine plagiarism in bar charts. PMID:25309952

  12. Evaluation of Hierarchical Clustering Algorithms for Document Datasets

    National Research Council Canada - National Science Library

    Zhao, Ying; Karypis, George

    2002-01-01

    Fast and high-quality document clustering algorithms play an important role in providing intuitive navigation and browsing mechanisms by organizing large amounts of information into a small number of meaningful clusters...
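
    Hierarchical (agglomerative) document clustering of the kind evaluated above can be sketched with single-link merging on toy document vectors; this is a generic textbook formulation, not the specific algorithms the paper compares:

```python
def agglomerative(points, n_clusters):
    """Plain single-link agglomerative clustering: start with one cluster
    per point and repeatedly merge the closest pair of clusters."""
    clusters = [[i] for i in range(len(points))]

    def dist(i, j):
        return sum((a - b) ** 2 for a, b in zip(points[i], points[j])) ** 0.5

    def link(c1, c2):                        # single link: closest members
        return min(dist(i, j) for i in c1 for j in c2)

    while len(clusters) > n_clusters:
        a, b = min(
            ((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
            key=lambda ab: link(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] += clusters.pop(b)
    return [sorted(c) for c in clusters]

# four toy "documents": two near (0, 0), two near (1, 1)
docs = [(0.0, 0.1), (0.1, 0.0), (1.0, 0.9), (0.9, 1.0)]
clusters = agglomerative(docs, 2)
```

    The repeated pairwise minimum makes this O(n^3) as written; real document-clustering implementations use priority queues or sampling to scale.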

  13. A Digital Humanities Approach to the History of Science Eugenics Revisited in Hidden Debates by Means of Semantic Text Mining

    OpenAIRE

    Huijnen, Pim; Laan, Fons; de Rijke, Maarten; Pieters, Toine

    2014-01-01

    Comparative historical research on the intensity, diversity and fluidity of public discourses has been severely hampered by the extraordinary task of manually gathering and processing large sets of opinionated data in news media in different countries. At most 50,000 documents have been systematically studied in a single comparative historical project in the subject area of heredity and eugenics. Digital techniques, like the text mining tools WAHSP and BILAND we have developed in two succ...

  14. Development of digital library system on regulatory documents for nuclear power plants

    International Nuclear Information System (INIS)

    Lee, K. H.; Kim, K. J.; Yoon, Y. H.; Kim, M. W.; Lee, J. I.

    2001-01-01

    The main objective of this study is to establish a nuclear regulatory document retrieval system based on the internet. With the advancement of internet and information processing technology, information management patterns are going through a new paradigm. In line with this trend, it is a general tendency to transfer paper-type documents into electronic-type documents through document scanning and indexing. This system consists of nuclear regulatory documents, nuclear safety documents, a digital library, and an information system with index and full text

  15. Documents

    International Development Research Centre (IDRC) Digital Library (Canada)

    livelihoods in spite of chronic water shortages. Farmers' Association: By the People for the People. Supported by WaDImena, a team from the Desert Development Centre (DDC) at the American University in Cairo helped farmers to found their first association to improve agricultural water management in Abu Minqar.

  16. Resource Lean and Portable Automatic Text Summarization

    OpenAIRE

    Hassel, Martin

    2007-01-01

    Today, with digitally stored information available in abundance, even for many minor languages, this information must by some means be filtered and extracted in order to avoid drowning in it. Automatic summarization is one such technique, where a computer summarizes a longer text into a shorter non-redundant form. Apart from the major languages of the world, there are a lot of languages for which large bodies of data aimed at language technology research are to a high degree lacking. There migh...
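
    A minimal extractive summarizer in the resource-lean spirit described above can be sketched by scoring sentences with corpus word frequencies and keeping the top scorers; this simple frequency heuristic is our illustration, not the thesis's actual method:

```python
import re
from collections import Counter

def summarize(text, n_sentences=1):
    """Score each sentence by the summed corpus frequency of its words and
    keep the top-scoring sentences in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip())
                 if s.strip()]
    freq = Counter(w for s in sentences for w in re.findall(r"[a-z']+", s.lower()))
    scored = sorted(range(len(sentences)),
                    key=lambda i: -sum(freq[w]
                                       for w in re.findall(r"[a-z']+",
                                                           sentences[i].lower())))
    keep = sorted(scored[:n_sentences])      # restore document order
    return " ".join(sentences[i] for i in keep)

text = ("Automatic summarization shortens a text. "
        "Summarization keeps the most central sentences of the text. "
        "The weather was pleasant.")
summary = summarize(text)
```

    Because it relies only on tokenization and counting, this kind of approach ports cheaply to languages without large annotated resources, which is the point the abstract makes.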

  17. Utilizing Multi-Field Text Features for Efficient Email Spam Filtering

    Directory of Open Access Journals (Sweden)

    Wuying Liu

    2012-06-01

    Full Text Available Large-scale spam emails cause a serious waste of time and resources. This paper investigates the text features of email documents and the feature noises among multi-field texts, resulting in an observation of a power law distribution of feature strings within each text field. According to this observation, we propose an efficient filtering approach comprising a compound weight method and a lightweight field text classification algorithm. The compound weight method considers both the historical classifying ability of each field classifier and the classifying contribution of each text field in the currently classified email. The lightweight field text classification algorithm straightforwardly calculates the arithmetical average of multiple conditional probabilities predicted from feature strings according to a string-frequency index for storing labeled emails. The string-frequency index structure has a random-sampling-based compressible property owing to the power law distribution and can largely reduce the storage space. The experimental results on the TREC spam track show that the proposed approach can complete the filtering task at low storage cost and high speed, with an overall 1-ROCA performance exceeding that of the best participant at the trec07p evaluation.
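
    The string-frequency index and the arithmetic averaging of per-feature conditional probabilities described above can be sketched as follows; the feature strings here are plain words and the class structure is our simplification of the paper's multi-field approach:

```python
from collections import defaultdict

class StringFrequencyFilter:
    """Toy string-frequency index: for each feature string, store how often
    it occurred in spam and in ham, and classify a new message by the
    arithmetic mean of the per-feature spam probabilities."""

    def __init__(self):
        self.counts = defaultdict(lambda: [0, 0])   # string -> [spam, ham]

    def train(self, features, is_spam):
        for f in features:
            self.counts[f][0 if is_spam else 1] += 1

    def spam_probability(self, features):
        probs = []
        for f in features:
            spam, ham = self.counts[f]
            if spam + ham:                          # skip unseen strings
                probs.append(spam / (spam + ham))
        return sum(probs) / len(probs) if probs else 0.5

flt = StringFrequencyFilter()
flt.train(["cheap", "pills", "now"], is_spam=True)
flt.train(["meeting", "agenda", "now"], is_spam=False)
p_spam = flt.spam_probability(["cheap", "pills"])
p_ham = flt.spam_probability(["meeting", "agenda"])
```

    Since feature-string frequencies follow a power law, most index entries are rare; that is what makes the random-sampling compression of the index mentioned in the abstract effective.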

  18. Cluster Based Text Classification Model

    DEFF Research Database (Denmark)

    Nizamani, Sarwat; Memon, Nasrullah; Wiil, Uffe Kock

    2011-01-01

    We propose a cluster based classification model for suspicious email detection and other text classification tasks. The text classification tasks comprise many training examples that require a complex classification model. Using clusters for classification makes the model simpler and increases the accuracy at the same time. The test example is classified using a simpler and smaller model. The training examples in a particular cluster share a common vocabulary. At the time of clustering, we do not take into account the labels of the training examples. After the clusters have been created, the classifier is trained on each cluster, having reduced dimensionality and fewer examples. The experimental results show that the proposed model outperforms the existing classification models for the task of suspicious email detection and topic categorization on the Reuters-21578 and 20 Newsgroups...
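
    The scheme above (cluster the training examples without their labels, then classify a test example within its nearest cluster only) can be sketched with Jaccard similarity over word sets. The seeding, similarity measure and nearest-neighbour step inside each cluster are illustrative simplifications, not the paper's model:

```python
def tokens(text):
    return set(text.lower().split())

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def build_clusters(training, k=2, rounds=5):
    """Cluster training texts (labels are ignored while clustering) around
    k=2 centroids seeded with the first and last text, reassigning by
    Jaccard similarity; centroids are the union vocabulary of a cluster."""
    docs = [(tokens(t), label) for t, label in training]
    centroids = [docs[0][0], docs[-1][0]]
    for _ in range(rounds):
        clusters = [[] for _ in range(k)]
        for d in docs:
            best = max(range(k), key=lambda i: jaccard(d[0], centroids[i]))
            clusters[best].append(d)
        centroids = [set().union(*(d[0] for d in c)) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

def classify(text, centroids, clusters):
    """Route the test text to its nearest cluster, then classify with a
    nearest-neighbour vote inside that (smaller) cluster only."""
    t = tokens(text)
    i = max(range(len(centroids)), key=lambda j: jaccard(t, centroids[j]))
    nearest = max(clusters[i], key=lambda d: jaccard(t, d[0]))
    return nearest[1]

training = [
    ("urgent claim your lottery prize", "suspicious"),
    ("lottery prize waiting claim now", "suspicious"),
    ("project meeting moved to monday", "ok"),
    ("notes from the monday meeting", "ok"),
]
centroids, clusters = build_clusters(training)
```

    Each per-cluster model sees only its cluster's shared vocabulary, which is the dimensionality reduction the abstract credits for the simpler model.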

  19. Text as an Autopoietic System

    DEFF Research Database (Denmark)

    Nicolaisen, Maria Skou

    2016-01-01

    The aim of the present research article is to discuss the possibilities and limitations in addressing text as an autopoietic system. The theory of autopoiesis originated in the field of biology in order to explain the dynamic processes entailed in sustaining living organisms at the cellular level. ... By comparing the biological with the textual account of autopoietic agency, the conclusion is that a newly derived concept of sociopoiesis might be better suited for discussing the architecture of textual systems.

  20. Reasoning with Annotations of Texts

    OpenAIRE

    Ma, Yue; Lévy, François; Ghimire, Sudeep

    2011-01-01

    Linguistic and semantic annotations are important features for text-based applications. However, achieving and maintaining a good quality of a set of annotations is known to be a complex task. Many ad hoc approaches have been developed to produce various types of annotations, while comparing those annotations to improve their quality is still rare. In this paper, we propose a framework in which both linguistic and domain information can cooperate to reason with annotat...

  1. Documentation of torture victims, assessment of the start procedure for medico-legal documentation.

    Science.gov (United States)

    Mandel, Lene; Worm, Lise

    2007-01-01

    A Pilot Study was performed at the Rehabilitation and Research Centre for Torture Victims (RCT) in Copenhagen in order to explore the possibilities for adding a medico-legal documentation component to the rehabilitation of torture victims already taking place. It describes the process and results of implementing medico-legal documentation in a rehabilitative setting. A modified version of the guidelines in the Istanbul Protocol was developed on the basis of the review of literature and current practices described in "Documentation of torture victims, implementation of medico-legal protocols". The modified guidelines were tested on five clients. The aim was twofold: 1) to assess the client's attitude towards the idea of adding a documentation component to the rehabilitation process, and 2) to assess the practical circumstances of implementing the Istanbul Protocol in the everyday life of a rehabilitation centre. Results show that all five clients were positive towards the project and found comfort in being able to contribute to the fight against impunity. Also, the Pilot Study demonstrated that a large part of the medico-legal documentation was already obtained in the rehabilitation process. It was however not accessible due to lack of systematization and a data registering system. There are thus important synergies in collecting data for rehabilitation and documentation, but a joint database system is necessary to realize these synergies.

  2. Language Identification of Kannada, Hindi and English Text Words Through Visual Discriminating Features

    Directory of Open Access Journals (Sweden)

    M.C. Padma

    2008-06-01

    Full Text Available In a multilingual country like India, a document may contain text words in more than one language. For a multilingual environment, a multilingual Optical Character Recognition (OCR) system is needed to read multilingual documents. So, it is necessary to identify the different language regions of the document before feeding the document to the OCRs of the individual languages. The objective of this paper is to propose a procedure based on visual clues to identify the Kannada, Hindi and English text portions of an Indian multilingual document.
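
    The record above identifies scripts from visual features of document images. For already-digitized text, the analogous separation of Kannada, Hindi (Devanagari) and English words can be sketched from Unicode blocks alone; note this operates on encoded text, a simpler setting than the paper's image-based one:

```python
def identify_script(word):
    """Classify a word as Kannada, Hindi (Devanagari) or English (Latin)
    by the Unicode block of its letters; 'mixed' if the blocks disagree."""
    def block(ch):
        cp = ord(ch)
        if 0x0C80 <= cp <= 0x0CFF:           # Kannada block
            return "Kannada"
        if 0x0900 <= cp <= 0x097F:           # Devanagari block
            return "Hindi"
        if ch.isascii() and ch.isalpha():    # basic Latin letters
            return "English"
        return None                          # punctuation, digits, marks
    blocks = {b for b in map(block, word) if b}
    return blocks.pop() if len(blocks) == 1 else "mixed"
```

    In an OCR pipeline this decision has to happen before recognition, from pixel-level features, which is why the paper works with visual discriminating features rather than code points.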

  3. Document Management in Local Government.

    Science.gov (United States)

    Williams, Bernard J. S.

    1998-01-01

    The latest in electronic document management in British local government is discussed. Finance, revenues, and benefits systems of leading vendors to local authorities are highlighted. A planning decisions archive management system and other information services are discussed. (AEF)

  4. Virtual Library Design Document; TOPICAL

    International Nuclear Information System (INIS)

    M. A. deLamare

    2001-01-01

    The objective of this document is to establish a design for the virtual library user and administrative layers that complies with the requirements of the virtual library software specification and the subordinate module specification.

  5. Reactor operation environmental information document

    Energy Technology Data Exchange (ETDEWEB)

    Bauer, L.R.; Hayes, D.W.; Hunter, C.H.; Marter, W.L.; Moyer, R.A.

    1989-12-01

    This volume is a reactor operation environmental information document for the Savannah River Plant. Topics include meteorology, surface hydrology, transport, environmental impacts, and radiation effects. 48 figs., 56 tabs. (KD)

  6. ENDF/B summary documentation

    International Nuclear Information System (INIS)

    Kinsey, R.

    1979-07-01

    This publication provides a localized source of descriptions for the evaluations contained in the ENDF/B Library. The summary documentation presented is intended to be a more detailed description than the (File 1) comments contained in the computer readable data files, but not so detailed as the formal reports describing each ENDF/B evaluation. The summary documentations were written by the CSEWB (Cross Section Evaluation Working Group) evaluators and compiled by NNDC (National Nuclear Data Center). This edition includes documentation for materials found on ENDF/B Version V tapes 501 to 516 (General Purpose File) excluding tape 504. ENDF/B-V also includes tapes containing partial evaluations for the Special Purpose Actinide (521, 522), Dosimetry (531), Activation (532), Gas Production (533), and Fission Product (541-546) files. The materials found on these tapes are documented elsewhere. Some of the evaluation descriptions in this report contain cross sections or energy level information.

  7. CERN Document Server (CDS): Introduction

    CERN Multimedia

    CERN. Geneva; Costa, Flavio

    2017-01-01

    A short online tutorial introducing the CERN Document Server (CDS): a description of basic functionality, the notion of Revisions, and the CDS test environment. Links: CDS Production environment; CDS Test environment.

  8. ENDF/B summary documentation

    Energy Technology Data Exchange (ETDEWEB)

    Kinsey, R. (comp.)

    1979-07-01

    This publication provides a localized source of descriptions for the evaluations contained in the ENDF/B Library. The summary documentation presented is intended to be a more detailed description than the (File 1) comments contained in the computer readable data files, but not so detailed as the formal reports describing each ENDF/B evaluation. The summary documentations were written by the CSEWB (Cross Section Evaluation Working Group) evaluators and compiled by NNDC (National Nuclear Data Center). This edition includes documentation for materials found on ENDF/B Version V tapes 501 to 516 (General Purpose File) excluding tape 504. ENDF/B-V also includes tapes containing partial evaluations for the Special Purpose Actinide (521, 522), Dosimetry (531), Activation (532), Gas Production (533), and Fission Product (541-546) files. The materials found on these tapes are documented elsewhere. Some of the evaluation descriptions in this report contain cross sections or energy level information. (RWR)

  9. Document development, management and transmission

    International Nuclear Information System (INIS)

    Meister, K.

    1998-01-01

    Environmental monitoring can only be carried out by means of cartographic outputs that allow the representation of information in a compressed way and with a local reference. On account of this requirement, and of the continuously growing importance of international data exchange, the development of a universal tool for combining data into so-called documents has been started, for the management of these documents and for their exchange with other systems. (R.P.)

  10. ENDF/B summary documentation

    International Nuclear Information System (INIS)

    Garber, D.

    1975-10-01

    Descriptions of the evaluations contained in the ENDF/B library are given. The summary documentation presented is intended to be a more detailed description than the (File 1) comments contained in the computer-readable data files, but not so detailed as the formal reports describing each ENDF/B evaluation. The documentations were written by the CSEWG evaluators and compiled by NNCSC. Selected materials which comprise this volume range from ¹H to ²⁴⁴Cm.

  11. Discrepancies in Communication Versus Documentation of Weight-Management Benchmarks

    Directory of Open Access Journals (Sweden)

    Christy B. Turer MD, MHS

    2017-02-01

    Full Text Available. To examine gaps in communication versus documentation of weight-management clinical practices, communication was recorded during primary care visits with 6- to 12-year-old overweight/obese Latino children. Communication/documentation content was coded by 3 reviewers using communication transcripts and health-record documentation. Discrepancies in communication/documentation content codes were resolved through consensus. Bivariate/multivariable analyses examined factors associated with discrepancies in benchmark communication/documentation. Benchmarks were neither communicated nor documented in up to 42% of visits, and communicated but not documented or documented but not communicated in up to 20% of visits. The lowest benchmark performance rates were for laboratory studies (35%) and nutrition/weight-management referrals (42%). In multivariable analysis, overweight (vs obesity) was associated with 1.6 more discrepancies in communication versus documentation (P = .03). Many weight-management benchmarks are not met, not documented, or performed without being communicated. Enhanced communication with families and documentation in health records may promote lifestyle changes and higher-quality care for overweight children in primary care.

  12. INTEGRATION OF COMPUTER TECHNOLOGIES SMK: AUTOMATION OF THE PRODUCTION CERTIFICATION PROCEDURE AND FORMING OF SHIPPING DOCUMENTS

    Directory of Open Access Journals (Sweden)

    S. A. Pavlenko

    2009-01-01

    Full Text Available. The integration of informational computer technologies made it possible to reorganize and optimize some processes by decreasing document circulation, unifying documentation forms, and other measures.

  13. Patterns for Effectively Documenting Frameworks

    Science.gov (United States)

    Aguiar, Ademar; David, Gabriel

    Good design and implementation are necessary but not sufficient pre-requisites for successfully reusing object-oriented frameworks. Although not always recognized, good documentation is crucial for effective framework reuse, and it is often hard, costly, and tiresome to produce, especially when we are not aware of the key problems and the ways of addressing them. Based on existing literature, case studies and lessons learned, the authors have been mining proven solutions to recurrent problems of documenting object-oriented frameworks, and writing them in pattern form, as patterns are a very effective way of communicating expertise and best practices. This paper presents a small set of patterns addressing problems related to the framework documentation itself, here seen as an autonomous and tangible product independent of the process used to create it. The patterns aim at helping non-experts cost-effectively document object-oriented frameworks. Concretely, these patterns provide guidance on choosing the kinds of documents to produce, how to relate them, and which contents to include. Although the focus is more on the documents themselves, rather than on the process and tools to produce them, some guidelines are also presented in the paper to help on applying the patterns to a specific framework.

  14. Text Mining for Protein Docking.

    Directory of Open Access Journals (Sweden)

    Varsha D Badal

    2015-12-01

    Full Text Available. The rapidly growing amount of publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for predictive biomolecular modeling. The accumulated data on experimentally determined structures transformed structure prediction of proteins and protein complexes. Instead of exploring the enormous search space, predictive tools can simply proceed to the solution based on similarity to existing, previously determined structures. A similar major paradigm shift is emerging due to the rapidly expanding amount of information, other than experimentally determined structures, which can still be used as constraints in biomolecular structure prediction. Automated text mining has been widely used in recreating protein interaction networks, as well as in detecting small-ligand binding sites on protein structures. Combining and expanding these two well-developed areas of research, we applied text mining to structural modeling of protein-protein complexes (protein docking). Protein docking can be significantly improved when constraints on the docking mode are available. We developed a procedure that retrieves published abstracts on a specific protein-protein interaction and extracts information relevant to docking. The procedure was assessed on protein complexes from Dockground (http://dockground.compbio.ku.edu). The results show that correct information on binding residues can be extracted for about half of the complexes. The amount of irrelevant information was reduced by conceptual analysis of a subset of the retrieved abstracts, based on the bag-of-words (features) approach. Support Vector Machine models were trained and validated on the subset. The remaining abstracts were filtered by the best-performing models, which decreased the irrelevant information for ~25% of the complexes in the dataset.
The extracted constraints were incorporated in the docking protocol and tested on the Dockground unbound
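
The filtering step described in this record, bag-of-words features feeding a trained classifier that discards docking-irrelevant text, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the toy sentences and labels are invented, and a simple perceptron stands in for the Support Vector Machine models.

```python
# Bag-of-words relevance filtering sketch: vectorize sentences by word
# counts, train a linear classifier, keep only "docking-relevant" text.
from collections import Counter

def featurize(text, vocab):
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def train_perceptron(X, y, epochs=20):
    # Standard perceptron updates; converges on linearly separable data.
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else -1
            if pred != yi:
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1

docs = ["residues 12 and 45 form the binding interface",
        "the gene was expressed in yeast cultures",
        "mutation of interface residues abolished binding",
        "cells were grown overnight at 37 degrees"]
labels = [1, -1, 1, -1]   # 1 = docking-relevant, -1 = irrelevant
vocab = sorted({w for d in docs for w in d.lower().split()})
X = [featurize(d, vocab) for d in docs]
w, b = train_perceptron(X, labels)
relevant = [d for d in docs if predict(w, b, featurize(d, vocab)) == 1]
```

In the paper's setting, the retained sentences would then be mined for binding-residue mentions to use as docking constraints.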

  15. The Balinese Unicode Text Processing

    Directory of Open Access Journals (Sweden)

    Imam Habibi

    2009-06-01

    Full Text Available. In principle, the computer only recognizes numbers as the representation of a character. Therefore, there are many encoding systems to allocate these numbers, although not all characters are covered. In Europe, a single language may even need more than one encoding system. Hence, a new encoding system known as Unicode has been established to overcome this problem. Unicode provides a unique ID for each character, independent of platform, program, and language. The Unicode standard has been applied in a number of industries, such as Apple, HP, IBM, JustSystem, Microsoft, Oracle, SAP, Sun, Sybase, and Unisys. In addition, language standards and modern information exchanges such as XML, Java, ECMAScript (JavaScript), LDAP, CORBA 3.0, and WML make use of Unicode as an official tool for implementing ISO/IEC 10646. Four tasks are addressed for Balinese script: the algorithms for transliteration, searching, sorting, and word-boundary analysis (spell checking). To verify the correctness of the algorithms, several applications were made. These applications can run on Linux/Windows platforms using J2SDK 1.5 and the J2ME WTK2 library. The input and output of the algorithms/applications are character sequences obtained from keyboard input and external files. This research produces a module, or library, which is able to process Balinese text based on the Unicode standard. The output of this research is the ability, skill, and mastery of: 1. the Unicode standard (21-bit) as a substitution for ASCII (7-bit) and ISO 8859-1 (8-bit), the former default character sets in many applications; 2. the Balinese Unicode text-processing algorithm; 3. the experience of working with and learning from an international team that consists of foremost experts in the area: Michael Everson (Ireland), Peter Constable (Microsoft, US), I Made Suatjana, and Ida Bagus Adi Sudewa.
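
A building block that text processing like this rests on is classifying characters by Unicode code point; the Balinese block occupies U+1B00 to U+1B7F in the Unicode standard. A minimal sketch (the helper names are illustrative, not from the paper's library):

```python
# Detect Balinese characters by code point. The Balinese block is
# U+1B00..U+1B7F in the Unicode standard.
BALINESE_START, BALINESE_END = 0x1B00, 0x1B7F

def is_balinese(ch):
    return BALINESE_START <= ord(ch) <= BALINESE_END

def balinese_ratio(text):
    """Fraction of characters in a string that fall in the Balinese block."""
    if not text:
        return 0.0
    return sum(is_balinese(c) for c in text) / len(text)

# Two ASCII letters followed by two characters from the Balinese block.
mixed = "ka" + "\u1B13\u1B32"
```

Routines like transliteration, sorting, and word-boundary analysis would build on this kind of block membership test plus the collation data for the script.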

  16. Text Entry by Gazing and Smiling

    Directory of Open Access Journals (Sweden)

    Outi Tuisku

    2013-01-01

    Full Text Available. Face Interface is a wearable prototype that combines the use of voluntary gaze direction and facial activations for pointing at and selecting objects on a computer screen, respectively. The aim was to investigate the functionality of the prototype for entering text. First, three on-screen keyboard layout designs were developed and tested (n=10) to find a layout that would be more suitable for text entry with the prototype than the traditional QWERTY layout. The task was to enter one word ten times with each of the layouts by pointing at letters with gaze and selecting them by smiling. Subjective ratings showed that a layout with large keys on the edge and small keys near the center of the keyboard was rated as the most enjoyable, clearest, and most functional. Second, using this layout, the aim of the second experiment (n=12) was to compare entering text with Face Interface to entering text with a mouse. The results showed that the text entry rate for Face Interface was 20 characters per minute (cpm) and 27 cpm for the mouse. For Face Interface, the keystrokes per character (KSPC) value was 1.1 and the minimum string distance (MSD) error rate was 0.12. These values compare especially well with other similar techniques.
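
The metrics reported above (cpm, KSPC, and the MSD error rate) can be computed as in the following sketch, assuming the standard Levenshtein-based definition of minimum string distance; the sample strings and timings in the test are invented:

```python
# Text-entry metrics: characters per minute (cpm), keystrokes per
# character (KSPC), and minimum-string-distance (MSD) error rate.
def levenshtein(a, b):
    """Edit distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def cpm(transcribed, seconds):
    return len(transcribed) / (seconds / 60.0)

def kspc(keystrokes, transcribed):
    return keystrokes / len(transcribed)

def msd_error_rate(presented, transcribed):
    return levenshtein(presented, transcribed) / max(len(presented),
                                                     len(transcribed))
```

A KSPC of 1.1, as reported for Face Interface, means roughly one corrective keystroke for every ten characters produced.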

  17. Biased limiter experiments on text

    International Nuclear Information System (INIS)

    Phillips, P.E.; Wootton, A.J.; Rowan, W.L.; Ritz, C.P.; Rhodes, T.L.; Bengtson, R.D.; Hodge, W.L.; Durst, R.D.; McCool, S.C.; Richards, B.; Gentle, K.W.; Schoch, P.; Forster, J.C.; Hickok, R.L.; Evans, T.E.

    1987-01-01

    Experiments using an electrically biased limiter have been performed on the Texas Experimental Tokamak (TEXT). A small movable limiter is inserted past the main poloidal ring limiter (which is electrically connected to the vacuum vessel) and biased at V Lim with respect to it. The floating potential, plasma potential and shear-layer position can be controlled. With |V Lim| ≥ 50 V the plasma density increases. For V Lim > 0 the results obtained are inconclusive. Variation of V Lim changes the electrostatic turbulence, which may explain the observed total flux changes. (orig.)

  18. New Historicism: Text and Context

    Directory of Open Access Journals (Sweden)

    Violeta M. Vesić

    2016-02-01

    Full Text Available. During most of the twentieth century, history was seen as a phenomenon outside of literature that guaranteed the veracity of literary interpretation. History was unique, and it functioned as a basis for reading literary works. During the seventies of the twentieth century there occurred a change of attitude towards history in American literary theory, and a new theoretical approach appeared which soon became known as New Historicism. Since its inception, New Historicism has been identified with the study of the Renaissance and Romanticism, but nowadays it is increasingly applied to other literary trends. Although there are great differences in the arguments and practices of the various representatives of this school, New Historicism has clearly recognizable features, and many new historicists will agree with the statement of Walter Cohen that New Historicism, when it appeared in the eighties, represented something quite new in reference to the studies of theory, criticism and history (Cohen 1987, 33). The theoretical connection with Bakhtin, Foucault and Marx is clear, as well as a kind of uneasy tie with deconstruction and the work of Paul de Man. At the center of this approach is a renewed interest in the study of literary works in the light of the historical and political circumstances in which they were created. Foucault encouraged readers to begin to move literary texts and to link them with discourses and representations that are not literary, as well as to examine the sociological aspects of the texts in order to take part in the social struggles of today. The study of literary works using New Historicism is the study of politics, history, culture and the circumstances in which these works were created.
With regard to one of the main fact which is located in the center of the criticism, that history cannot be viewed objectively and that reality can only be understood through a cultural context that reveals the work, re-reading and interpretation of

  19. A database for TMT interface control documents

    Science.gov (United States)

    Gillies, Kim; Roberts, Scott; Brighton, Allan; Rogers, John

    2016-08-01

    The TMT Software System consists of software components that interact with one another through a software infrastructure called TMT Common Software (CSW). CSW consists of software services and library code that is used by developers to create the subsystems and components that participate in the software system. CSW also defines the types of components that can be constructed and their roles. The use of common component types and shared middleware services allows standardized software interfaces for the components. A software system called the TMT Interface Database System was constructed to support the documentation of the interfaces for components based on CSW. The programmer describes a subsystem and each of its components using JSON-style text files. A command interface file describes each command a component can receive and any commands a component sends. The event interface files describe status, alarms, and events a component publishes and status and events subscribed to by a component. A web application was created to provide a user interface for the required features. Files are ingested into the software system's database. The user interface allows browsing subsystem interfaces, publishing versions of subsystem interfaces, and constructing and publishing interface control documents that consist of the intersection of two subsystem interfaces. All published subsystem interfaces and interface control documents are versioned for configuration control and follow the standard TMT change control processes. Subsystem interfaces and interface control documents can be visualized in the browser or exported as PDF files.
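
The JSON-style interface files described above might look roughly like the following sketch. The field names ("subsystem", "commands", "events") are assumptions for illustration only, not the actual TMT schema:

```python
# Parse a toy JSON command-interface description of the kind a component
# author might write, and pull out the command names for browsing.
import json

interface_text = """
{
  "subsystem": "tcs",
  "component": "pointing",
  "commands": [{"name": "slew", "parameters": ["az", "el"]}],
  "events": [{"name": "position", "publishes": true}]
}
"""

def command_names(doc):
    """Names of all commands a component declares (empty if none)."""
    return [c["name"] for c in doc.get("commands", [])]

iface = json.loads(interface_text)
```

An interface control document between two subsystems would then be the intersection of such descriptions: the commands one subsystem sends that the other receives, and the events one publishes that the other subscribes to.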

  20. Imaginary Documentary: reflecting upon contemporary documental photography

    Directory of Open Access Journals (Sweden)

    Kátia Hallak Lombardi

    2008-01-01

    Full Text Available. This article pursues the idea of an Imaginary Documentary, a possible new inflexion in the practices of contemporary documental photography. The text establishes its theoretical foundations by bringing discussions of documental photography together with the concept of the imaginary in Gilbert Durand and the notion of the Imaginary Museum in André Malraux. Photographers who are part of the history of documental photography are the objects elected to probe the potentialities of the Imaginary Documentary.

  1. Guidelines for the documentation of digital computer programs - approved 1974

    International Nuclear Information System (INIS)

    Anon.

    1975-01-01

    This standard presents guidelines for the documentation of engineering and scientific computer programs. Good documentation promotes understanding, reduces duplication of effort, eases conversion to different computer environments and aids modification for extended applications. Good documentation is essential for implementation and effective use of programs obtained from other installations. Since the intention of this standard is to encourage better communication between the developer and user, it should be regarded as a guide rather than a set of rigid specifications. As a guide, it is sufficiently comprehensive to apply to large-scale programs intended for extensive external use. Not all features of this document are appropriate in all circumstances. In general, as the project complexity increases so does the need for more complete documentation. An organization may have special documentation requirements which supersede or extend these guidelines. This standard is a revision of ANS-STD.2-1967 and supersedes it.

  2. Invisible in Thailand: documenting the need for protection

    Directory of Open Access Journals (Sweden)

    Margaret Green

    2008-04-01

    Full Text Available. The International Rescue Committee (IRC) has conducted a survey to document the experiences of Burmese people living in border areas of Thailand and to assess the degree to which they merit international protection as refugees.

  3. ARCHITECTURE SOFTWARE SOLUTION TO SUPPORT AND DOCUMENT MANAGEMENT QUALITY SYSTEM

    Directory of Open Access Journals (Sweden)

    Milan Eric

    2010-12-01

    Full Text Available. One of the foundations of the JUS ISO 9000 series of standards is quality-system documentation. The architecture of the quality-system documentation depends on the complexity of the business system. Establishing efficient management of quality-system documentation is of great importance for the business system, both in the phase of introducing the quality system and in further stages of its improvement. The study describes the architecture and capabilities of a software solution to support and manage quality-system documentation in accordance with the requirements of the standards ISO 9001:2001, ISO 14001:2005, HACCP, etc.

  4. Sharing and Adaptation of Educational Documents in E-Learning

    Directory of Open Access Journals (Sweden)

    Chekry Abderrahman

    2012-03-01

    Full Text Available. Few documents can be reused among the huge number of educational documents on the web, and the exponential increase of these documents makes it almost impossible to search for relevant ones. In addition, e-learning is designed for public users who have different levels of knowledge and varied skills, so they should be given content that meets their needs. This work is about adapting learning content to learners' preferences, and giving teachers the ability to reuse a given content.

  5. GPU Based N-Gram String Matching Algorithm with Score Table Approach for String Searching in Many Documents

    Science.gov (United States)

    Srinivasa, K. G.; Shree Devi, B. N.

    2017-10-01

    String searching in documents has become a tedious task with the evolution of Big Data. The generation of large data sets demands a high-performance search algorithm in areas such as text mining, information retrieval and many others. The popularity of GPUs for general-purpose computing has been increasing for various applications. It is therefore of great interest to exploit the thread feature of a GPU to provide a high-performance search algorithm. This paper proposes an optimized new approach to the N-gram model for string search in a number of lengthy documents, and its GPU implementation. The algorithm exploits GPGPUs for searching strings in many documents, employing character-level N-gram matching with a parallel Score Table approach and search using the CUDA API. The new approach of the Score Table, used for frequency storage of N-grams in a document, makes the search independent of the document's length and allows faster access to the frequency values, thus decreasing the search complexity. The extensive thread feature of a GPU has been exploited to enable parallel pre-processing of trigrams in a document for Score Table creation and parallel search in a huge number of documents, thus speeding up the whole search process even for a large pattern size. Experiments were carried out on many documents of varied length, with search strings from the standard Lorem Ipsum text, on NVIDIA's GeForce GT 540M GPU with 96 cores. Results prove that the parallel approach to Score Table creation and searching gives a good speedup over the same approach executed serially.
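
The Score Table idea, precomputing per-document trigram frequencies so that a later search only performs table lookups proportional to the pattern size, can be sketched serially as follows (the paper parallelizes both steps on the GPU; the corpus here is illustrative):

```python
# Serial sketch of a trigram Score Table: build it once per document,
# then score any pattern with O(len(pattern)) lookups, independent of
# document length.
from collections import Counter

def trigrams(text):
    return [text[i:i + 3] for i in range(len(text) - 2)]

def build_score_table(document):
    """Frequency of every trigram occurring in the document."""
    return Counter(trigrams(document))

def score(pattern, table):
    """Sum of document frequencies of the pattern's trigrams."""
    return sum(table[t] for t in trigrams(pattern))

doc = "lorem ipsum dolor sit amet lorem ipsum"
table = build_score_table(doc)
```

On a GPU, one thread block would build each document's table and further threads would score the pattern against many tables concurrently.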

  6. Basic freight forwarding and transport documentation in freight forwarder's work

    Directory of Open Access Journals (Sweden)

    Adam Salomon

    2014-09-01

    Full Text Available. The purpose of the article is to present the basic documentation in international freight forwarders' work, in particular insurance documents and transport documents in the various modes of transport. An additional goal is to identify the sources that can be used to properly complete the individual documents.

  7. Large hydropower generating units

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1997-07-01

    This document presents the Brazilian experience with the design, fabrication, construction, commissioning and operation of large-scale, high-capacity generating units. The experience was acquired through the implementation of the Itumbiara, Paulo Afonso IV, Tucurui, Itaipu and Xingo power plants, which are among the largest units in the world.

  8. On the use of the singular value decomposition for text retrieval

    Energy Technology Data Exchange (ETDEWEB)

    Husbands, P.; Simon, H.D.; Ding, C.

    2000-12-04

    The use of the Singular Value Decomposition (SVD) has been proposed for text retrieval in several recent works. This technique uses the SVD to project very high dimensional document and query vectors into a low dimensional space. In this new space it is hoped that the underlying structure of the collection is revealed, thus enhancing retrieval performance. Theoretical results have provided some evidence for this claim, and to some extent experiments have confirmed it. However, these studies have mostly used small test collections and simplified document models. In this work we investigate the use of the SVD on large document collections. We show that, if interpreted as a mechanism for representing the terms of the collection, this technique alone is insufficient for dealing with the variability in term occurrence. Section 2 introduces the text retrieval concepts necessary for our work. A short description of our experimental architecture is presented in Section 3. Section 4 describes how term occurrence variability affects the SVD and then shows how the decomposition influences retrieval performance. A possible way of improving SVD-based techniques is presented in Section 5, and conclusions are given in Section 6.
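
The projection described above, folding document and query vectors into a low-dimensional space via the SVD and ranking by similarity there, can be sketched on a toy term-document matrix (the matrix and vocabulary are invented; `numpy.linalg.svd` stands in for the large-scale solvers such a study would use):

```python
# Rank-k SVD projection for retrieval: fold term-space vectors into the
# latent space and rank documents by cosine similarity to the query.
import numpy as np

# term-document matrix (rows = terms, columns = documents)
A = np.array([
    [2.0, 0.0, 1.0],   # "ship"
    [1.0, 0.0, 1.0],   # "boat"
    [0.0, 2.0, 0.0],   # "tree"
    [0.0, 1.0, 0.0],   # "leaf"
])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Uk, sk = U[:, :k], s[:k]

def project(vec):
    """Fold a term-space vector into the rank-k latent space."""
    return (vec @ Uk) / sk

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

query = np.array([1.0, 1.0, 0.0, 0.0])   # terms "ship boat"
q = project(query)
docs = [project(A[:, j]) for j in range(A.shape[1])]
ranking = sorted(range(3), key=lambda j: cosine(q, docs[j]), reverse=True)
```

Here the nautical documents (columns 0 and 2) rank above the botanical one (column 1); the paper's point is that on large collections this projection alone does not cope with term-occurrence variability.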

  9. A method for extracting design rationale knowledge based on Text Mining

    Directory of Open Access Journals (Sweden)

    Liu Jihong

    2017-01-01

    Full Text Available. Capturing design rationale (DR) knowledge and presenting it to designers in a good form is of great significance for design reuse and design innovation. Since design rationale research began to develop in the 1970s, many teams have developed their own design rationale systems. However, existing DR acquisition systems are not intelligent enough, and they still require designers to perform many operations. In addition, existing design documents contain a large amount of DR knowledge, but it has not been well mined. Therefore, a method and system are needed to better extract DR knowledge from design documents. We have proposed a DRKH (design rationale knowledge hierarchy) model for DR representation. The DRKH model has three layers: a design intent layer, a design decision layer and a design basis layer. In this paper, we use a text mining method to extract DR from design documents and construct the DR model. Finally, a welding robot design specification is taken as an example to demonstrate the system interface.

  10. Using Electronic Systems for Document Management in Economic Entities

    Directory of Open Access Journals (Sweden)

    2007-01-01

    Full Text Available. Document workflow and management, be they scanned documents, computer-generated e-documents or complex file formats, are critical elements for the success of an organization. Delivering the correct information to the right person at the right moment is a fundamental element of daily activity. In the Internet era, documents have a new format and, what is more important, completely new functions. Paper is replaced by electronic formats such as .html, .xml, .pdf or .doc. The price for this progress is increasing technological complexity, and with this complexity comes the need for more efficient techniques of management and organization, such as an electronic document management system. This paper aims to present document management not as a separate software category on the IT market, but as an element integrated with any software solution, maximizing its capacity for making business more efficient.

  11. Eigenvector space model to capture features of documents

    Directory of Open Access Journals (Sweden)

    Choi DONGJIN

    2011-09-01

    Full Text Available. Eigenvectors are a special set of vectors associated with a linear system of equations. Because of their special properties, eigenvectors have been used widely in computer vision. When the eigenvector is applied to the information retrieval field, it is possible to obtain properties of a document corpus. To capture the properties of given documents, this paper conducts simple experiments to show that eigenvectors can also be used in document analysis. For the experiment, we use the short abstract documents of Wikipedia provided by DBpedia as the document corpus. To build the original square matrix, the popular tf-idf measurement is used. After calculating the eigenvectors of the original matrix, each vector is plotted in a 3D graph to find out what the eigenvector means in document processing.
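
The pipeline sketched in this abstract, a tf-idf matrix whose dominant eigen-direction captures corpus structure, can be illustrated as follows; the toy corpus replaces the DBpedia abstracts, and power iteration on the document-similarity (Gram) matrix stands in for a full eigendecomposition:

```python
# Build a tf-idf matrix from toy "abstracts", form the symmetric
# document-similarity matrix A A^T, and extract its dominant eigenvector
# by power iteration.
import math

docs = ["eigenvector eigenvalue matrix",
        "matrix eigenvector decomposition",
        "pasta recipe with olive oil and garlic"]
vocab = sorted({w for d in docs for w in d.split()})

def tf_idf(docs, vocab):
    n = len(docs)
    df = {w: sum(w in d.split() for d in docs) for w in vocab}
    rows = []
    for d in docs:
        words = d.split()
        rows.append([words.count(w) / len(words) * math.log(n / df[w])
                     for w in vocab])
    return rows

A = tf_idf(docs, vocab)
# document-document similarity (Gram) matrix A A^T
S = [[sum(a * b for a, b in zip(r1, r2)) for r2 in A] for r1 in A]

def power_iteration(M, steps=100):
    """Dominant eigenvector of a symmetric nonnegative matrix."""
    v = [1.0] * len(M)
    for _ in range(steps):
        w = [sum(mij * vj for mij, vj in zip(row, v)) for row in M]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

v = power_iteration(S)
```

The dominant eigenvector loads equally on the two linear-algebra documents and vanishes on the cooking one, which is the kind of corpus property the paper visualizes in 3D.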

  12. DOCUMENT IMAGE REGISTRATION FOR IMPOSED LAYER EXTRACTION

    Directory of Open Access Journals (Sweden)

    Surabhi Narayan

    2017-02-01

    Full Text Available. The extraction of filled-in information from document images in the presence of a template poses challenges due to geometrical distortion. A filled-in document image consists of a null background, a general-information foreground and a vital-information imposed layer. A template document image consists of a null background and a general-information foreground layer. In this paper a novel document image registration technique is proposed to extract the imposed layer from an input document image. A convex polygon is constructed around the content of the input and the template image using the convex hull. The vertices of the convex polygons of input and template are paired based on minimum Euclidean distance. Each vertex of the input convex polygon is subjected to transformation for the permutable combinations of rotation and scaling; translation is handled by a tight crop. For every transformation of the input vertices, the minimum Hausdorff distance (MHD) is computed. The minimum Hausdorff distance identifies the rotation and scaling values by which the input image should be transformed to align it to the template. Since transformation is an estimation process, the components in the input image do not overlay exactly on the components in the template; therefore a connected-component technique is applied to extract contour boxes at word level to identify partially overlapping components. Geometrical features such as density, area and degree of overlap are extracted and compared between partially overlapping components to identify and eliminate components common to the input image and the template image. The residue constitutes the imposed layer. Experimental results indicate the efficacy of the proposed model and its computational complexity. Experiments have been conducted on a variety of filled-in forms, applications and bank cheques. Data sets have been generated as test sets for comparative analysis.
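
The registration criterion described above, choosing the transformation that minimizes the Hausdorff distance between input and template point sets, can be sketched for the rotation-only case (the polygons and the 90-degree candidate grid are illustrative only):

```python
# For candidate rotations of the input polygon's vertices, compute the
# Hausdorff distance to the template's vertices and keep the minimizer.
import math

def directed_hausdorff(A, B):
    return max(min(math.dist(a, b) for b in B) for a in A)

def hausdorff(A, B):
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

def rotate(points, theta):
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def best_rotation(inp, template, candidates):
    return min(candidates,
               key=lambda t: hausdorff(rotate(inp, t), template))

template = [(0, 0), (2, 0), (2, 1), (0, 1)]   # template convex hull vertices
inp = rotate(template, math.pi / 2)           # input = template rotated 90 deg
candidates = [i * math.pi / 2 for i in range(4)]
theta = best_rotation(inp, template, candidates)
```

The paper searches combinations of rotation and scaling the same way, then uses connected components to resolve the residual misalignment this estimate leaves.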

  13. Document Examination: Applications of Image Processing Systems.

    Science.gov (United States)

    Kopainsky, B

    1989-12-01

Dealing with images is a familiar business for an expert in questioned documents: microscopic, photographic, infrared, and other optical techniques generate images containing the information he or she is looking for. A recent method for extracting most of this information is digital image processing, ranging from simple contrast and contour enhancement to the advanced restoration of blurred texts. When combined with a sophisticated physical imaging system, an image processing system has proven to be a powerful and fast tool for routine non-destructive scanning of suspect documents. This article reviews frequent applications, comprising techniques to increase legibility, two-dimensional spectroscopy (ink discrimination, alterations, erased entries, etc.), comparison techniques (stamps, typescript letters, photo substitution), and densitometry. Computerized comparison of handwriting is not included. Copyright © 1989 Central Police University.
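Of the operations listed, contrast enhancement is the simplest to illustrate. The sketch below shows a percentile-based linear contrast stretch on a greyscale image array; the percentile cut-offs and the function name are illustrative assumptions, not taken from the article.

```python
import numpy as np

def stretch_contrast(img, lo_pct=2, hi_pct=98):
    """Linearly rescale grey levels so the given percentiles map to
    0..255, clipping outliers; a basic legibility-enhancement step."""
    lo, hi = np.percentile(img, [lo_pct, hi_pct])
    if hi <= lo:                      # flat image: nothing to stretch
        return img.astype(np.uint8)
    out = (img.astype(float) - lo) * 255.0 / (hi - lo)
    return np.clip(out, 0, 255).astype(np.uint8)
```

Ignoring the extreme 2% of pixels at each end makes the stretch robust to a few specks or glare spots, which matters on scans of aged documents.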

  14. ARCHAEOLOGICAL DOCUMENTATION OF A DEFUNCT IRAQI TOWN

    Directory of Open Access Journals (Sweden)

    J. Šedina

    2016-06-01

Full Text Available The subject of this article is the possibilities for documenting a defunct town dating from the Pre-Islamic to the Early Islamic period. The town is located near the town of Makhmur in Iraq, where the Czech archaeological mission has worked at the dig site. This cultural-heritage site is threatened by war, as positions of ISIS lie in the vicinity. For security reasons, the applicability of Pleiades satellite data was tested; moreover, the area is a no-fly zone. However, the DTM created from stereo-images was insufficient for the desired archaeological application. This paper therefore tests the usability of RPAS technology and terrestrial photogrammetry for documenting the remains of buildings. RPAS is a very fast-growing technology that combines the advantages of aerial and terrestrial photogrammetry. A probably defunct church serves as the sample object.

  15. A programmed text in statistics

    CERN Document Server

    Hine, J

    1975-01-01

Contents include: Exercises for Section 2: Physical sciences and engineering 42; Biological sciences 43; Social sciences 45. Solutions to Exercises, Section 1: Physical sciences and engineering 47; Biological sciences 49; Social sciences 49. Solutions to Exercises, Section 2: Physical sciences and engineering 51; Biological sciences 55; Social sciences 58. Tables: χ²-tests involving variances 62; χ² one-tailed tests 63, 64; χ² two-tailed tests 65; F-distribution 66-69. Preface: This project started some years ago when the Nuffield Foundation kindly gave a grant for writing a programmed text to use with service courses in statistics. The work was carried out by Mrs. Joan Hine and Professor G. B. Wetherill at Bath University, together with some other help from time to time by colleagues at Bath University and elsewhere. Testing was done at various colleges and universities, and some helpful comments were received, but we particularly mention King Edwards School, Bath, who provided some sixth formers as 'guinea pigs' for the fir...

  16. Terminologie de Base de la Documentation. (Basic Terminology of Documentation).

    Science.gov (United States)

    Commission des Communautes Europeennes (Luxembourg). Bureau de Terminologie.

    This glossary is designed to aid non-specialists whose activities require that they have some familiarity with the terminology of the modern methods of documentation. Definitions have been assembled from various dictionaries, manuals, etc., with particular attention being given to the publications of UNESCO and the International Standards…

  17. Transitioning Existing Content: inferring organisation-specific documents

    Directory of Open Access Journals (Sweden)

    Arijit Sengupta

    2000-11-01

Full Text Available A definition for a document type within an organization represents an organizational norm about the way the organizational actors represent products and supporting evidence of organizational processes. Generating a good organization-specific document structure is therefore important, since it can capture a shared understanding among the organizational actors about how certain business processes should be performed. Current tools that generate document type definitions focus on the underlying technology, emphasizing tags created in a single instance document. The tools thus fall short of capturing the shared understanding between organizational actors about how a given document type should be represented. We propose a method for inferring organization-specific document structures using multiple instance documents as inputs. The method consists of heuristics that combine individual document definitions, which may have been compiled using standard algorithms. We propose a number of heuristics utilizing artificial intelligence and natural language processing techniques. As the research progresses, the heuristics will be tested on a suite of test cases representing multiple instance documents for different document types. The complete methodology will be implemented as a research prototype.
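One heuristic of the kind described, combining structure observed across multiple instance documents, can be sketched by unioning the parent-child tag relations seen in each instance. The function below is an illustrative assumption, not the authors' prototype, and stands in for the simplest of their combination heuristics.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def infer_structure(xml_docs):
    """Union the parent->child tag relations observed across several
    XML instance documents: one heuristic for a shared document type."""
    children = defaultdict(set)
    for doc in xml_docs:
        stack = [ET.fromstring(doc)]
        while stack:
            node = stack.pop()
            for child in node:
                children[node.tag].add(child.tag)
                stack.append(child)
    return {tag: sorted(kids) for tag, kids in children.items()}
```

A tag that appears in only some instances (say, an optional `cc` element in a memo) still enters the merged structure, which is exactly what a single-instance tool would miss.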

  18. Cat swarm optimization based evolutionary framework for multi document summarization

    Science.gov (United States)

    Rautray, Rasmita; Balabantaray, Rakesh Chandra

    2017-07-01

Today, the World Wide Web has brought us an enormous quantity of on-line information, and extracting relevant information from massive data has become a challenging issue. In the recent past, text summarization has been recognized as one solution for extracting useful information from vast numbers of documents. Based on the number of documents considered, summarization is categorized as single-document or multi-document. Multi-document summarization is more challenging than single-document summarization, since an accurate summary must be found across multiple documents. Hence in this study, a novel Cat Swarm Optimization (CSO) based multi-document summarizer is proposed to address the problem of multi-document summarization. The proposed CSO-based model is also compared with two other nature-inspired summarizers: a Harmony Search (HS) based summarizer and a Particle Swarm Optimization (PSO) based summarizer. On the benchmark Document Understanding Conference (DUC) datasets, the performance of all algorithms is compared in terms of evaluation metrics such as ROUGE score, F score, sensitivity, positive predictive value, summary accuracy, inter-sentence similarity and a readability metric, to validate the non-redundancy, cohesiveness and readability of the summaries. The experimental analysis clearly reveals that the proposed approach outperforms the other summarizers included in the study.
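The objective such summarizers optimize, selecting sentences that cover the document collection while avoiding redundancy, can be illustrated without the swarm search itself. The sketch below substitutes a greedy selector for the CSO/HS/PSO search; the bag-of-words scoring, the redundancy weight, and all names are assumptions for illustration.

```python
import math
import re
from collections import Counter

def sentence_vec(s):
    """Bag-of-words term-frequency vector for one sentence."""
    return Counter(re.findall(r"[a-z]+", s.lower()))

def cos(a, b):
    """Cosine similarity between two term-frequency Counters."""
    num = sum(a[w] * b[w] for w in a)
    den = (math.sqrt(sum(v * v for v in a.values())) *
           math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def summarize(sentences, k=2, redundancy_penalty=0.5):
    """Greedily pick k sentences, scoring coverage of the whole
    collection minus similarity to sentences already chosen."""
    doc_vec = sentence_vec(" ".join(sentences))
    chosen, candidates = [], list(sentences)
    while candidates and len(chosen) < k:
        def score(s):
            v = sentence_vec(s)
            cover = cos(v, doc_vec)
            redund = max((cos(v, sentence_vec(c)) for c in chosen),
                         default=0.0)
            return cover - redundancy_penalty * redund
        best = max(candidates, key=score)
        chosen.append(best)
        candidates.remove(best)
    return chosen
```

The coverage term plays the role of the fitness a swarm optimizer would maximize, and the penalty term enforces the same non-redundancy property the paper validates via inter-sentence similarity.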

  19. Reactor operation environmental information document

    Energy Technology Data Exchange (ETDEWEB)

    Haselow, J.S.; Price, V.; Stephenson, D.E.; Bledsoe, H.W.; Looney, B.B.

    1989-12-01

The Savannah River Site (SRS) produces nuclear materials, primarily plutonium and tritium, to meet the requirements of the Department of Defense. These products have been formed in nuclear reactors that were built during 1950--1955 at the SRS. K, L, and P reactors are three of five reactors that have been used in the past to produce the nuclear materials; all three discontinued operation in 1988. Currently, intense efforts are being expended to prepare these three reactors for restart in a manner that protects human health and the environment. To document that restarting the reactors will have minimal impacts on human health and the environment, a three-volume Reactor Operations Environmental Impact Document has been prepared. The document focuses on the impacts of restarting the K, L, and P reactors on both the SRS and surrounding areas. This volume discusses the geology, seismology, and subsurface hydrology. 195 refs., 101 figs., 16 tabs.

  20. Magnetic fusion: Environmental Readiness Document

    International Nuclear Information System (INIS)

    1981-03-01

Environmental Readiness Documents are prepared periodically to review and evaluate the environmental status of an energy technology during the several phases of its development. Through these documents, the Office of Environment within the Department of Energy provides an independent and objective assessment of the environmental risks and potential impacts associated with the progression of the technology to the next stage of development and with future extensive use of the technology. This Environmental Readiness Document was prepared to assist the Department of Energy in evaluating the readiness of magnetic fusion technology with respect to environmental issues. An effort has been made to identify potential environmental problems that may be encountered, based upon current knowledge, proposed and possible new environmental regulations, and the uncertainties inherent in planned environmental research.