large text document: Topics by WorldWideScience.org

Sample records for large text document

ParaText : scalable solutions for processing and searching very large document collections : final LDRD report.

Energy Technology Data Exchange (ETDEWEB)

Crossno, Patricia Joyce; Dunlavy, Daniel M.; Stanton, Eric T.; Shead, Timothy M.

2010-09-01

This report is a summary of the accomplishments of the 'Scalable Solutions for Processing and Searching Very Large Document Collections' LDRD, which ran from FY08 through FY10. Our goal was to investigate scalable text analysis; specifically, methods for information retrieval and visualization that could scale to extremely large document collections. Towards that end, we designed, implemented, and demonstrated a scalable framework for text analysis - ParaText - as a major project deliverable. Further, we demonstrated the benefits of using visual analysis in text analysis algorithm development, improved performance of heterogeneous ensemble models in data classification problems, and the advantages of information theoretic methods in user analysis and interpretation in cross language information retrieval. The project involved 5 members of the technical staff and 3 summer interns (including one who worked two summers). It resulted in a total of 14 publications, 3 new software libraries (2 open source and 1 internal to Sandia), several new end-user software applications, and over 20 presentations. Several follow-on projects have already begun or will start in FY11, with additional projects currently in proposal.
Visualizing the semantic content of large text databases using text maps

Science.gov (United States)

Combs, Nathan

1993-01-01

A methodology for generating text map representations of the semantic content of text databases is presented. Text maps provide a graphical metaphor for conceptualizing and visualizing the contents and data interrelationships of large text databases. Described are a set of experiments conducted against the TIPSTER corpora of Wall Street Journal articles. These experiments provide an introduction to current work in the representation and visualization of documents by way of their semantic content.
Text segmentation in degraded historical document images

Directory of Open Access Journals (Sweden)

A.S. Kavitha

2016-07-01

Text segmentation in degraded historical Indus script images helps Optical Character Recognizers (OCR) achieve good recognition rates for Indus scripts; however, it is challenging because of the complex backgrounds in such images. In this paper, we present a new method for segmenting text and non-text in Indus documents, based on the observation that text components are less cursive than non-text ones. To achieve this, we propose a new combination of Sobel and Laplacian operators for enhancing degraded low-contrast pixels. The method then generates skeletons for text components in the enhanced images to reduce the computational burden, which in turn makes it efficient to study component structure. We study the cursiveness of components based on branch information to remove false text components. The method introduces a nearest-neighbor criterion for grouping components on the same line, which yields clusters, and then classifies these clusters as text or non-text based on the characteristics of text components. We evaluate the proposed method on a large dataset containing a variety of images. Comparison with existing methods shows that the proposed method is effective in terms of recall and precision.
Documents and legal texts

International Nuclear Information System (INIS)

2017-01-01

This section treats of the following documents and legal texts: 1 - Belgium, 29 June 2014: Act amending the Act of 22 July 1985 on Third-Party Liability in the Field of Nuclear Energy; 2 - Belgium, 7 December 2016: Act amending the Act of 22 July 1985 on Third-Party Liability in the Field of Nuclear Energy
Script-independent text line segmentation in freestyle handwritten documents.

Science.gov (United States)

Li, Yi; Zheng, Yefeng; Doermann, David; Jaeger, Stefan

2008-08-01

Text line segmentation in freestyle handwritten documents remains an open document analysis problem. Curvilinear text lines and small gaps between neighboring text lines present a challenge to algorithms developed for machine printed or hand-printed documents. In this paper, we propose a novel approach based on density estimation and a state-of-the-art image segmentation technique, the level set method. From an input document image, we estimate a probability map, where each element represents the probability that the underlying pixel belongs to a text line. The level set method is then exploited to determine the boundary of neighboring text lines by evolving an initial estimate. Unlike connected-component-based methods ([1], [2], for example), the proposed algorithm does not use any script-specific knowledge. Extensive quantitative experiments on freestyle handwritten documents with diverse scripts, such as Arabic, Chinese, Korean, and Hindi, demonstrate that our algorithm consistently outperforms previous methods [1]-[3]. Further experiments show the proposed algorithm is robust to scale change, rotation, and noise.
"What is relevant in a text document?": An interpretable machine learning approach.

Directory of Open Access Journals (Sweden)

Leila Arras

Text documents can be described by a number of abstract concepts such as semantic category, writing style, or sentiment. Machine learning (ML) models have been trained to map documents to these abstract concepts automatically, making it possible to annotate very large text collections, more than a human could process in a lifetime. Besides predicting a text's category accurately, it is also highly desirable to understand how and why the categorization takes place. In this paper, we demonstrate that such understanding can be achieved by tracing the classification decision back to individual words using layer-wise relevance propagation (LRP), a recently developed technique for explaining the predictions of complex non-linear classifiers. We train two word-based ML models, a convolutional neural network (CNN) and a bag-of-words SVM classifier, on a topic categorization task and adapt the LRP method to decompose the predictions of these models onto words. The resulting scores indicate how much individual words contribute to the overall classification decision. This makes it possible to distill relevant information from text documents without an explicit semantic information extraction step. We further use the word-wise relevance scores to generate novel vector-based document representations that capture semantic information. Based on these document vectors, we introduce a measure of model explanatory power and show that, although the SVM and CNN models perform similarly in terms of classification accuracy, the latter exhibits a higher level of explainability, which makes it more comprehensible for humans and potentially more useful for other applications.
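For a linear bag-of-words classifier such as the SVM in this record, decomposing a prediction onto words is simple, because the decision score is already a sum of per-word terms. The sketch below shows only that linear special case, not the LRP rules the authors adapt for the CNN. It assumes term-count features, keeps the bias as a separate term (the record does not say how the authors distribute it), and uses hypothetical toy weights rather than a trained model.

```python
from collections import Counter

def decision_score(tokens, weights, bias):
    """Linear bag-of-words score: f(x) = sum over words of weight[w] * count[w], plus bias."""
    counts = Counter(tokens)
    return sum(weights.get(w, 0.0) * c for w, c in counts.items()) + bias

def word_relevance(tokens, weights):
    """Relevance of each word for a linear model: its weight times its count.

    The relevances sum to the score minus the bias, so the decomposition
    is conservative, which is the property LRP is built around.
    """
    counts = Counter(tokens)
    return {w: weights.get(w, 0.0) * c for w, c in counts.items()}

# Hypothetical weights for a toy "sports" vs. "not sports" classifier.
weights = {"match": 1.5, "goal": 2.0, "election": -1.8, "the": 0.0}
bias = -0.5
doc = "the goal came late in the match goal".split()

rel = word_relevance(doc, weights)
score = decision_score(doc, weights, bias)
print(sorted(rel.items(), key=lambda kv: -kv[1])[:2])  # → [('goal', 4.0), ('match', 1.5)]
print(score)  # → 5.0
```

Words with positive relevance push the document toward the positive class; in this toy example "goal" contributes most because it has the largest weight and occurs twice.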
A document preparation system in a large network environment

Energy Technology Data Exchange (ETDEWEB)

Vigil, M.; Bouchier, S.; Sanders, C.; Sydoriak, S.; Wheeler, K.

1988-01-01

At Los Alamos National Laboratory, we have developed an integrated document preparation system that produces publication-quality documents. This system combines text formatters and computer graphics capabilities that have been adapted to meet the needs of users in a large scientific research laboratory. This paper describes the integration of document processing technology to develop a system architecture, based on a page description language, to provide network-wide capabilities in a distributed computing environment. We describe the Laboratory requirements, the integration and implementation issues, and the challenges we faced developing this system.
Human Rights Texts: Converting Human Rights Primary Source Documents into Data.

Science.gov (United States)

Fariss, Christopher J; Linder, Fridolin J; Jones, Zachary M; Crabtree, Charles D; Biek, Megan A; Ross, Ana-Sophia M; Kaur, Taranamol; Tsai, Michael

2015-01-01

We introduce and make publicly available a large corpus of digitized primary source human rights documents which are published annually by monitoring agencies that include Amnesty International, Human Rights Watch, the Lawyers Committee for Human Rights, and the United States Department of State. In addition to the digitized text, we also make available and describe document-term matrices, which are datasets that systematically organize the word counts from each unique document by each unique term within the corpus of human rights documents. To contextualize the importance of this corpus, we describe the development of coding procedures in the human rights community and several existing categorical indicators that have been created by human coding of the human rights documents contained in the corpus. We then discuss how the new human rights corpus and the existing human rights datasets can be used with a variety of statistical analyses and machine learning algorithms to help scholars understand how human rights practices and reporting have evolved over time. We close with a discussion of our plans for dataset maintenance, updating, and availability.
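A document-term matrix, as described in this record, has one row per document and one column per unique term, with each cell holding a word count. The minimal sketch below builds one from two made-up sentences (not taken from the corpus); the regex tokenizer that lower-cases the text and splits on non-letters is an assumption, since the record does not describe the corpus's exact preprocessing.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case and split on runs of non-letters (a simplifying assumption)."""
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]

def document_term_matrix(docs):
    """Return (vocabulary, rows) where rows[d][j] counts vocabulary term j in document d."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))  # every unique term, alphabetically
    rows = [[c[t] for t in vocab] for c in counts]
    return vocab, rows

docs = [
    "Torture was reported in detention.",
    "Reports of arbitrary detention and detention without trial.",
]
vocab, dtm = document_term_matrix(docs)
print(vocab)
print(dtm)
```

Because "reported" and "reports" are not stemmed here, they become separate columns; stemming or lemmatization would merge them.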
Documents and legal texts

International Nuclear Information System (INIS)

2016-01-01

This section treats of the following documents and legal texts: 1 - Brazil: Law No. 13,260 of 16 March 2016 (To regulate the provisions of item XLIII of Article 5 of the Federal Constitution on terrorism, dealing with investigative and procedural provisions and redefining the concept of a terrorist organisation; and amends Laws No. 7,960 of 21 December 1989 and No. 12,850 of 2 August 2013); 2 - India: The Atomic Energy (Amendment) Act, 2015; Department Of Atomic Energy Notification (Civil Liability for Nuclear Damage); 3 - Japan: Act on Subsidisation, etc. for Nuclear Damage Compensation Funds following the implementation of the Convention on Supplementary Compensation for Nuclear Damage
Comparison of Document Index Graph Using TextRank and HITS Weighting Method in Automatic Text Summarization

Science.gov (United States)

Hadyan, Fadhlil; Shaufiah; Arif Bijaksana, Moch.

2017-01-01

Automatic summarization is a system that helps a reader obtain the core information of a long text quickly by summarizing the text automatically. Many summarization systems have already been developed, but they still have many problems. This final project proposes a summarization method using a document index graph. The method adapts the PageRank and HITS formulas, originally used to score web pages, to score the words in the sentences of a text document. The expected outcome of this final project is a system that can summarize a single document, using a document index graph with TextRank and HITS to improve the quality of the automatic summaries.
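The PageRank idea behind TextRank can be sketched at the sentence level: sentences are graph nodes, edges are weighted by word overlap, and a weighted PageRank power iteration ranks the sentences. This is a generic illustration, not the authors' document index graph, and HITS weighting is not shown. The overlap score follows the usual TextRank form but counts distinct words (an assumption). The damping factor `d = 0.85`, the 50 iterations, and the helper names are illustrative choices.

```python
import math
import re

def words(sentence):
    """Distinct lower-case words of a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))

def similarity(a, b):
    """TextRank-style overlap: shared words / (log|A| + log|B|), using distinct words."""
    wa, wb = words(a), words(b)
    if not wa or not wb:
        return 0.0
    denom = math.log(len(wa)) + math.log(len(wb))
    return len(wa & wb) / denom if denom > 0 else 0.0

def textrank(sentences, d=0.85, iters=50):
    """Weighted PageRank by power iteration over a sentence-similarity graph."""
    n = len(sentences)
    w = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out = [sum(row) for row in w]  # total edge weight leaving each sentence
    scores = [1.0] * n
    for _ in range(iters):
        # Each new score is computed from the previous iteration's scores.
        scores = [
            (1 - d) + d * sum(w[j][i] / out[j] * scores[j]
                              for j in range(n) if out[j] > 0)
            for i in range(n)
        ]
    return scores

def summarize(text, k=1):
    """Return the k highest-scoring sentences in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return [sentences[i] for i in sorted(top)]

text = ("Graphs rank words. Graphs rank sentences for a summary. "
        "A summary keeps sentences. The weather was pleasant.")
print(summarize(text, k=1))  # → ['Graphs rank sentences for a summary.']
```

The second sentence wins because it shares words with both of its neighbors. The unrelated last sentence has no edges and keeps only the baseline score `1 - d`.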
Classification process in a text document recommender system

Directory of Open Access Journals (Sweden)

Dan MUNTEANU

2005-12-01

This paper presents the classification process in a recommender system for textual documents taken mainly from the web. In the classification process, the system combines content filters, event filters, and collaborative filters, and it uses implicit and explicit feedback to evaluate documents.
Text Mining in Biomedical Domain with Emphasis on Document Clustering.

Science.gov (United States)

Renganathan, Vinaitheerthan

2017-07-01

With the exponential increase in the number of articles published every year in the biomedical domain, there is a need to build automated systems to extract unknown information from the articles published. Text mining techniques enable the extraction of unknown knowledge from unstructured documents. This paper reviews text mining processes in detail and the software tools available to carry out text mining. It also reviews the roles and applications of text mining in the biomedical domain. Text mining processes, such as search and retrieval of documents, pre-processing of documents, natural language processing, methods for text clustering, and methods for text classification are described in detail. Text mining techniques can facilitate the mining of vast amounts of knowledge on a given topic from published biomedical research articles and draw meaningful conclusions that are not possible otherwise.
Documents and legal texts

International Nuclear Information System (INIS)

2013-01-01

This section reprints a selection of recently published legislative texts and documents: - Russian Federation: Federal Law No.170 of 21 November 1995 on the use of atomic energy, Adopted by the State Duma on 20 October 1995; - Uruguay: Law No.19.056 On the Radiological Protection and Safety of Persons, Property and the Environment (4 January 2013); - Japan: Third Supplement to Interim Guidelines on Determination of the Scope of Nuclear Damage resulting from the Accident at the Tokyo Electric Power Company Fukushima Daiichi and Daini Nuclear Power Plants (concerning Damages related to Rumour-Related Damage in the Agriculture, Forestry, Fishery and Food Industries), 30 January 2013; - France and the United States: Joint Statement on Liability for Nuclear Damage (Aug 2013); - Franco-Russian Nuclear Power Declaration (1 November 2013)
Semantic Document Image Classification Based on Valuable Text Pattern

Directory of Open Access Journals (Sweden)

Hossein Pourghassem

2011-01-01

Knowledge extraction from detected document images is a complex problem in the field of information technology. The problem becomes more intricate because only a negligible percentage of the detected document images are valuable. In this paper, a segmentation-based classification algorithm is used to analyze document images. Using a two-stage segmentation approach, the algorithm detects image regions and then classifies them into document and non-document (pure region) regions in a hierarchical classification. The paper proposes a novel definition of value for classifying document images into valuable and invaluable categories. The algorithm is evaluated on a database of document and non-document images collected from the Internet. Experimental results show the efficiency of the proposed algorithm in semantic document image classification: it achieves an accuracy of 98.8% on the valuable/invaluable document image classification problem.
The Implementation of Cosine Similarity to Calculate Text Relevance between Two Documents

Science.gov (United States)

Gunawan, D.; Sembiring, C. A.; Budiman, M. A.

2018-03-01

The rapidly increasing number of web pages and documents calls for topic-specific filtering to find web pages or documents efficiently. This preliminary research uses cosine similarity to measure text relevance in order to find topic-specific documents. The research has three parts. The first part is text preprocessing: punctuation is removed from the document, the text is converted to lower case, stop words are removed, and root words are extracted with the Porter stemming algorithm. The second part is keyword weighting, whose output is used by the third part, the text relevance calculation. The relevance calculation yields a value between 0 and 1: the closer the value is to 1, the more closely the two documents are related, and vice versa.
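The pipeline in this record (remove punctuation, lower-case, remove stop words, weight keywords, compute cosine similarity) can be sketched as follows. This is a minimal sketch under stated assumptions: the stop-word list is a tiny illustrative one, raw term frequency stands in for the unspecified keyword weighting, and Porter stemming is omitted so the code has no dependencies.

```python
import math
import string
from collections import Counter

# Illustrative stop-word list (an assumption; practical lists are much longer).
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "is", "are", "to", "for"}

def preprocess(text):
    """Strip punctuation, lower-case, and drop stop words.

    The Porter stemming step is omitted in this sketch, so "cats" and
    "cat" are treated as different terms.
    """
    table = str.maketrans("", "", string.punctuation)
    tokens = text.translate(table).lower().split()
    return [t for t in tokens if t not in STOP_WORDS]

def cosine_similarity(doc_a, doc_b):
    """Cosine of raw term-frequency vectors; lies in [0, 1] since counts are non-negative."""
    va, vb = Counter(preprocess(doc_a)), Counter(preprocess(doc_b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_sq = sum(c * c for c in va.values()) * sum(c * c for c in vb.values())
    return dot / math.sqrt(norm_sq) if norm_sq else 0.0

a = "The cat sat on the mat."
b = "A cat is on the mat!"
print(round(cosine_similarity(a, b), 3))  # → 0.816
```

After preprocessing, the two sentences share "cat" and "mat" out of the terms {cat, sat, mat} and {cat, mat}, giving 2/√6. Because term counts are never negative, the score stays in the 0 to 1 range described above.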
Machine printed text and handwriting identification in noisy document images.

Science.gov (United States)

Zheng, Yefeng; Li, Huiping; Doermann, David

2004-03-01

In this paper, we address the problem of identifying text in noisy document images. We focus especially on segmenting and distinguishing between handwriting and machine printed text because: 1) handwriting in a document often indicates corrections, additions, or other supplemental information that should be treated differently from the main content, and 2) the segmentation and recognition techniques required for machine printed and handwritten text are significantly different. A novel aspect of our approach is that we treat noise as a separate class and model noise based on selected features. Trained Fisher classifiers are used to distinguish machine printed text and handwriting from noise, and we further exploit context to refine the classification. A Markov Random Field-based (MRF) approach is used to model the geometrical structure of the printed text, handwriting, and noise to rectify misclassifications. Experimental results show that our approach is robust and can significantly improve page segmentation in noisy document collections.
MeSH: a window into full text for document summarization.

Science.gov (United States)

Bhattacharya, Sanmitra; Ha-Thuc, Viet; Srinivasan, Padmini

2011-07-01

Previous research in the biomedical text-mining domain has historically been limited to titles, abstracts and metadata available in MEDLINE records. Recent research initiatives such as TREC Genomics and BioCreAtIvE strongly point to the merits of moving beyond abstracts and into the realm of full texts. Full texts are, however, more expensive to process not only in terms of resources needed but also in terms of accuracy. Since full texts contain embellishments that elaborate, contextualize, contrast, supplement, etc., there is greater risk for false positives. Motivated by this, we explore an approach that offers a compromise between the extremes of abstracts and full texts. Specifically, we create reduced versions of full text documents that contain only important portions. In the long-term, our goal is to explore the use of such summaries for functions such as document retrieval and information extraction. Here, we focus on designing summarization strategies. In particular, we explore the use of MeSH terms, manually assigned to documents by trained annotators, as clues to select important text segments from the full text documents. Our experiments confirm the ability of our approach to pick the important text portions. Using the ROUGE measures for evaluation, we were able to achieve maximum ROUGE-1, ROUGE-2 and ROUGE-SU4 F-scores of 0.4150, 0.1435 and 0.1782, respectively, for our MeSH term-based method versus the maximum baseline scores of 0.3815, 0.1353 and 0.1428, respectively. Using a MeSH profile-based strategy, we were able to achieve maximum ROUGE F-scores of 0.4320, 0.1497 and 0.1887, respectively. Human evaluation of the baselines and our proposed strategies further corroborates the ability of our method to select important sentences from the full texts. sanmitra-bhattacharya@uiowa.edu; padmini-srinivasan@uiowa.edu.
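The ROUGE-N F-scores reported here are built from n-gram overlap between a system summary and a reference summary: precision divides the clipped overlap by the candidate's n-gram count, recall divides it by the reference's, and F is their harmonic mean. The sketch below makes simplifying assumptions (whitespace tokenization, lower-casing, no stemming or stop-word removal, a single reference) and does not cover ROUGE-SU4. It is not the official ROUGE toolkit, so its numbers need not match published scores.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_f(candidate, reference, n=1):
    """ROUGE-N F-score from clipped n-gram overlap.

    Simplifications (assumptions): whitespace tokenization, lower-casing,
    no stemming or stop-word removal, a single reference summary.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # '&' keeps the minimum count per n-gram
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
print(round(rouge_n_f(candidate, reference, n=1), 4))  # → 0.8333
print(round(rouge_n_f(candidate, reference, n=2), 4))  # → 0.6
```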
A Typed Text Retrieval Query Language for XML Documents.

Science.gov (United States)

Colazzo, Dario; Sartiani, Carlo; Albano, Antonio; Manghi, Paolo; Ghelli, Giorgio; Lini, Luca; Paoli, Michele

2002-01-01

Discussion of XML focuses on a description of Tequyla-TX, a typed text retrieval query language for XML documents that can search on both content and structures. Highlights include motivations; numerous examples; word-based and char-based searches; tag-dependent full-text searches; text normalization; query algebra; data models and term language;…
Text extraction method for historical Tibetan document images based on block projections

Science.gov (United States)

Duan, Li-juan; Zhang, Xi-qun; Ma, Long-long; Wu, Jian

2017-11-01

Text extraction is an important initial step in digitizing historical documents. In this paper, we present a text extraction method for historical Tibetan document images based on block projections. Text extraction is treated as a text-area detection and localization problem. The images are divided into equal-sized blocks, and the blocks are filtered using the categories of their connected components and their corner-point density. By analyzing the projections of the filtered blocks, the approximate text areas are located and the text regions are extracted. Experiments on a dataset of historical Tibetan documents demonstrate the effectiveness of the proposed method.
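The projection step can be illustrated on a toy binary image (1 = ink): summing ink per row gives a horizontal profile whose non-empty runs are candidate text bands, and a column profile within each band trims it horizontally. This sketch shows only projection-profile analysis on a whole image. The paper's division into equal blocks and its filtering by connected-component categories and corner-point density are omitted, and the `min_ink` threshold and helper names are hypothetical.

```python
def projection(image, axis):
    """Sum ink pixels per row (axis=0) or per column (axis=1)."""
    if axis == 0:
        return [sum(row) for row in image]
    return [sum(col) for col in zip(*image)]

def runs(profile, min_ink):
    """Return [start, end) index ranges where the profile is at least min_ink."""
    spans, start = [], None
    for i, v in enumerate(profile):
        if v >= min_ink and start is None:
            start = i
        elif v < min_ink and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans

def text_regions(image, min_ink=1):
    """Find text bands by row projection, then bound each band by column projection.

    Returns (top, bottom, left, right) boxes with exclusive bottom/right edges.
    """
    regions = []
    for top, bottom in runs(projection(image, 0), min_ink):
        cols = runs(projection(image[top:bottom], 1), min_ink)
        if cols:
            regions.append((top, bottom, cols[0][0], cols[-1][1]))
    return regions

img = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
print(text_regions(img))  # → [(1, 3, 1, 6), (4, 5, 3, 7)]
```

On degraded historical pages, a threshold of 1 would also pick up noise. Raising `min_ink` or filtering blocks first, as the paper does, makes the profiles more reliable.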
Classification of protein-protein interaction full-text documents using text and citation network features.

Science.gov (United States)

Kolchinsky, Artemy; Abi-Haidar, Alaa; Kaur, Jasleen; Hamed, Ahmed Abdeen; Rocha, Luis M

2010-01-01

We participated (as Team 9) in the Article Classification Task of the BioCreative II.5 Challenge: binary classification of full-text documents relevant for protein-protein interaction. We used two distinct classifiers for the online and offline challenges: 1) the lightweight Variable Trigonometric Threshold (VTT) linear classifier we successfully introduced in BioCreative 2 for binary classification of abstracts and 2) a novel Naive Bayes classifier using features from the citation network of the relevant literature. We supplemented the supplied training data with full-text documents from the MIPS database. The lightweight VTT classifier was very competitive in this new full-text scenario: it was a top-performing submission in this task, taking into account the rank product of the Area Under the interpolated precision and recall Curve, Accuracy, Balanced F-Score, and Matthews Correlation Coefficient performance measures. The novel citation network classifier for the biomedical text mining domain, while not a top performing classifier in the challenge, performed above the central tendency of all submissions, and therefore indicates a promising new avenue to investigate further in bibliome informatics.
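Three of the measures named in this record (accuracy, balanced F-score, and the Matthews correlation coefficient) come directly from the binary confusion counts. The sketch below uses the standard definitions. Returning 0 when a denominator is zero is a common convention and an assumption here. The interpolated precision-recall AUC and the rank-product aggregation are not shown, and the counts in the example are made up.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Accuracy, balanced F-score (F1), and Matthews correlation coefficient."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), ranging from -1 to 1.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, f1, mcc

# Hypothetical counts for a relevant/irrelevant article classifier.
acc, f1, mcc = binary_metrics(tp=40, fp=10, tn=45, fn=5)
print(round(acc, 3), round(f1, 3), round(mcc, 3))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when relevant and irrelevant articles are imbalanced.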

Text mining in the classification of digital documents

Directory of Open Access Journals (Sweden)

Marcial Contreras Barrera

2016-11-01

Objective: To develop an automated classifier for bibliographic material using text mining. Methodology: Text mining is used to develop the classifier, based on a supervised method with two phases, learning and recognition. In the learning phase, the classifier learns patterns by analyzing bibliographic records of classification Z, which covers library science, information sciences and information resources, retrieved from the LIBRUNAM database; this phase yields a classifier capable of recognizing the different subclasses (LC). In the recognition phase, the classifier is validated and evaluated through classification tests: bibliographic records of classification Z are taken at random, classified by a cataloguer, and processed by the automated classifier to obtain its precision. Results: Applying text mining made it possible to develop the automated classifier through a supervised document classification method. The classifier's precision, calculated by comparing the manually and automatically assigned topics, was 75.70%. Conclusions: Text mining facilitated the creation of the automated classifier, providing useful technology for classifying bibliographic material with the aim of improving and speeding up the organization of digital documents.
Text document classification

Czech Academy of Sciences Publication Activity Database

Novovičová, Jana

No. 62 (2005), pp. 53-54. ISSN 0926-4981. R&D Projects: GA AV ČR IAA2075302; GA AV ČR KSK1019101; GA MŠk 1M0572. Institutional research plan: CEZ:AV0Z10750506. Keywords: document representation; categorization; classification. Subject RIV: BD - Theory of Information
Fast words boundaries localization in text fields for low quality document images

Science.gov (United States)

Ilin, Dmitry; Novikov, Dmitriy; Polevoy, Dmitry; Nikolaev, Dmitry

2018-04-01

The paper examines the problem of precisely localizing word boundaries in document text zones. Document processing on a mobile device consists of document localization, perspective correction, localization of individual fields, finding words in separate zones, segmentation, and recognition. When an image is captured with a mobile digital camera under uncontrolled conditions, digital noise, perspective distortion, or glare may occur. Further processing is complicated by the specifics of documents: layout elements, complex backgrounds, static text, security elements, and a variety of fonts. In addition, word boundary localization has to be solved at runtime on a mobile CPU with limited computing capabilities under specified restrictions. Several groups of methods currently exist, each optimized for different conditions. Methods for scanned printed text are fast but limited to high-quality images. Methods for text in the wild have excessively high computational complexity and are therefore hardly suitable for running on mobile devices as part of a mobile document recognition system. The method presented in this paper solves a more specialized problem than finding text in natural images. It uses local features, a sliding window, and a lightweight neural network to achieve an optimal speed-precision trade-off. The algorithm takes 12 ms per field on the ARM processor of a mobile device. The error rate for boundary localization on a test sample of 8000 fields is 0.3
Use of speech-to-text technology for documentation by healthcare providers.

Science.gov (United States)

Ajami, Sima

2016-01-01

Medical records are a critical component of a patient's treatment. However, documentation of patient-related information is considered a secondary activity in the provision of healthcare services, often leading to incomplete medical records and patient data of low quality. Advances in information technology (IT) in the health system and registration of information in electronic health records (EHR) using speech-to-text conversion software have facilitated service delivery. This narrative review is a literature search with the help of libraries, books, conference proceedings, databases of Science Direct, PubMed, Proquest, Springer, SID (Scientific Information Database), and search engines such as Yahoo and Google. I used the following keywords and their combinations: speech recognition, automatic report documentation, voice to text software, healthcare, information, and voice recognition. Due to lack of knowledge of other languages, I searched all texts in English or Persian with no time limits. Of a total of 70, only 42 articles were selected. Speech-to-text conversion technology offers opportunities to improve the documentation process of medical records, reduce cost and time of recording information, enhance the quality of documentation, improve the quality of services provided to patients, and support healthcare providers in legal matters. Healthcare providers should recognize the impact of this technology on service delivery.
Documents and legal texts

International Nuclear Information System (INIS)

2015-01-01

This section treats of the following Documents and legal texts: 1 - Canada: Nuclear Liability and Compensation Act (An Act respecting civil liability and compensation for damage in case of a nuclear incident, repealing the Nuclear Liability Act and making consequential amendments to other acts); 2 - Japan: Act on Compensation for Nuclear Damage (The purpose of this act is to protect persons suffering from nuclear damage and to contribute to the sound development of the nuclear industry by establishing a basic system regarding compensation in case of nuclear damage caused by reactor operation etc.); Act on Indemnity Agreements for Compensation of Nuclear Damage; 3 - Slovak Republic: Act on Civil Liability for Nuclear Damage and on its Financial Coverage and on Changes and Amendments to Certain Laws (This Act regulates: a) The civil liability for nuclear damage incurred in the causation of a nuclear incident, b) The scope of powers of the Nuclear Regulatory Authority (hereinafter only as the 'Authority') in relation to the application of this Act, c) The competence of the National Bank of Slovakia in relation to the supervised financial market entities in the financial coverage of liability for nuclear damage; and d) The penalties for violation of this Act)
Means of storage and automated monitoring of versions of text technical documentation

Science.gov (United States)

Leonovets, S. A.; Shukalov, A. V.; Zharinov, I. O.

2018-03-01

The paper considers the automation of the preparation, storage, and version-control monitoring of design and program text documentation by means of specialized software. Automated preparation of documentation is based on processing the engineering data contained in the specifications and technical documentation. Data handling assumes the existence of strictly structured electronic documents, prepared in widespread formats according to templates based on industry standards, and the automated generation of the program or design text document. The subsequent life cycle of the document and of the engineering data it contains is then controlled, and archival data storage is carried out at each stage of the life cycle. The paper also presents studies of the speed of different widespread document formats for automated monitoring and storage. The newly developed software and the workbenches available to the developer of the instrumental equipment are described.
A methodology for semiautomatic taxonomy of concepts extraction from nuclear scientific documents using text mining techniques

International Nuclear Information System (INIS)

Braga, Fabiane dos Reis

2013-01-01

This thesis presents a text mining method for the semi-automatic extraction of a taxonomy of concepts from a textual corpus of scientific papers related to the nuclear area. Text classification is a natural human practice and a crucial task when working with large repositories. The document clustering technique provides a logical and understandable framework that facilitates organization, browsing, and searching. Most clustering algorithms use the bag-of-words model to represent the content of a document. This model produces high-dimensional data, ignores the fact that different words can have the same meaning, and does not consider the relationships between words, assuming that they are independent of each other. The methodology combines a concept-based document representation model with a hierarchical document clustering method that uses the co-occurrence frequency of concepts, and a technique for labeling clusters with their most representative concepts, with the objective of producing a taxonomy of concepts that may reflect the structure of the knowledge domain. It is hoped that this work will contribute to the conceptual mapping of scientific production in the nuclear area and thus support the management of research activities in this area. (author)
Spotting Separator Points at Line Terminals in Compressed Document Images for Text-line Segmentation

OpenAIRE

R, Amarnath; Nagabhushan, P.

2017-01-01

Line separators are used to segregate text-lines from one another in document image analysis. Finding the separator points at every line terminal in a document image would enable text-line segmentation. In particular, identifying the separators in handwritten text could be a thrilling exercise. Obviously it would be challenging to perform this in the compressed version of a document image and that is the proposed objective in this research. Such an effort would prevent the computational burde...
Finding Text Information in the Ocean of Electronic Documents

Energy Technology Data Exchange (ETDEWEB)

Medvick, Patricia A.; Calapristi, Augustin J.

2003-02-05

Information management in natural resources has become an overwhelming task. A massive amount of electronic documents and data is now available for making informed decisions. The problem is finding the relevant information to support the decision-making process. Determining gaps in knowledge in order to propose new studies, or deciding which proposals to fund for maximum potential, is a time-consuming and difficult task. Additionally, available data stores are increasing in complexity; they may now include not only text and numerical data, but also images, sounds, and video recordings. Information visualization specialists at Pacific Northwest National Laboratory (PNNL) have software tools for exploring electronic data stores and for discovering and exploiting relationships within data sets. These provide capabilities for unstructured text exploration, the use of data signatures (a compact format for the essence of a set of scientific data) for visualization (Wong et al 2000), visualizations for multiple query results (Havre et al. 2001), and others (http://www.pnl.gov/infoviz). We will focus on IN-SPIRE, an MS Windows version of PNNL’s SPIRE (Spatial Paradigm for Information Retrieval and Exploration). IN-SPIRE was developed to help information analysts find and discover information in huge masses of text documents.
Large Scale Document Inversion using a Multi-threaded Computing System.

Science.gov (United States)

Jung, Sungbo; Chang, Dar-Jen; Park, Juw Won

2017-06-01

Current microprocessor architecture is moving towards multi-core/multi-threaded systems. This trend has led to a surge of interest in using multi-threaded computing devices, such as the Graphics Processing Unit (GPU), for general-purpose computing. Because the GPU consists of many cores, it can be used as a massively parallel coprocessor. The GPU is also an affordable, attractive, and user-programmable commodity. A great deal of information is now flooding into the digital domain around the world: huge volumes of data, such as digital libraries, social networking services, e-commerce product data, and reviews, are produced or collected every moment, and their size is growing dramatically. Although the inverted index is a useful data structure for full-text search and document retrieval, creating the index for a large number of documents takes a tremendous amount of time. The performance of document inversion can be improved with multi-threaded or multi-core GPUs. Our approach is to implement a linear-time, hash-based, single program multiple data (SPMD) document inversion algorithm on the NVIDIA GPU/CUDA programming platform, utilizing the huge computational power of the GPU to develop high-performance solutions for document indexing. Our proposed parallel document inversion system is 2-3 times faster than a sequential system on two different test datasets drawn from PubMed abstracts and e-commerce product reviews. CCS Concepts: • Information systems ➝ Information retrieval • Computing methodologies ➝ Massively parallel and high-performance simulations.
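The core data structure can be sketched on a CPU. In the sketch, each worker inverts one chunk of documents into its own hash map (term to document IDs), and the partial indexes are merged afterwards. This mirrors the SPMD idea of running the same program over different slices of data. It is a plain-Python illustration with hypothetical names, not the authors' CUDA implementation. In CPython, threads will not actually speed up this CPU-bound loop because of the global interpreter lock.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def invert_chunk(chunk):
    """Build a partial index (term -> set of doc IDs) for one chunk of (doc_id, text) pairs."""
    partial = defaultdict(set)
    for doc_id, text in chunk:
        for term in text.lower().split():
            partial[term].add(doc_id)
    return partial


def build_index(docs, n_workers=4):
    """Invert documents chunk by chunk in worker threads, then merge the partial indexes."""
    items = list(docs.items())
    size = max(1, -(-len(items) // n_workers))  # ceiling division
    chunks = [items[i:i + size] for i in range(0, len(items), size)]
    merged = defaultdict(set)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Same program, different data: every worker runs invert_chunk on its own slice.
        for partial in pool.map(invert_chunk, chunks):
            for term, ids in partial.items():
                merged[term] |= ids
    # Posting lists are returned sorted by document ID.
    return {term: sorted(ids) for term, ids in merged.items()}


docs = {1: "GPU inverted index", 2: "Inverted index for PubMed", 3: "product reviews on GPU"}
index = build_index(docs, n_workers=2)
print(index["gpu"], index["inverted"])  # [1, 3] [1, 2]
```

Keeping one hash map per worker avoids locking during the inversion itself. The price is a merge step at the end.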
Documents and legal texts

International Nuclear Information System (INIS)

2014-01-01

This section of the Bulletin presents the recently published documents and legal texts sorted by country: - Brazil: Resolution No. 169 of 30 April 2014. - Japan: Act Concerning Exceptions to Interruption of Prescription Pertaining to Use of Settlement Mediation Procedures by the Dispute Reconciliation Committee for Nuclear Damage Compensation in relation to Nuclear Damage Compensation Disputes Pertaining to the Great East Japan Earthquake (Act No. 32 of 5 June 2013); Act Concerning Measures to Achieve Prompt and Assured Compensation for Nuclear Damage Arising from the Nuclear Plant Accident following the Great East Japan Earthquake and Exceptions to the Extinctive Prescription, etc. of the Right to Claim Compensation for Nuclear Damage (Act No. 97 of 11 December 2013); Fourth Supplement to Interim Guidelines on Determination of the Scope of Nuclear Damage Resulting from the Accident at the Tokyo Electric Power Company Fukushima Daiichi and Daini Nuclear Power Plants (Concerning Damages Associated with the Prolongation of Evacuation Orders, etc.); Outline of 'Fourth Supplement to Interim Guidelines (Concerning Damages Associated with the Prolongation of Evacuation Orders, etc.)'. - OECD Nuclear Energy Agency: Decision and Recommendation of the Steering Committee Concerning the Application of the Paris Convention to Nuclear Installations in the Process of Being Decommissioned; Joint Declaration on the Security of Supply of Medical Radioisotopes. - United Arab Emirates: Federal Decree No. (51) of 2014 Ratifying the Convention on Supplementary Compensation for Nuclear Damage; Ratification of the Federal Supreme Council of Federal Decree No. (51) of 2014 Ratifying the Convention on Supplementary Compensation for Nuclear Damage
Large Scale Document Inversion using a Multi-threaded Computing System

Science.gov (United States)

Jung, Sungbo; Chang, Dar-Jen; Park, Juw Won

2018-01-01

Current microprocessor architecture is moving towards multi-core/multi-threaded systems. This trend has led to a surge of interest in using multi-threaded computing devices, such as the Graphics Processing Unit (GPU), for general-purpose computing. Because the GPU consists of many cores, it can be used as a massively parallel coprocessor. The GPU is also an affordable, attractive, and user-programmable commodity. A great deal of information is now flooding into the digital domain around the world: huge volumes of data, such as digital libraries, social networking services, e-commerce product data, and reviews, are produced or collected every moment, and their size is growing dramatically. Although the inverted index is a useful data structure for full-text search and document retrieval, creating the index for a large number of documents takes a tremendous amount of time. The performance of document inversion can be improved with multi-threaded or multi-core GPUs. Our approach is to implement a linear-time, hash-based, single program multiple data (SPMD) document inversion algorithm on the NVIDIA GPU/CUDA programming platform, utilizing the huge computational power of the GPU to develop high-performance solutions for document indexing. Our proposed parallel document inversion system is 2-3 times faster than a sequential system on two different test datasets drawn from PubMed abstracts and e-commerce product reviews. CCS Concepts: • Information systems ➝ Information retrieval • Computing methodologies ➝ Massively parallel and high-performance simulations.
Invariant practical tasks for work with text documents at the secondary school

Directory of Open Access Journals (Sweden)

Л И Карташова

2013-12-01

The article gives examples of practical tasks on creating, editing and formatting text documents, aimed at secondary-school pupils. The tasks are invariant: they do not depend on any particular software.
Segmentation of Arabic Handwritten Documents into Text Lines using Watershed Transform

Directory of Open Access Journals (Sweden)

Abdelghani Souhar

2017-12-01

A crucial task in character recognition systems is the segmentation of a document into text lines, especially when the document is handwritten. For non-Latin scripts such as Arabic the challenge is greater: in addition to the variability of handwriting, the presence of diacritical points and the high number of ascender and descender characters further complicate segmentation. To address this complexity, and even to turn it into an advantage given that Arabic is semi-cursive in nature, a method based on the Watershed Transform is proposed. Tested on the «Handwritten Arabic Proximity Datasets», the method achieves a segmentation rate of 93% at a 95% matching score.
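The sketch below gives a feel for watershed-style line separation. It works on a 1-D horizontal projection profile rather than on the 2-D image the paper uses. Rows with locally maximal ink (text-line centres) act as markers. Rows are then flooded from those markers in order of decreasing ink, so each row joins the basin of the line that reaches it first, and the points where basins meet become the line boundaries. The marker rule (local maxima above a threshold) and all names are assumptions for illustration, not the authors' method.

```python
import heapq


def line_labels(profile, min_peak=1):
    """Label each row with a text-line ID by flooding a 1-D ink profile from its peaks."""
    n = len(profile)
    labels = [0] * n  # 0 = not yet reached by any flood
    heap, next_label = [], 1
    for r in range(n):
        left = profile[r - 1] if r > 0 else -1
        right = profile[r + 1] if r < n - 1 else -1
        # Markers: local maxima of ink (leftmost row of a plateau) above a threshold.
        if profile[r] >= min_peak and profile[r] > left and profile[r] >= right:
            labels[r] = next_label
            next_label += 1
            heapq.heappush(heap, (-profile[r], r))
    while heap:
        # Pop the densest frontier row first (a max-heap via negated ink counts).
        _, r = heapq.heappop(heap)
        for nb in (r - 1, r + 1):
            if 0 <= nb < n and labels[nb] == 0:
                labels[nb] = labels[r]
                heapq.heappush(heap, (-profile[nb], nb))
    return labels


def boundaries(labels):
    """Rows where one text-line basin meets the next."""
    return [r for r in range(1, len(labels)) if labels[r] != labels[r - 1]]


profile = [0, 3, 8, 9, 4, 0, 0, 2, 7, 7, 3, 0]  # black pixels per row (hypothetical)
labels = line_labels(profile)
print(labels)              # [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
print(boundaries(labels))  # [6]
```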
SYSTEM «PlagiarismControl» AS THE TOOL FOR THE EXPERTISE OF THE TEXT DOCUMENTS

Directory of Open Access Journals (Sweden)

Yu. B. Krapivin

2018-01-01

The paper describes the implemented software tool «PlagiarismControl» and analyzes its operability. The system automates the identification of borrowed fragments in a given text document, drawing on both the user's local full-text database and the Internet. It detects explicit as well as implicit borrowings, taking into account lexical-unit paradigms and both lexical and grammatical synonymy, in accordance with the structural-functional schematic diagram of a system for the automatic recognition of reproduced fragments of text documents. «PlagiarismControl» can operate in different modes, automating the expert's work and significantly speeding up the analysis of documents aimed at recognizing borrowings (plagiarism) from other text documents.
Large Hospital 50% Energy Savings: Technical Support Document

Energy Technology Data Exchange (ETDEWEB)

Bonnema, E.; Studer, D.; Parker, A.; Pless, S.; Torcellini, P.

2010-09-01

This Technical Support Document documents the technical analysis and design guidance for large hospitals to achieve whole-building energy savings of at least 50% over ANSI/ASHRAE/IESNA Standard 90.1-2004 and represents a step toward determining how to provide design guidance for aggressive energy savings targets. This report documents the modeling methods used to demonstrate that the design recommendations meet or exceed the 50% goal. EnergyPlus was used to model the predicted energy performance of the baseline and low-energy buildings to verify that 50% energy savings are achievable. Percent energy savings are based on a nominal minimally code-compliant building and whole-building, net site energy use intensity. The report defines architectural-program characteristics for typical large hospitals, thereby defining a prototype model; creates baseline energy models for each climate zone that are elaborations of the prototype models and are minimally compliant with Standard 90.1-2004; creates a list of energy design measures that can be applied to the prototype model to create low-energy models; uses industry feedback to strengthen inputs for baseline energy models and energy design measures; and simulates low-energy models for each climate zone to show that when the energy design measures are applied to the prototype model, 50% energy savings (or more) are achieved.
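Stated as a formula, the savings criterion compares the whole-building net site energy use intensity (EUI) of the low-energy model with that of the minimally code-compliant baseline. The symbol names are introduced here for illustration only, and the equivalence assumes a positive baseline EUI.

```latex
\text{Savings} = \frac{\mathrm{EUI}_{\mathrm{base}} - \mathrm{EUI}_{\mathrm{low}}}{\mathrm{EUI}_{\mathrm{base}}} \geq 0.50
\quad\Longleftrightarrow\quad
\mathrm{EUI}_{\mathrm{low}} \leq 0.50\,\mathrm{EUI}_{\mathrm{base}}
```

For example, a hypothetical baseline of 200 kBtu/ft²·yr would require the low-energy model to use at most 100 kBtu/ft²·yr.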
Chyawanprash: A review of therapeutic benefits as in authoritative texts and documented clinical literature.

Science.gov (United States)

Narayana, D B Anantha; Durg, Sharanbasappa; Manohar, P Ram; Mahapatra, Anita; Aramya, A R

2017-02-02

Chyawanprash (CP), a traditional immune-booster recipe, has a long history of ethnic origin, development, household preparation and usage. There are even mythological stories about the origin of this recipe, including its nomenclature. In the last six decades, owing to the entrepreneurial efforts of some research Vaidyas (traditional doctors), CP has grown into an industrially produced product, marketed in packaged form to a large number of consumers/patients like any food or health care product. Currently, CP has a large, accepted user base in India and in a few countries outside India. Authoritative texts, recognized by the Drugs and Cosmetics Act of India, describe CP as an immunity enhancer and strength giver meant for improving lung functions in diseases with compromised immunity. This review focuses on published clinical efficacy and safety studies of CP for correlation with health benefits as documented in the authoritative texts, and also briefly covers its recipes and processes. Authoritative texts were searched for recipes, processes, and other technical details of CP. Labels of marketed Indian CP products were studied for health claims. Electronic searches for efficacy and safety studies of CP were performed in PubMed/MEDLINE and DHARA (Digital Helpline for Ayurveda Research Articles), and Ayurvedic books were also searched for clinical studies. The documented clinical studies from electronic databases and Ayurvedic books indicated that individuals who consumed CP regularly for a definite period of time showed improvement in overall health status and immunity. However, most of the clinical studies in this review have small sample sizes and short durations. Further, limited access to significant data on traditional products like CP in electronic databases was noted. High-quality randomized controlled trials with larger sample sizes and longer follow-up are needed to provide significant evidence on the clinical use of CP as immunity
Large-Scale Multi-Dimensional Document Clustering on GPU Clusters

Energy Technology Data Exchange (ETDEWEB)

Cui, Xiaohui [ORNL]; Mueller, Frank [North Carolina State University]; Zhang, Yongpeng [ORNL]; Potok, Thomas E [ORNL]

2010-01-01

Document clustering plays an important role in data mining systems. Recently, a flocking-based document clustering algorithm has been proposed to solve the problem through a simulation resembling the flocking behavior of birds in nature. This method is superior to other clustering algorithms, including k-means, in that its outcome is not sensitive to the initial state. One limitation of this approach is that the algorithmic complexity is inherently quadratic in the number of documents. As a result, execution time becomes a bottleneck for large numbers of documents. In this paper, we assess the benefits of exploiting the computational power of Beowulf-like clusters equipped with contemporary Graphics Processing Units (GPUs) as a means to significantly reduce the runtime of flocking-based document clustering. Our framework scales up to over one million documents processed simultaneously in a sixteen-node GPU cluster. Results are also compared to a four-node cluster with higher-end GPUs. On these clusters, we observe 30X-50X speedups, which demonstrates the potential of GPU clusters to efficiently solve massive data mining problems. To the best of our knowledge, such speedups, combined with the scalability potential and accelerator-based parallelization, are unique in the domain of document-based data mining.
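The quadratic cost arises because every document agent examines every other agent at every step. The sketch below is a simplified, hypothetical flocking-style update, not the published algorithm's exact rules. Each agent is pulled toward neighbours whose documents are similar and pushed away from dissimilar ones, using cosine similarity on term vectors. The nested loop over agents is the O(n²) part that a GPU cluster would parallelise.

```python
import math


def cosine(u, v):
    """Cosine similarity of two term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def flock_step(positions, vectors, radius=5.0, threshold=0.5, step=0.3):
    """One synchronous update of all document agents.

    Returns the new 2-D positions and the number of agent pairs examined.
    """
    new_positions, comparisons = [], 0
    for i, (xi, yi) in enumerate(positions):
        vx = vy = 0.0
        neighbours = 0
        for j, (xj, yj) in enumerate(positions):  # all-pairs scan: O(n^2) per step
            if i == j:
                continue
            comparisons += 1
            if math.hypot(xj - xi, yj - yi) > radius:
                continue
            # Similar documents (w > 0) attract, dissimilar ones (w < 0) repel.
            w = cosine(vectors[i], vectors[j]) - threshold
            vx += w * (xj - xi)
            vy += w * (yj - yi)
            neighbours += 1
        if neighbours:
            vx, vy = vx / neighbours, vy / neighbours
        new_positions.append((xi + step * vx, yi + step * vy))
    return new_positions, comparisons


positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
vectors = [[1, 0], [1, 0], [0, 1], [0, 1]]  # two topics, two documents each
new_positions, pairs = flock_step(positions, vectors)
print(pairs)  # 12, i.e. n * (n - 1) for n = 4
```

After one step, the two topics drift apart while documents within a topic keep their spacing. The pair count grows as n(n - 1), which is what makes large collections slow on a single processor.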
Leveraging Text Content for Management of Construction Project Documents

Science.gov (United States)

Alqady, Mohammed

2012-01-01

The construction industry is a knowledge intensive industry. Thousands of documents are generated by construction projects. Documents, as information carriers, must be managed effectively to ensure successful project management. The fact that a single project can produce thousands of documents and that a lot of the documents are generated in a…
LOG2MARKUP: Stata module to transform a Stata text log into a markup document

DEFF Research Database (Denmark)

2016-01-01

log2markup extracts parts of the text log produced by the Stata log command and transforms the log file into a markup-based document with the same name, but with the extension markup (or another extension specified in option extension) instead of log. The author usually uses markdown for writing documents. However...
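The naming rule described above can be sketched in Python: the output keeps the same base name, and the `log` extension is replaced by `markup` unless another extension is given. `markup_path` is a hypothetical helper, not part of the Stata module, whose own syntax is not reproduced here.

```python
from pathlib import PurePath


def markup_path(log_path: str, extension: str = "markup") -> str:
    """Same base name as the log, with its .log extension replaced (hypothetical helper)."""
    return str(PurePath(log_path).with_suffix("." + extension.lstrip(".")))


print(markup_path("analysis.log"))        # analysis.markup
print(markup_path("analysis.log", "md"))  # analysis.md
```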

Proxima: a presentation-oriented editor for structured documents

OpenAIRE

Schrage, M.M.

2004-01-01

A typical computer user deals with a large variety of documents, such as text files, spreadsheets, and web pages. The applications for constructing and modifying these documents are called editors (e.g. text editors, spreadsheet applications, and HTML editors). Despite the apparent differences between editors, the core editing behavior, whether performed in a word-processor or a spreadsheet, is largely similar: document fragments may be copied and pasted, and new parts of the document may be ...
Ultrasound-guided nerve blocks--is documentation and education feasible using only text and pictures?

Directory of Open Access Journals (Sweden)

Bjarne Skjødt Worm

PURPOSE: With the advancement of ultrasound guidance for peripheral nerve blocks, still pictures from representative ultrasonograms are increasingly used for documenting clinical procedures and for educational purposes in textbook materials. However, little is actually known about the clinical and educational usefulness of these still pictures, in particular how well nerve structures can be identified compared with real-time ultrasound examination. We aimed to quantify gross visibility or ultrastructure using still picture sonograms compared with real-time ultrasound for trainees and experts, for large or small nerves, and to discuss the clinical or educational relevance of these findings. MATERIALS AND METHODS: We undertook a clinical study to quantify the maximal gross visibility or ultrastructure of seven peripheral nerves identified by either real-time ultrasound (clinical cohort, n = 635) or by still picture ultrasonograms (clinical cohort, n = 112). In addition, we undertook a study on test subjects (n = 4) to quantify interobserver variations and potential bias among expert and trainee observers. RESULTS: When comparing real-time ultrasound and interpretation of still picture sonograms, gross identification of large nerves was reduced by 15% and 40% by expert and trainee observers, respectively, while gross identification of small nerves was reduced by 29% and 66%. Identification of within-nerve ultrastructure was even lower. For all nerve sizes, trainees were unable to identify any anatomical structure in 24 to 34%, while experts were unable to identify anything in 9 to 10%. CONCLUSION: Extensive ultrasonography experience and real-time ultrasound measurements seem to be keystones in obtaining optimal nerve identification. In contrast, the use of still pictures appears to be insufficient for documentation as well as educational purposes. Alternatives such as video clips or enhanced picture technology are encouraged.
BoB, a best-of-breed automated text de-identification system for VHA clinical documents.

Science.gov (United States)

Ferrández, Oscar; South, Brett R; Shen, Shuying; Friedlin, F Jeffrey; Samore, Matthew H; Meystre, Stéphane M

2013-01-01

De-identification allows faster and more collaborative clinical research while protecting patient confidentiality. Clinical narrative de-identification is a tedious process that can be alleviated by automated natural language processing methods. The goal of this research is the development of an automated text de-identification system for Veterans Health Administration (VHA) clinical documents. We devised a novel stepwise hybrid approach designed to improve the current strategies used for text de-identification. The proposed system is based on a previous study on the best de-identification methods for VHA documents. This best-of-breed automated clinical text de-identification system (aka BoB) tackles the problem as two separate tasks: (1) maximize patient confidentiality by redacting as much protected health information (PHI) as possible; and (2) leave de-identified documents in a usable state preserving as much clinical information as possible. We evaluated BoB with a manually annotated corpus of a variety of VHA clinical notes, as well as with the 2006 i2b2 de-identification challenge corpus. We present evaluations at the instance- and token-level, with detailed results for BoB's main components. Moreover, an existing text de-identification system was also included in our evaluation. BoB's design efficiently takes advantage of the methods implemented in its pipeline, resulting in high sensitivity values (especially for sensitive PHI categories) and a limited number of false positives. Our system successfully addressed VHA clinical document de-identification, and its hybrid stepwise design demonstrates robustness and efficiency, prioritizing patient confidentiality while leaving most clinical information intact.
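The two goals pull in opposite directions, which is why de-identification is usually reported with both sensitivity (recall of PHI) and precision. Below is a minimal token-level scorer. It assumes gold and system annotations are given as sets of token positions marked as PHI; this is a hypothetical format, not BoB's output. Treating an empty denominator as a perfect score is also a choice made for this sketch.

```python
def token_scores(gold: set, predicted: set) -> dict:
    """Token-level sensitivity (recall) and precision for PHI redaction."""
    tp = len(gold & predicted)   # PHI tokens correctly redacted
    fn = len(gold - predicted)   # PHI left in the text: a confidentiality risk
    fp = len(predicted - gold)   # clinical tokens removed needlessly: a usability cost
    sensitivity = tp / (tp + fn) if tp + fn else 1.0
    precision = tp / (tp + fp) if tp + fp else 1.0
    return {"sensitivity": sensitivity, "precision": precision}


gold = {2, 3, 7, 8}           # token positions annotated as PHI
predicted = {2, 3, 7, 10, 11}  # token positions the system redacted
print(token_scores(gold, predicted))  # {'sensitivity': 0.75, 'precision': 0.6}
```

A system tuned for confidentiality, like the first goal above, accepts lower precision in exchange for sensitivity close to 1.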
Documentation is Documentation and Theory is Theory: A Reply to Daniel Avorgbedor's Commentary "Documenting Spoken and Sung Texts of the Dagaaba of West Africa"

Directory of Open Access Journals (Sweden)

Manolete Mora

2007-11-01

In response to an article that appeared in Empirical Musicology Review (Bodomo and Mora 2007), Avorgbedor (2007) takes issue with aspects of the paper. In our reply to Avorgbedor’s response, we first clarify some issues raised therein and then address the relationship between theory, description and documentation within linguistics and musicology.
Software Manages Documentation in a Large Test Facility

Science.gov (United States)

Gurneck, Joseph M.

2001-01-01

The 3MCS computer program assists an instrumentation engineer in performing the three essential functions of design, documentation, and configuration management of measurement and control systems in a large test facility. Services provided by 3MCS are acceptance of input from multiple engineers and technicians working at multiple locations; standardization of drawings; automated cross-referencing; identification of errors; listing of components and resources; downloading of test settings; and provision of information to customers.
Proxima: a presentation-oriented editor for structured documents

NARCIS (Netherlands)

Schrage, M.M.

2004-01-01

A typical computer user deals with a large variety of documents, such as text files, spreadsheets, and web pages. The applications for constructing and modifying these documents are called editors (e.g. text editors, spreadsheet applications, and HTML editors). Despite the apparent differences
The Analysis of Heterogeneous Text Documents with the Help of the Computer Program NUD*IST

Directory of Open Access Journals (Sweden)

Christine Plaß

2000-12-01

On the basis of a current research project we discuss the use of the computer program NUD*IST for the analysis and archiving of qualitative documents. Our project examines the social evaluation of spectacular criminal offenses and we identify, digitize and analyze documents from the entire 20th century. Since public and scientific discourses are examined, the data of the project are extraordinarily heterogeneous: scientific publications, court records, newspaper reports, and administrative documents. We want to show how to transfer general questions into a systematic categorization with the assistance of NUD*IST. Apart from the functions, possibilities and limitations of the application of NUD*IST, concrete work procedures and difficulties encountered are described. URN: urn:nbn:de:0114-fqs0003211
Text Mining the History of Medicine.

Science.gov (United States)

Thompson, Paul; Batista-Navarro, Riza Theresa; Kontonatsios, Georgios; Carter, Jacob; Toon, Elizabeth; McNaught, John; Timmermann, Carsten; Worboys, Michael; Ananiadou, Sophia

2016-01-01

Historical text archives constitute a rich and diverse source of information, which is becoming increasingly accessible owing to large-scale digitisation efforts. However, it can be difficult for researchers to explore and search such large volumes of data efficiently. Text mining (TM) methods can help, through their ability to recognise various types of semantic information automatically, e.g., instances of concepts (places, medical conditions, drugs, etc.), synonyms/variant forms of concepts, and relationships holding between concepts (which drugs are used to treat which medical conditions, etc.). TM analysis allows search systems to incorporate functionality such as automatic suggestions of synonyms of user-entered query terms, exploration of different concepts mentioned within search results, or isolation of documents in which concepts are related in specific ways. However, applying TM methods to historical text can be challenging, owing to differences and evolutions in vocabulary, terminology, language structure and style compared with more modern text. In this article, we present our efforts to overcome the various challenges faced in the semantic analysis of published historical medical text dating back to the mid 19th century. Firstly, we used evidence from diverse historical medical documents from different periods to develop new resources that provide accounts of the multiple, evolving ways in which concepts, their variants and relationships amongst them may be expressed. These resources were employed to support the development of a modular processing pipeline of TM tools for the robust detection of semantic information in historical medical documents with varying characteristics. We applied the pipeline to two large-scale medical document archives covering wide temporal ranges as the basis for the development of a publicly accessible semantically-oriented search system. The novel resources are available for research purposes, while
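The synonym-suggestion functionality can be illustrated with a toy variant dictionary that maps a modern concept to historical synonyms and spelling variants. The entries and function names below are illustrative assumptions, not the resources built in the project.

```python
# Hypothetical variant dictionary: concept -> historical synonyms or spelling variants.
VARIANTS = {
    "tuberculosis": {"phthisis", "consumption"},
    "anaemia": {"anemia"},
}


def expand_query(terms):
    """Return the query terms plus every known variant of the concept they belong to."""
    expanded = set()
    for term in terms:
        t = term.lower()
        expanded.add(t)
        for concept, variants in VARIANTS.items():
            group = {concept} | variants
            if t in group:  # match on the concept or on any of its variants
                expanded |= group
    return expanded


def search(docs, terms):
    """IDs of documents containing at least one expanded query term."""
    query = expand_query(terms)
    return [doc_id for doc_id, text in docs.items() if query & set(text.lower().split())]


docs = {
    "a": "A case of phthisis in 1862",
    "b": "Consumption among miners",
    "c": "Fracture of the femur",
}
print(search(docs, ["tuberculosis"]))  # ['a', 'b']
```

Plain expansion like this would also match the non-medical sense of "consumption". That illustrates why context-aware semantic analysis is preferable to a bare synonym list.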
Combining position weight matrices and document-term matrix for efficient extraction of associations of methylated genes and diseases from free text.

Directory of Open Access Journals (Sweden)

Arwa Bin Raies

BACKGROUND: In a number of diseases, certain genes are reported to be strongly methylated and thus can serve as diagnostic markers in many cases. Scientific literature in digital form is an important source of information about methylated genes implicated in particular diseases. The large volume of the electronic text makes it difficult and impractical to search for this information manually. METHODOLOGY: We developed a novel text mining methodology based on a new concept of position weight matrices (PWMs) for text representation and feature generation. We applied PWMs in conjunction with the document-term matrix to extract with high accuracy associations between methylated genes and diseases from free text. The performance results are based on large manually-classified data. Additionally, we developed a web-tool, DEMGD, which automates extraction of these associations from free text. DEMGD presents the extracted associations in summary tables and full reports in addition to evidence tagging of text with respect to genes, diseases and methylation words. The methodology we developed in this study can be applied to similar association extraction problems from free text. CONCLUSION: The new methodology developed in this study allows for efficient identification of associations between concepts. Our method applied to methylated genes in different diseases is implemented as a Web-tool, DEMGD, which is freely available at http://www.cbrc.kaust.edu.sa/demgd/. The data is available for online browsing and download.
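The document-term matrix that the method combines with PWMs is a standard structure: one row per document, one column per vocabulary term, and each cell holds a term count. A minimal sketch follows. The PWM features themselves are not reproduced, and the function name and example sentences are hypothetical.

```python
from collections import Counter


def document_term_matrix(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts vocabulary[j] in docs[i]."""
    tokenized = [doc.lower().split() for doc in docs]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


vocab, matrix = document_term_matrix([
    "MGMT promoter methylation in glioma",
    "methylation of MGMT predicts glioma response",
])
print(vocab)   # ['glioma', 'in', 'methylation', 'mgmt', 'of', 'predicts', 'promoter', 'response']
print(matrix)  # [[1, 1, 1, 1, 0, 0, 1, 0], [1, 0, 1, 1, 1, 1, 0, 1]]
```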
Combining Position Weight Matrices and Document-Term Matrix for Efficient Extraction of Associations of Methylated Genes and Diseases from Free Text

KAUST Repository

Bin Raies, Arwa

2013-10-16

Background: In a number of diseases, certain genes are reported to be strongly methylated and thus can serve as diagnostic markers in many cases. Scientific literature in digital form is an important source of information about methylated genes implicated in particular diseases. The large volume of the electronic text makes it difficult and impractical to search for this information manually. Methodology: We developed a novel text mining methodology based on a new concept of position weight matrices (PWMs) for text representation and feature generation. We applied PWMs in conjunction with the document-term matrix to extract with high accuracy associations between methylated genes and diseases from free text. The performance results are based on large manually-classified data. Additionally, we developed a web-tool, DEMGD, which automates extraction of these associations from free text. DEMGD presents the extracted associations in summary tables and full reports in addition to evidence tagging of text with respect to genes, diseases and methylation words. The methodology we developed in this study can be applied to similar association extraction problems from free text. Conclusion: The new methodology developed in this study allows for efficient identification of associations between concepts. Our method applied to methylated genes in different diseases is implemented as a Web-tool, DEMGD, which is freely available at http://www.cbrc.kaust.edu.sa/demgd/. The data is available for online browsing and download. © 2013 Bin Raies et al.
Implementation of Winnowing Algorithm Based K-Gram to Identify Plagiarism on File Text-Based Document

Directory of Open Access Journals (Sweden)

Nurdiansyah Yanuar

2018-01-01

Plagiarism occurs when students have assignments and are pressed by deadlines, because it is seen as the fastest way to complete them. This motivated the author to build a plagiarism detection system that uses the Winnowing algorithm for document similarity search. The documents tested are Indonesian journal articles with the extensions .doc, .docx, and/or .txt. Similarity is calculated in two stages: first, a document fingerprint is created with the Winnowing algorithm; second, the Jaccard similarity coefficient is computed. The system was developed using an iterative waterfall model. The main objective of this project is to determine the level of plagiarism. The system is expected to prevent intentional or unintentional plagiarism before a journal article is published, by displaying the percentage of similarity of the journal articles we write.
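The two stages can be sketched as follows. First, every character k-gram of the normalised text is hashed and the minimum hash of each window of w consecutive hashes is kept as the fingerprint. Second, the fingerprint sets are compared with the Jaccard coefficient. Full winnowing also records the position of each selected hash, but only hash values are kept here, since only set overlap is needed. The values k = 5 and w = 4, the normalisation step, and the use of `zlib.crc32` as the hash are assumptions for this sketch, not the paper's settings.

```python
import re
import zlib


def kgram_hashes(text: str, k: int) -> list:
    """Hash every character k-gram of the normalised text (lower case, letters/digits only)."""
    clean = re.sub(r"[^a-z0-9]", "", text.lower())
    # zlib.crc32 is deterministic across runs, unlike Python's salted hash().
    return [zlib.crc32(clean[i:i + k].encode()) for i in range(len(clean) - k + 1)]


def winnow(hashes: list, w: int) -> set:
    """Keep the minimum hash of every window of w consecutive k-gram hashes."""
    if len(hashes) < w:
        return {min(hashes)} if hashes else set()
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def fingerprint(text: str, k: int = 5, w: int = 4) -> set:
    return winnow(kgram_hashes(text, k), w)


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_percent(doc_a: str, doc_b: str, k: int = 5, w: int = 4) -> float:
    """Percentage of shared fingerprints, as shown to the user."""
    return 100.0 * jaccard(fingerprint(doc_a, k, w), fingerprint(doc_b, k, w))


a = "Plagiarism detection with the Winnowing algorithm."
b = "plagiarism DETECTION, with the winnowing algorithm"
print(similarity_percent(a, b))  # 100.0 (identical after normalisation)
```

Because only window minima are kept, the fingerprint is much smaller than the full set of k-gram hashes. A shared passage of at least w + k - 1 characters is still guaranteed to produce at least one shared fingerprint.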
FacetGist: Collective Extraction of Document Facets in Large Technical Corpora.

Science.gov (United States)

Siddiqui, Tarique; Ren, Xiang; Parameswaran, Aditya; Han, Jiawei

2016-10-01

Given the large volume of technical documents available, it is crucial to automatically organize and categorize these documents to be able to understand and extract value from them. Towards this end, we introduce a new research problem called Facet Extraction. Given a collection of technical documents, the goal of Facet Extraction is to automatically label each document with a set of concepts for the key facets (e.g., application, technique, evaluation metrics, and dataset) that people may be interested in. Facet Extraction has numerous applications, including document summarization, literature search, patent search and business intelligence. The major challenge in performing Facet Extraction arises from multiple sources: concept extraction, concept to facet matching, and facet disambiguation. To tackle these challenges, we develop FacetGist, a framework for facet extraction. Facet Extraction involves constructing a graph-based heterogeneous network to capture information available across multiple local sentence-level features, as well as global context features. We then formulate a joint optimization problem, and propose an efficient algorithm for graph-based label propagation to estimate the facet of each concept mention. Experimental results on technical corpora from two domains demonstrate that Facet Extraction can lead to an improvement of over 25% in both precision and recall over competing schemes.
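The label-propagation step can be illustrated generically. Seed nodes carry known facet labels and stay clamped, while each unlabelled node repeatedly takes the weighted average of its neighbours' label distributions. This is textbook iterative label propagation on a hypothetical toy graph, not FacetGist's joint optimisation.

```python
def propagate(edges, seeds, labels, iterations=50):
    """Iterative label propagation with clamped seed nodes.

    edges:  node -> {neighbour: weight}; every node appears as a key.
    seeds:  node -> known label (kept fixed).
    labels: list of possible labels.
    Returns node -> most probable label.
    """
    dist = {}
    for node in edges:
        if node in seeds:
            dist[node] = {lab: 1.0 if lab == seeds[node] else 0.0 for lab in labels}
        else:
            dist[node] = {lab: 1.0 / len(labels) for lab in labels}  # uninformed start
    for _ in range(iterations):
        updated = {}
        for node, nbrs in edges.items():
            total = sum(nbrs.values())
            if node in seeds or total == 0:
                updated[node] = dist[node]
                continue
            # Weighted average of the neighbours' label distributions.
            updated[node] = {
                lab: sum(w * dist[m][lab] for m, w in nbrs.items()) / total
                for lab in labels
            }
        dist = updated
    return {node: max(labels, key=lambda lab: dist[node][lab]) for node in edges}


# Hypothetical concept-mention graph; weights stand for feature/context similarity.
edges = {
    "svm": {"kernel svm": 2.0},
    "accuracy": {"f1 score": 2.0, "kernel svm": 0.5},
    "kernel svm": {"svm": 2.0, "accuracy": 0.5, "f1 score": 0.5},
    "f1 score": {"accuracy": 2.0, "kernel svm": 0.5},
}
seeds = {"svm": "technique", "accuracy": "metric"}
print(propagate(edges, seeds, ["technique", "metric"]))
```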
Documentation of body mass index and control of associated risk factors in a large primary care network

Directory of Open Access Journals (Sweden)

Grant Richard W

2009-12-01

Abstract Background Body mass index (BMI) will be a reportable health measure in the United States (US) through implementation of Healthcare Effectiveness Data and Information Set (HEDIS) guidelines. We evaluated current documentation of BMI, and documentation and control of associated risk factors by BMI category, based on electronic health records from a 12-clinic primary care network. Methods We conducted a cross-sectional analysis of 79,947 active network patients older than 18 years seen between 7/05 and 12/06. We defined BMI categories as normal weight (NW, 18-24.9 kg/m2), overweight (OW, 25-29.9), and obese (OB, ≥ 30). We measured documentation (yes/no) and control (above/below) of the following three risk factors: blood pressure (BP ≤130/≤85 mmHg), low-density lipoprotein (LDL ≤130 mg/dL (3.367 mmol/L)), and fasting glucose … Results BMI was documented in 48,376 patients (61%, range 34-94%), distributed as 30% OB, 34% OW, and 36% NW. Documentation of all three risk factors was higher in obesity (OB = 58%, OW = 54%, NW = 41%, p for trend …). Conclusions In a large primary care network, BMI documentation has been incomplete, and among patients with BMI measured, risk factor control has been poorer in obese patients than in NW patients, even in those with obesity and CVD or diabetes. Better knowledge of BMI could provide an opportunity for improved quality in obesity care.
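The study's BMI categories follow from the standard formula: weight in kilograms divided by the square of height in metres. The small helper below applies the study's cut-offs. It treats the published bands (18-24.9, 25-29.9, ≥ 30) as the half-open intervals [18, 25), [25, 30) and [30, ∞), which is an assumption about how values such as 24.95 are handled. Values below 18 fall outside the three study categories and return `None`. The function names are hypothetical.

```python
from typing import Optional


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> Optional[str]:
    """Map a BMI to the study's categories; values below 18 are outside all three."""
    if value >= 30:
        return "OB"  # obese
    if value >= 25:
        return "OW"  # overweight
    if value >= 18:
        return "NW"  # normal weight
    return None


print(round(bmi(70, 1.75), 1), bmi_category(bmi(70, 1.75)))  # 22.9 NW
```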
On the Creation of Hypertext Links in Full-Text Documents: Measurement of Inter-Linker Consistency.

Science.gov (United States)

Ellis, David; And Others

1994-01-01

Describes a study in which several different sets of hypertext links are inserted by different people in full-text documents. The degree of similarity between the sets is measured using coefficients and topological indices. As in comparable studies of inter-indexer consistency, the sets of links used by different people showed little similarity.…
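The abstract says the similarity between link sets was measured "using coefficients" but does not name them. Jaccard and Dice are two standard set-overlap coefficients that could play that role. The sketch below treats each linker's links as a set of (source, target) pairs. The passage identifiers and the averaging over all pairs of linkers are illustrative assumptions. The topological indices mentioned in the abstract are not covered.

```python
from itertools import combinations


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dice(a, b):
    """2|A ∩ B| / (|A| + |B|); two empty sets count as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def mean_pairwise(link_sets, coefficient=jaccard):
    """Average coefficient over every pair of linkers (needs at least two)."""
    pairs = list(combinations(link_sets, 2))
    return sum(coefficient(a, b) for a, b in pairs) / len(pairs)


# Each link is a hypothetical (source passage, target passage) pair.
linker_a = {("p1", "p4"), ("p2", "p7"), ("p3", "p9")}
linker_b = {("p1", "p4"), ("p5", "p7")}
print(jaccard(linker_a, linker_b))  # 0.25
print(dice(linker_a, linker_b))     # 0.4
```

Low values for these coefficients across many pairs of linkers would correspond to the "little similarity" the study reports.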
Working with text tools, techniques and approaches for text mining

CERN Document Server

Tourte, Gregory J L

2016-01-01

Text mining tools and technologies have long been a part of the repository world, where they have been applied to a variety of purposes, from pragmatic aims to support tools. Research areas as diverse as biology, chemistry, sociology and criminology have seen effective use made of text mining technologies. Working With Text collects a subset of the best contributions from the 'Working with text: Tools, techniques and approaches for text mining' workshop, alongside contributions from experts in the area. Text mining tools and technologies in support of academic research include supporting research on the basis of a large body of documents, facilitating access to and reuse of extant work, and bridging between the formal academic world and areas such as traditional and social media. Jisc have funded a number of projects, including NaCTeM (the National Centre for Text Mining) and the ResDis programme. Contents are developed from workshop submissions and invited contributions, including: Legal considerations in te...
Ultrasound-guided nerve blocks - is documentation and education feasible using only text and pictures?

DEFF Research Database (Denmark)

Worm, Bjarne Skjødt; Krag, Mette; Jensen, Kenneth

2014-01-01

With the advancement of ultrasound guidance for peripheral nerve blocks, still pictures from representative ultrasonograms are increasingly used for clinical documentation of the procedure and for educational purposes in textbook materials. However, little is actually known about the clinical and educational usefulness of these still pictures, in particular how well nerve structures can be identified compared to real-time ultrasound examination. We aimed to quantify gross visibility or ultrastructure using still picture sonograms compared to real-time ultrasound for trainees and experts, for large or small nerves, and discuss the clinical or educational relevance of these findings.
Technical Support Document: Strategies for 50% Energy Savings in Large Office Buildings

Energy Technology Data Exchange (ETDEWEB)

Leach, M.; Lobato, C.; Hirsch, A.; Pless, S.; Torcellini, P.

2010-09-01

This Technical Support Document (TSD) documents technical analysis that informs design guidance for designing and constructing large office buildings that achieve 50% net site energy savings over baseline buildings defined by minimal compliance with respect to ANSI/ASHRAE/IESNA Standard 90.1-2004. This report also represents a step toward developing a methodology for using energy modeling in the design process to achieve aggressive energy savings targets. This report documents the modeling and analysis methods used to identify design recommendations for six climate zones that capture the range of U.S. climate variability; demonstrates how energy savings change between ASHRAE Standard 90.1-2007 and Standard 90.1-2004 to determine baseline energy use; uses a four-story 'low-rise' prototype to analyze the effect of building aspect ratio on energy use intensity; explores comparisons between baseline and low-energy building energy use for alternate energy metrics (net source energy, energy emissions, and energy cost); and examines the extent to which glass curtain-wall construction limits achievable energy savings by using a 12-story 'high-rise' prototype.
Polish Phoneme Statistics Obtained On Large Set Of Written Texts

Directory of Open Access Journals (Sweden)

Bartosz Ziółko

2009-01-01

The phonetic statistics were collected from several Polish corpora. The paper is a summary of the data, which are phoneme n-grams, and of some phenomena in the statistics. Triphone statistics apply context-dependent speech units, which have an important role in speech recognition systems and had never been calculated for a large set of Polish written texts. The standard phonetic alphabet for Polish, SAMPA, and methods of providing phonetic transcriptions are described.
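Collecting phoneme n-gram (for n = 3, triphone) statistics amounts to sliding a window over phonetic transcriptions and counting each window. The sketch below assumes the transcriptions are already available as lists of SAMPA symbols; grapheme-to-phoneme conversion is not shown. It counts n-grams only within each transcription unit. Whether the paper's statistics span word boundaries is not stated in the abstract, so passing whole sentences instead of words would be an equally valid choice.

```python
from collections import Counter


def phoneme_ngrams(transcriptions, n=3):
    """Count phoneme n-grams (n=3 gives triphones) over transcribed units.

    transcriptions: iterable of phoneme lists, e.g. SAMPA symbols per word.
    """
    counts = Counter()
    for phones in transcriptions:
        for i in range(len(phones) - n + 1):
            counts[tuple(phones[i:i + n])] += 1
    return counts


def relative_frequencies(counts):
    """Normalize raw n-gram counts to relative frequencies."""
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}


# Illustrative input: Polish "kot" and "kota" as SAMPA symbol lists.
words = [["k", "o", "t"], ["k", "o", "t", "a"]]
tri = phoneme_ngrams(words)
print(tri.most_common())  # [(('k', 'o', 't'), 2), (('o', 't', 'a'), 1)]
```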
Helios: Understanding Solar Evolution Through Text Analytics

Energy Technology Data Exchange (ETDEWEB)

Randazzese, Lucien [SRI International, Menlo Park, CA (United States)]

2016-12-02

This proof-of-concept project focused on developing, testing, and validating a range of bibliometric, text analytic, and machine-learning based methods to explore the evolution of three photovoltaic (PV) technologies: Cadmium Telluride (CdTe), Dye-Sensitized solar cells (DSSC), and Multi-junction solar cells. The analytical approach to the work was inspired by previous work by the same team to measure and predict the scientific prominence of terms and entities within specific research domains. The goal was to create tools that could assist domain-knowledgeable analysts in investigating the history and path of technological developments in general, with a focus on analyzing step-function changes in performance, or “breakthroughs,” in particular. The text-analytics platform developed during this project was dubbed Helios. The project relied on computational methods for analyzing large corpora of technical documents. For this project we ingested technical documents from the following sources into Helios: Thomson Scientific Web of Science (papers), the U.S. Patent & Trademark Office (patents), the U.S. Department of Energy (technical documents), the U.S. National Science Foundation (project funding summaries), and a hand curated set of full-text documents from Thomson Scientific and other sources.
Text document classification based on mixture models

Czech Academy of Sciences Publication Activity Database

Novovičová, Jana; Malík, Antonín

2004-01-01

Vol. 40, No. 3 (2004), pp. 293-304 ISSN 0023-5954 R&D Projects: GA AV ČR IAA2075302; GA ČR GA102/03/0049; GA AV ČR KSK1019101 Institutional research plan: CEZ:AV0Z1075907 Keywords : text classification * text categorization * multinomial mixture model Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.224, year: 2004

Drug allergies documented in electronic health records of a large healthcare system.

Science.gov (United States)

Zhou, L; Dhopeshwarkar, N; Blumenthal, K G; Goss, F; Topaz, M; Slight, S P; Bates, D W

2016-09-01

The prevalence of drug allergies documented in electronic health records (EHRs) of large patient populations is understudied. We aimed to describe the prevalence of common drug allergies and patient characteristics documented in EHRs of a large healthcare network over the last two decades. Drug allergy data were obtained from EHRs of patients who visited two large tertiary care hospitals in Boston from 1990 to 2013. The prevalence of each drug and drug class was calculated and compared by sex and race/ethnicity. The number of allergies per patient was calculated and the frequency of patients having 1, 2, 3…, or 10+ drug allergies was reported. We also conducted a trend analysis by comparing the proportion of each allergy to the total number of drug allergies over time. Among 1 766 328 patients, 35.5% of patients had at least one reported drug allergy with an average of 1.95 drug allergies per patient. The most commonly reported drug allergies in this population were to penicillins (12.8%), sulfonamide antibiotics (7.4%), opiates (6.8%), and nonsteroidal anti-inflammatory drugs (NSAIDs) (3.5%). The relative proportion of allergies to angiotensin-converting enzyme (ACE) inhibitors and HMG CoA reductase inhibitors (statins) has more than doubled since the early 2000s. Drug allergies were most prevalent among females and white patients except for NSAIDs, ACE inhibitors, and thiazide diuretics, which were more prevalent in black patients. Females and white patients may be more likely to experience a reaction from common medications. An increase in reported allergies to ACE inhibitors and statins is noteworthy. © 2016 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
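The counting steps described in the abstract can be sketched in a few lines: overall prevalence, allergies per patient, the 1, 2, 3…, 10+ distribution, and per-drug prevalence. The data layout (a dict from a hypothetical patient ID to a list of recorded allergens) is an assumption. It is also an assumption that the "average per patient" is taken over patients with at least one allergy, since the abstract does not say which denominator was used.

```python
from collections import Counter


def allergy_summary(allergies_by_patient):
    """Prevalence and mean count, assuming 'per patient' means per allergic patient.

    allergies_by_patient: dict patient_id -> list of recorded drug allergies.
    """
    allergic = [a for a in allergies_by_patient.values() if a]
    prevalence = len(allergic) / len(allergies_by_patient)
    mean_per_allergic = sum(len(a) for a in allergic) / len(allergic)
    return prevalence, mean_per_allergic


def count_distribution(allergies_by_patient, cap=10):
    """Patients per number of allergies; the key `cap` means 'cap or more'."""
    return Counter(
        min(len(a), cap) for a in allergies_by_patient.values() if a
    )


def allergen_prevalence(allergies_by_patient):
    """Share of all patients with at least one record of each allergen."""
    counts = Counter()
    for allergies in allergies_by_patient.values():
        counts.update(set(allergies))  # count each patient once per allergen
    n = len(allergies_by_patient)
    return {drug: c / n for drug, c in counts.items()}


patients = {
    "p1": ["penicillin"],
    "p2": [],
    "p3": ["penicillin", "sulfonamide"],
    "p4": [f"drug{i}" for i in range(12)],
}
print(allergy_summary(patients))                    # (0.75, 5.0)
print(count_distribution(patients))                 # Counter({1: 1, 2: 1, 10: 1})
print(allergen_prevalence(patients)["penicillin"])  # 0.5
```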
Supporting the education evidence portal via text mining

Science.gov (United States)

Ananiadou, Sophia; Thompson, Paul; Thomas, James; Mu, Tingting; Oliver, Sandy; Rickinson, Mark; Sasaki, Yutaka; Weissenbacher, Davy; McNaught, John

2010-01-01

The UK Education Evidence Portal (eep) provides a single, searchable, point of access to the contents of the websites of 33 organizations relating to education, with the aim of revolutionizing work practices for the education community. Use of the portal alleviates the need to spend time searching multiple resources to find relevant information. However, the combined content of the websites of interest is still very large (over 500 000 documents and growing). This means that searches using the portal can produce very large numbers of hits. As users often have limited time, they would benefit from enhanced methods of performing searches and viewing results, allowing them to drill down to information of interest more efficiently, without having to sift through potentially long lists of irrelevant documents. The Joint Information Systems Committee (JISC)-funded ASSIST project has produced a prototype web interface to demonstrate the applicability of integrating a number of text-mining tools and methods into the eep, to facilitate an enhanced searching, browsing and document-viewing experience. New features include automatic classification of documents according to a taxonomy, automatic clustering of search results according to similar document content, and automatic identification and highlighting of key terms within documents. PMID:20643679
Mining the Text: 34 Text Features that Can Ease or Obstruct Text Comprehension and Use

Science.gov (United States)

White, Sheida

2012-01-01

This article presents 34 characteristics of texts and tasks ("text features") that can make continuous (prose), noncontinuous (document), and quantitative texts easier or more difficult for adolescents and adults to comprehend and use. The text features were identified by examining the assessment tasks and associated texts in the national…
Getting more out of biomedical documents with GATE's full lifecycle open source text analytics.

Directory of Open Access Journals (Sweden)

Hamish Cunningham

This software article describes the GATE family of open source text analysis tools and processes. GATE is one of the most widely used systems of its type with yearly download rates of tens of thousands and many active users in both academic and industrial contexts. In this paper we report three examples of GATE-based systems operating in the life sciences and in medicine. First, in genome-wide association studies which have contributed to discovery of a head and neck cancer mutation association. Second, medical records analysis which has significantly increased the statistical power of treatment/outcome models in the UK's largest psychiatric patient cohort. Third, richer constructs in drug-related searching. We also explore the ways in which the GATE family supports the various stages of the lifecycle present in our examples. We conclude that the deployment of text mining for document abstraction or rich search and navigation is best thought of as a process, and that with the right computational tools and data collection strategies this process can be made defined and repeatable. The GATE research programme is now 20 years old and has grown from its roots as a specialist development tool for text processing to become a rather comprehensive ecosystem, bringing together software developers, language engineers and research staff from diverse fields. GATE now has a strong claim to cover a uniquely wide range of the lifecycle of text analysis systems. It forms a focal point for the integration and reuse of advances that have been made by many people (the majority outside of the authors' own group) who work in text processing for biomedicine and other areas. GATE is available online under GNU open source licences and runs on all major operating systems. Support is available from an active user and developer community and also on a commercial basis.
Word-Length Correlations and Memory in Large Texts: A Visibility Network Analysis

Directory of Open Access Journals (Sweden)

Lev Guzmán-Vargas

2015-11-01

We study the correlation properties of word lengths in large texts from 30 ebooks in the English language from the Gutenberg Project (www.gutenberg.org) using the natural visibility graph method (NVG). NVG converts a time series into a graph and then analyzes its graph properties. First, the original sequence of words is transformed into a sequence of values containing the length of each word, and then it is integrated. Next, we apply the NVG to the integrated word-length series and construct the network. We show that the degree distribution of that network follows a power law, $P(k) \sim k^{-\gamma}$, with two regimes, which are characterized by the exponents $\gamma_s \approx 1.7$ (at short degree scales) and $\gamma_l \approx 1.3$ (at large degree scales). This suggests that word lengths are much more strongly correlated at large distances between words than at short distances between words. That finding is also supported by the detrended fluctuation analysis (DFA) and recurrence time distribution. These results provide new information about the universal characteristics of the structure of written texts beyond that given by word frequencies.
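The pipeline in the abstract can be sketched directly: word lengths are integrated, the natural visibility graph is built, and its degree distribution is read off. Two points i < j are linked when every intermediate point k lies strictly below the straight line joining (i, y_i) and (j, y_j), with time taken as the word index. The integration here is a plain cumulative sum, which is what the abstract literally describes. Some analyses subtract the mean before integrating, and whether this study did so is not stated. The brute-force construction is O(n³) and only meant for illustration. Fitting the exponents $\gamma_s$ and $\gamma_l$ is not shown.

```python
from collections import Counter
from itertools import accumulate


def integrated_word_lengths(text):
    """Word-length series (whitespace tokenization) followed by a cumulative sum."""
    return list(accumulate(len(w) for w in text.split()))


def natural_visibility_edges(y):
    """Edges (i, j) of the natural visibility graph of series y (time = index).

    i and j are linked when every intermediate point k lies strictly below
    the straight line joining (i, y[i]) and (j, y[j]). Brute force, O(n^3).
    """
    n = len(y)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if all(y[k] < y[j] + (y[i] - y[j]) * (j - k) / (j - i)
                   for k in range(i + 1, j)):
                edges.append((i, j))
    return edges


def degree_distribution(n, edges):
    """Empirical P(k): fraction of the n nodes having degree k."""
    degree = Counter()
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    hist = Counter(degree[v] for v in range(n))
    return {k: c / n for k, c in sorted(hist.items())}


print(natural_visibility_edges([1, 3, 2]))  # [(0, 1), (1, 2)]
series = integrated_word_lengths("the cat sat on the mat")
print(series)  # [3, 6, 9, 11, 14, 17]
```

For a real corpus, `degree_distribution(len(series), natural_visibility_edges(series))` gives the empirical $P(k)$, whose log-log slope(s) would then be fitted.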
Identifying issue frames in text.

Directory of Open Access Journals (Sweden)

Eyal Sagi

Framing, the effect of context on cognitive processes, is a prominent topic of research in psychology and public opinion research. Research on framing has traditionally relied on controlled experiments and manually annotated document collections. In this paper we present a method that allows for quantifying the relative strengths of competing linguistic frames based on corpus analysis. This method requires little human intervention and can therefore be efficiently applied to large bodies of text. We demonstrate its effectiveness by tracking changes in the framing of terror over time and comparing the framing of abortion by Democrats and Republicans in the U.S.
Vietnamese Document Representation and Classification

Science.gov (United States)

Nguyen, Giang-Son; Gao, Xiaoying; Andreae, Peter

Vietnamese is very different from English and little research has been done on Vietnamese document classification, or indeed, on any kind of Vietnamese language processing, and only a few small corpora are available for research. We created a large Vietnamese text corpus with about 18000 documents, and manually classified them based on different criteria such as topics and styles, giving several classification tasks of different difficulty levels. This paper introduces a new syllable-based document representation at the morphological level of the language for efficient classification. We tested the representation on our corpus with different classification tasks using six classification algorithms and two feature selection techniques. Our experiments show that the new representation is effective for Vietnamese categorization, and suggest that best performance can be achieved using syllable-pair document representation, an SVM with a polynomial kernel as the learning algorithm, and using Information gain and an external dictionary for feature selection.
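Vietnamese orthography separates syllables with spaces, so a syllable-pair representation can be sketched as a bag of adjacent syllable bigrams. The abstract does not give the paper's exact term definition, weighting, or dictionary-based feature selection. The function names and the NFC normalization step below are assumptions. The resulting feature counts could be fed to a classifier such as the SVM the abstract recommends, which is not shown.

```python
import re
import unicodedata
from collections import Counter


def syllables(text):
    """Split Vietnamese text into lowercase syllables, dropping punctuation.

    NFC normalization keeps precomposed diacritics inside word-character
    matches, so a syllable such as 'nội' is not split apart.
    """
    text = unicodedata.normalize("NFC", text.lower())
    return re.findall(r"\w+", text)


def syllable_pair_features(text):
    """Bag of adjacent syllable pairs, e.g. 'hà_nội'."""
    syl = syllables(text)
    return Counter(f"{a}_{b}" for a, b in zip(syl, syl[1:]))


features = syllable_pair_features("Hà Nội là thủ đô")
print(sorted(features))
```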
Computing symmetrical strength of N-grams: a two pass filtering approach in automatic classification of text documents.

Science.gov (United States)

Agnihotri, Deepak; Verma, Kesari; Tripathi, Priyanka

2016-01-01

The contiguous sequences of terms (N-grams) in the documents are symmetrically distributed among different classes. This symmetrical distribution raises uncertainty about which class an N-Gram belongs to. In this paper, we focus on selecting the most discriminating N-Grams by reducing the effects of symmetrical distribution. In this context, a new text feature selection method named the symmetrical strength of the N-Grams (SSNG) is proposed using a two-pass filtering (TPF) based feature selection approach. In the first pass of TPF, the SSNG method chooses various informative N-Grams from all the N-Grams extracted from the corpus. In the second pass, the well-known chi-square (χ²) method is used to select a few of the most informative N-Grams. To classify the documents, two standard classifiers, Multinomial Naive Bayes and Linear Support Vector Machine, were applied to ten standard text data sets. On most of the data sets, the experimental results show that the performance and success rate of the SSNG method using the TPF approach are superior to those of the state-of-the-art methods, viz. Mutual Information, Information Gain, Odds Ratio, Discriminating Feature Selection and χ².
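The abstract does not give the SSNG formula, so the first pass is not reproduced here. The sketch below shows only the χ² scoring used in the second pass. Each N-gram is scored from the document-level 2×2 contingency table against each class, $\chi^2 = N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D))$. The per-class maximum is kept, which is one common convention and an assumption here. The function names and toy data are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Set of contiguous N-grams in a token list, joined by spaces."""
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def chi_square_scores(docs, labels, n=2):
    """Max-over-classes chi-square score of every N-gram (document counts)."""
    N = len(docs)
    class_sizes = Counter(labels)
    df = Counter()        # documents containing the N-gram
    df_class = Counter()  # (N-gram, class) -> documents of that class containing it
    for tokens, label in zip(docs, labels):
        for g in ngrams(tokens, n):
            df[g] += 1
            df_class[(g, label)] += 1
    scores = {}
    for g, df_g in df.items():
        best = 0.0
        for c, n_c in class_sizes.items():
            A = df_class[(g, c)]   # in class c, contains g
            B = df_g - A           # outside c, contains g
            C = n_c - A            # in class c, lacks g
            D = N - n_c - B        # outside c, lacks g
            denom = (A + C) * (B + D) * (A + B) * (C + D)
            if denom:
                best = max(best, N * (A * D - C * B) ** 2 / denom)
        scores[g] = best
    return scores


def select_top(scores, k):
    """Keep the k highest-scoring N-grams."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


docs = [["cheap", "flight", "deal"], ["cheap", "flight", "offer"],
        ["protein", "folding", "model"], ["protein", "folding", "study"]]
labels = ["travel", "travel", "bio", "bio"]
scores = chi_square_scores(docs, labels)
print(select_top(scores, 2))  # the two class-defining bigrams
```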
New mathematical cuneiform texts

CERN Document Server

Friberg, Jöran

2016-01-01

This monograph presents in great detail a large number of both unpublished and previously published Babylonian mathematical texts in the cuneiform script. It is a continuation of the work A Remarkable Collection of Babylonian Mathematical Texts (Springer 2007) written by Jöran Friberg, the leading expert on Babylonian mathematics. Focussing on the big picture, Friberg explores in this book several Late Babylonian arithmetical and metro-mathematical table texts from the sites of Babylon, Uruk and Sippar, collections of mathematical exercises from four Old Babylonian sites, as well as a new text from Early Dynastic/Early Sargonic Umma, which is the oldest known collection of mathematical exercises. A table of reciprocals from the end of the third millennium BC, differing radically from well-documented but younger tables of reciprocals from the Neo-Sumerian and Old-Babylonian periods, as well as a fragment of a Neo-Sumerian clay tablet showing a new type of a labyrinth are also discussed. The material is presen...
The Effects of Tabular-Based Content Extraction on Patent Document Clustering

Directory of Open Access Journals (Sweden)

Michael W. Berry

2012-10-01

Data can be represented in many different ways within a particular document or set of documents. Hence, attempts to automatically process the relationships between documents or determine the relevance of certain document objects can be problematic. In this study, we have developed software to automatically catalog objects contained in HTML files for patents granted by the United States Patent and Trademark Office (USPTO). Once these objects are recognized, the software creates metadata that assigns a data type to each document object. Such metadata can be easily processed and analyzed for subsequent text mining tasks. Specifically, document similarity and clustering techniques were applied to a subset of the USPTO document collection. Although our preliminary results demonstrate that tables and numerical data do not provide quantifiable value to a document’s content, the stage for future work in measuring the importance of document objects within a large corpus has been set.
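The abstract does not specify which similarity and clustering techniques were applied. The sketch below pairs cosine similarity over raw term-frequency vectors with a simple single-link grouping: two documents join the same cluster when their similarity reaches a threshold. It stands in for whatever the study used and is not the study's method. The threshold of 0.5 and all function names are illustrative assumptions.

```python
import math
from collections import Counter


def tf_vector(text):
    """Raw term-frequency vector from whitespace tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def threshold_clusters(vectors, threshold=0.5):
    """Single-link clusters: union documents whose similarity >= threshold."""
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


docs = ["solar cell efficiency", "solar cell panel efficiency",
        "drug delivery polymer", "polymer drug release"]
print(threshold_clusters([tf_vector(d) for d in docs]))  # [[0, 1], [2, 3]]
```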
Getting more out of biomedical documents with GATE's full lifecycle open source text analytics.

Science.gov (United States)

Cunningham, Hamish; Tablan, Valentin; Roberts, Angus; Bontcheva, Kalina

2013-01-01

This software article describes the GATE family of open source text analysis tools and processes. GATE is one of the most widely used systems of its type with yearly download rates of tens of thousands and many active users in both academic and industrial contexts. In this paper we report three examples of GATE-based systems operating in the life sciences and in medicine. First, in genome-wide association studies which have contributed to discovery of a head and neck cancer mutation association. Second, medical records analysis which has significantly increased the statistical power of treatment/outcome models in the UK's largest psychiatric patient cohort. Third, richer constructs in drug-related searching. We also explore the ways in which the GATE family supports the various stages of the lifecycle present in our examples. We conclude that the deployment of text mining for document abstraction or rich search and navigation is best thought of as a process, and that with the right computational tools and data collection strategies this process can be made defined and repeatable. The GATE research programme is now 20 years old and has grown from its roots as a specialist development tool for text processing to become a rather comprehensive ecosystem, bringing together software developers, language engineers and research staff from diverse fields. GATE now has a strong claim to cover a uniquely wide range of the lifecycle of text analysis systems. It forms a focal point for the integration and reuse of advances that have been made by many people (the majority outside of the authors' own group) who work in text processing for biomedicine and other areas. GATE is available online under GNU open source licences and runs on all major operating systems. Support is available from an active user and developer community and also on a commercial basis.
Changing Landscapes in Documentation Efforts: Civil Society Documentation of Serious Human Rights Violations

Directory of Open Access Journals (Sweden)

Brianne McGonigle Leyh

2017-04-01

Wittingly or unwittingly, civil society actors have long been faced with the task of documenting serious human rights violations. Thirty years ago, such efforts were largely organised by grassroots movements, often with little support or funding from international actors. Sharing information and best practices was difficult. Today that situation has significantly changed. The purpose of this article is to explore the changing landscape of civil society documentation of serious human rights violations, and what that means for standardising and professionalising documentation efforts. Using the recent Hissène Habré case as an example, this article begins by looking at how civil society documentation can successfully influence an accountability process. Next, the article touches upon barriers that continue to impede greater documentation efforts. The article examines the changing landscape of documentation, focusing on technological changes and the rise of citizen journalism and unofficial investigations, using Syria as an example, as well as on the increasing support for documentation efforts both in Syria and worldwide. The changing landscape has resulted in the proliferation of international documentation initiatives aimed at providing local civil society actors with guidelines and practical assistance on how to recognise, collect, manage, store and use information about serious human rights violations, as well as on how to minimise the risks associated with the documentation of human rights violations. The recent initiatives undertaken by international civil society, including those by the Public International Law & Policy Group, play an important role in helping to standardise and professionalise documentation work and promote the foundational principles of documentation, namely the ‘do no harm’ principle, and the principles of informed consent and confidentiality. Recognising the drawback that greater professionalisation may bring, it
Document retrieval on repetitive string collections.

Science.gov (United States)

Gagie, Travis; Hartikainen, Aleksi; Karhu, Kalle; Kärkkäinen, Juha; Navarro, Gonzalo; Puglisi, Simon J; Sirén, Jouni

2017-01-01

Most of the fastest-growing string collections today are repetitive, that is, most of the constituent documents are similar to many others. As these collections keep growing, a key approach to handling them is to exploit their repetitiveness, which can reduce their space usage by orders of magnitude. We study the problem of indexing repetitive string collections in order to perform efficient document retrieval operations on them. Document retrieval problems are routinely solved by search engines on large natural language collections, but the techniques are less developed on generic string collections. The case of repetitive string collections is even less understood, and there are very few existing solutions. We develop two novel ideas, interleaved LCPs and precomputed document lists, that yield highly compressed indexes solving the problem of document listing (find all the documents where a string appears), top-k document retrieval (find the k documents where a string appears most often), and document counting (count the number of documents where a string appears). We also show that a classical data structure supporting the latter query becomes highly compressible on repetitive data. Finally, we show how the tools we developed can be combined to solve ranked conjunctive and disjunctive multi-term queries under the simple [Formula: see text] model of relevance. We thoroughly evaluate the resulting techniques in various real-life repetitiveness scenarios, and recommend the best choices for each case.
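The three queries defined in the abstract (document listing, top-k document retrieval, and document counting) can be pinned down with a brute-force reference sketch. It scans every document on every query, which is exactly the cost that compressed indexes are designed to avoid. The paper's interleaved LCPs and precomputed document lists are not reproduced. The function names are hypothetical, and the pattern is assumed to be non-empty.

```python
import heapq


def occurrences(text, pattern):
    """Number of (possibly overlapping) occurrences of a non-empty pattern."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def document_listing(docs, pattern):
    """Indices of all documents in which the pattern appears."""
    return [i for i, d in enumerate(docs) if pattern in d]


def document_counting(docs, pattern):
    """Number of documents in which the pattern appears."""
    return len(document_listing(docs, pattern))


def top_k(docs, pattern, k):
    """The k documents where the pattern appears most often (ties: lower index first)."""
    freq = [occurrences(d, pattern) for d in docs]
    best = heapq.nlargest(k, range(len(docs)), key=freq.__getitem__)
    return [i for i in best if freq[i] > 0]


docs = ["abab", "ba", "ccc", "ababab"]
print(document_listing(docs, "ab"))   # [0, 3]
print(document_counting(docs, "ab"))  # 2
print(top_k(docs, "ab", 1))           # [3]
```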
Using machine learning to disentangle homonyms in large text corpora.

Science.gov (United States)

Roll, Uri; Correia, Ricardo A; Berger-Tal, Oded

2018-06-01

Systematic reviews are an increasingly popular decision-making tool that provides an unbiased summary of evidence to support conservation action. These reviews bridge the gap between researchers and managers by presenting a comprehensive overview of all studies relating to a particular topic and identify specifically where and under which conditions an effect is present. However, several technical challenges can severely hinder the feasibility and applicability of systematic reviews, for example, homonyms (terms that share spelling but differ in meaning). Homonyms add noise to search results and cannot be easily identified or removed. We developed a semiautomated approach that can aid in the classification of homonyms among narratives. We used a combination of automated content analysis and artificial neural networks to quickly and accurately sift through large corpora of academic texts and classify them to distinct topics. As an example, we explored the use of the word reintroduction in academic texts. Reintroduction is used within the conservation context to indicate the release of organisms to their former native habitat; however, a Web of Science search for this word returned thousands of publications in which the term has other meanings and contexts. Using our method, we automatically classified a sample of 3000 of these publications with over 99% accuracy, relative to a manual classification. Our approach can be used easily with other homonyms and can greatly facilitate systematic reviews or similar work in which homonyms hinder the harnessing of large text corpora. Beyond homonyms we see great promise in combining automated content analysis and machine-learning methods to handle and screen big data for relevant information in conservation science. © 2017 Society for Conservation Biology.
Enhancing biomedical text summarization using semantic relation extraction.

Directory of Open Access Journals (Sweden)

Yue Shang

Automatic text summarization for a biomedical concept can help researchers efficiently get the key points of a certain topic from a large amount of biomedical literature. In this paper, we present a method for generating a text summary for a given biomedical concept, e.g., H1N1 disease, from multiple documents based on semantic relation extraction. Our approach includes three stages: (1) we extract semantic relations in each sentence using the semantic knowledge representation tool SemRep; (2) we develop a relation-level retrieval method to select the relations most relevant to each query concept and visualize them in a graphic representation; (3) for relations in the relevant set, we extract informative sentences that can interpret them from the document collection to generate a text summary using an information retrieval based method. Our major focus in this work is to investigate the contribution of semantic relation extraction to the task of biomedical text summarization. The experimental results on summarization for a set of diseases show that the introduction of semantic knowledge improves the performance and that our results are better than those of the MEAD system, a well-known tool for text summarization.
Text processing for technical reports (direct computer-assisted origination, editing, and output of text)

Energy Technology Data Exchange (ETDEWEB)

De Volpi, A.; Fenrick, M. R.; Stanford, G. S.; Fink, C. L.; Rhodes, E. A.

1980-10-01

Documentation often is a primary residual of research and development. Because of this important role and because of the large amount of time consumed in generating technical reports, particularly those containing formulas and graphics, an existing data-processing computer system has been adapted so as to provide text-processing of technical documents. Emphasis has been on accuracy, turnaround time, and time savings for staff and secretaries, for the types of reports normally produced in the reactor development program. The computer-assisted text-processing system, called TXT, has been implemented to benefit primarily the originator of technical reports. The system is of particular value to professional staff, such as scientists and engineers, who have responsibility for generating much correspondence or lengthy, complex reports or manuscripts, especially if prompt turnaround and high accuracy are required. It can produce text that contains special Greek or mathematical symbols. Written in FORTRAN and MACRO, the program TXT operates on a PDP-11 minicomputer under the RSX-11M multitask multiuser monitor. Peripheral hardware includes videoterminals, electrostatic printers, and magnetic disks. Either data- or word-processing tasks may be performed at the terminals. The repertoire of operations has been restricted so as to minimize user training and memory burden. Secretarial staff may be readily trained to make corrections from annotated copy. Some examples of camera-ready copy are provided.
Computer-assisted documentation: One device to keep your nose above the water

International Nuclear Information System (INIS)

Church, L.B.

1980-01-01

Because of the large number of student operators and trainees at the Reed College Reactor Facility, there is a large demand for access to our documentation. In the past the standard mimeograph approach was used to make available Tech Specs, Administrative Procedures, Emergency Plans, etc. However, the frequency of change in these documents (often relatively minor in nature) causes entire documents to become outdated. To provide easier student access, to help keep the documentation up to date and to do so at a minimum cost, we have started using the text editor on our computer. On the whole the experiment has been very well received; some of the more important pros and cons will be discussed. (author)
Features based approach for indexation and representation of unstructured Arabic documents

Directory of Open Access Journals (Sweden)

Mohamed Salim El Bazzi

2017-06-01

The increase of textual information published in Arabic on the internet, in public libraries and by administrations requires effective techniques for extracting the relevant information contained in large text corpora. The purpose of indexing is to create a document representation that makes it easy to find and identify the relevant information in a set of documents. However, mining textual data is becoming a complicated task, especially when semantics are taken into consideration. In this paper, we present an indexation system based on a contextual representation that takes advantage of the semantic links given in a document. Our approach is based on the extraction of keyphrases: each document is represented by its relevant keyphrases instead of its simple keywords. The experimental results confirm the effectiveness of our approach.
VisualUrText: A Text Analytics Tool for Unstructured Textual Data

Science.gov (United States)

Zainol, Zuraini; Jaymes, Mohd T. H.; Nohuddin, Puteri N. E.

2018-05-01

The amount of unstructured text on the Internet is growing tremendously. Text repositories come from Web 2.0, business intelligence and social networking applications. It is also believed that 80-90% of future data growth will be in the form of unstructured text databases that may contain interesting patterns and trends. Text mining is a well-known technique for discovering interesting patterns and trends, i.e., non-trivial knowledge, in massive amounts of unstructured text. Text mining is multidisciplinary, involving information retrieval (IR), text analysis, natural language processing (NLP), data mining, machine learning, statistics and computational linguistics. This paper discusses the development of a text analytics tool that extracts, processes and analyzes unstructured text data and visualizes the cleaned data in multiple forms, such as a Document Term Matrix (DTM), Frequency Graph, Network Analysis Graph, Word Cloud and Dendrogram. This tool, VisualUrText, was developed to help students and researchers extract interesting patterns and trends in document analyses.
RECOVERY OF DOCUMENT TEXT FROM TORN FRAGMENTS USING IMAGE PROCESSING

OpenAIRE

C.Prasad; Dr.Mahesh; Dr.S.A.K. Jilani

2016-01-01

Recovering a document from its torn or damaged fragments plays an important role in forensics and archival study. Reconstructing torn papers manually with glue, tape, etc. is tedious, time-consuming and unsatisfactory. For reconstructing torn images, one could use image mosaicing, where the image is reconstructed using features (corners) and RANSAC with a homography. But torn fragments share no such overlapping portion with each other. Hence we propose a ...

Enhancing biomedical text summarization using semantic relation extraction.

Science.gov (United States)

Shang, Yue; Li, Yanpeng; Lin, Hongfei; Yang, Zhihao

2011-01-01

Automatic text summarization for a biomedical concept can help researchers efficiently obtain the key points of a topic from a large body of biomedical literature. In this paper, we present a method for generating a text summary for a given biomedical concept, e.g., H1N1 disease, from multiple documents based on semantic relation extraction. Our approach includes three stages: 1) We extract semantic relations in each sentence using the semantic knowledge representation tool SemRep. 2) We develop a relation-level retrieval method to select the relations most relevant to each query concept and visualize them in a graphic representation. 3) For relations in the relevant set, we extract informative sentences that can interpret them from the document collection to generate a text summary using an information retrieval based method. Our major focus in this work is to investigate the contribution of semantic relation extraction to the task of biomedical text summarization. The experimental results on summarization for a set of diseases show that introducing semantic knowledge improves performance, and our results are better than those of the MEAD system, a well-known tool for text summarization.
Massively Scalable Near Duplicate Detection in Streams of Documents using MDSH

Energy Technology Data Exchange (ETDEWEB)

Bogen, Paul Logasa [ORNL]; Symons, Christopher T [ORNL]; McKenzie, Amber T [ORNL]; Patton, Robert M [ORNL]; Gillen, Rob [ORNL]

2013-01-01

In a world where large-scale text collections are not only becoming ubiquitous but also growing at increasing rates, near-duplicate documents are a growing concern that can hinder many different information filtering tasks. While others have tried to address this problem, prior techniques have only been applied to limited collection sizes and static cases. We briefly describe the problem in the context of Open Source Intelligence (OSINT), along with our additional performance constraints. In this work we propose two variations on Multi-dimensional Spectral Hash (MDSH) tailored to extremely large, growing sets of text documents. We analyze the memory and runtime characteristics of our techniques and provide an informal analysis of the quality of the near-duplicate clusters they produce.
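MDSH itself is beyond a short example. For orientation only, the sketch below shows a different, widely used near-duplicate technique, MinHash over word shingles, which likewise compresses each document into a compact hash signature that can be compared cheaply. The shingle size k=3, the 64 hash functions and all function names are arbitrary assumptions; none of this is the authors' MDSH method.

```python
import hashlib

def shingles(text, k=3):
    """Set of k-word shingles (contiguous word k-grams) of a document."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}

def _hash(seed, item):
    # Seeded 64-bit hash; blake2b gives values that are stable across runs.
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def minhash_signature(shingle_set, num_hashes=64):
    """One minimum hash value per seeded hash function."""
    return [min(_hash(seed, s) for s in shingle_set) for seed in range(num_hashes)]

def estimated_jaccard(sig_a, sig_b):
    """Fraction of agreeing positions approximates the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

def jaccard(a, b):
    """Exact Jaccard similarity of two shingle sets, for comparison."""
    return len(a & b) / len(a | b)
```

Documents whose estimated similarity exceeds a chosen threshold would be grouped as near duplicates; in a streaming setting, only the fixed-size signatures need to be kept, not the documents.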
Content analysis to detect high stress in oral interviews and text documents

Science.gov (United States)

Thirumalainambi, Rajkumar (Inventor); Jorgensen, Charles C. (Inventor)

2012-01-01

A system of interrogation to estimate whether a subject of interrogation is likely experiencing high stress, emotional volatility and/or internal conflict in the subject's responses to an interviewer's questions. The system applies one or more of four procedures, a first statistical analysis, a second statistical analysis, a third analysis and a heat map analysis, to identify one or more documents containing the subject's responses for which further examination is recommended. Words in the documents are characterized in terms of dimensions representing different classes of emotions and states of mind, in which the subject's responses that manifest high stress, emotional volatility and/or internal conflict are identified. A heat map visually displays the dimensions manifested by the subject's responses in different colors, textures, geometric shapes or other visually distinguishable indicia.
Social Media Text Classification by Enhancing Well-Formed Text Trained Model

Directory of Open Access Journals (Sweden)

Phat Jotikabukkana

2016-09-01

Social media are a powerful communication tool in our era of digital information. The large amount of user-generated data is a useful novel source of data, even though it is not easy to extract the treasures from this vast and noisy trove. Since classification is an important part of text mining, many techniques have been proposed to classify this kind of information. We developed an effective technique for social media text classification through semi-supervised learning that uses an online news source consisting of well-formed text. The system first automatically extracts news categories, already well categorized by publishers, as classes for topic classification. A bag of words taken from news articles provides the initial keywords related to each category in the form of word vectors. The principal task is to retrieve a set of new productive keywords. Term Frequency-Inverse Document Frequency (TF-IDF) weighting and the Word Article Matrix (WAM) are used as the main methods. A modified WAM is recomputed until it becomes the most effective model for social media text classification. The key success factor was enhancing our model with effective keywords from social media. A promising accuracy of 99.50% was achieved, with precision, recall, and F-measure all above 98.5%, after updating the model three times.
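TF-IDF weighting, one of the two main methods named above, can be sketched as follows. This is a generic textbook formulation (raw term count times the logarithm of inverse document frequency), not necessarily the exact variant the authors used; the function name `tf_idf` and the toy documents are illustrative, and WAM is not shown.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    weight = raw count of the term in the document * log(N / df),
    where N is the number of documents and df the number containing the term.
    """
    n_docs = len(docs)
    # Document frequency: each term counted at most once per document.
    df = Counter(term for doc in docs for term in set(doc))
    return [{term: count * math.log(n_docs / df[term])
             for term, count in Counter(doc).items()}
            for doc in docs]

docs = [["flood", "rain", "rain"], ["election", "vote"], ["rain", "vote"]]
w = tf_idf(docs)
print(round(w[0]["rain"], 3))  # 2 * log(3/2) ≈ 0.811
```

A term that occurs in every document gets weight log(1) = 0, which is how IDF suppresses ubiquitous words while favoring category-specific keywords.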
Combining computational analyses and interactive visualization for document exploration and sensemaking in jigsaw.

Science.gov (United States)

Görg, Carsten; Liu, Zhicheng; Kihm, Jaeyeon; Choo, Jaegul; Park, Haesun; Stasko, John

2013-10-01

Investigators across many disciplines and organizations must sift through large collections of text documents to understand and piece together information. Whether they are fighting crime, curing diseases, deciding what car to buy, or researching a new field, inevitably investigators will encounter text documents. Taking a visual analytics approach, we integrate multiple text analysis algorithms with a suite of interactive visualizations to provide a flexible and powerful environment that allows analysts to explore collections of documents while sensemaking. Our particular focus is on the process of integrating automated analyses with interactive visualizations in a smooth and fluid manner. We illustrate this integration through two example scenarios: an academic researcher examining InfoVis and VAST conference papers and a consumer exploring car reviews while pondering a purchase decision. Finally, we provide lessons learned toward the design and implementation of visual analytics systems for document exploration and understanding.
Document Categorization with Modified Statistical Language Models for Agglutinative Languages

Directory of Open Access Journals (Sweden)

Tantug

2010-11-01

In this paper, we investigate the document categorization task with statistical language models. Our study mainly focuses on categorizing documents in agglutinative languages. Due to the productive morphology of agglutinative languages, the number of word forms encountered in naturally occurring text is very large. From the language modeling perspective, a large vocabulary results in serious data sparseness problems. To cope with this drawback, previous studies in various application areas have suggested modified language models based on different morphological units, and performance improvements have been reported for these modified models. In our document categorization experiments, we use standard word-form-based language models as well as modified language models based on root words, root words plus part-of-speech information, truncated word forms, and character sequences. Additionally, to find an optimal parameter set, multiple tests are carried out with different language model orders and smoothing methods. As in previous studies on other tasks, our experimental results on categorizing Turkish documents show that applying linguistic preprocessing steps for language modeling improves on standard language models to some extent. However, similar performance improvements can also be obtained with simpler character-level or truncated word-form models, which are language independent.
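The abstract reports that simple character-level models do about as well as morphology-aware ones. A minimal sketch of categorization with a character bigram model and add-one (Laplace) smoothing follows; the bigram order, the smoothing method, the toy training strings and the names `train`/`classify` are assumptions for illustration, not the configurations tested in the paper.

```python
import math
from collections import Counter

def bigrams(text):
    """Character bigrams, with ^ and $ marking the start and end of the text."""
    padded = "^" + text + "$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]

def train(docs_by_category):
    """Per category: bigram counts and counts of each bigram's first character."""
    models = {}
    for category, docs in docs_by_category.items():
        pair_counts, context_counts = Counter(), Counter()
        for doc in docs:
            for bg in bigrams(doc):
                pair_counts[bg] += 1
                context_counts[bg[0]] += 1
        models[category] = (pair_counts, context_counts)
    return models

def log_prob(text, model, vocab_size):
    """Log-likelihood with add-one smoothing:
    P(c2 | c1) = (count(c1 c2) + 1) / (count(c1) + V)."""
    pair_counts, context_counts = model
    return sum(math.log((pair_counts[bg] + 1) / (context_counts[bg[0]] + vocab_size))
               for bg in bigrams(text))

def classify(text, models):
    """Pick the category whose model gives the text the highest likelihood."""
    chars = {c for pair_counts, _ in models.values() for bg in pair_counts for c in bg}
    vocab_size = len(chars | set(text) | {"^", "$"})
    return max(models, key=lambda cat: log_prob(text, models[cat], vocab_size))

models = train({"sports": ["gol gol gol takim"], "economy": ["borsa faiz borsa"]})
print(classify("gol takim", models))  # sports
```

Because the model never looks at whole words, it needs no morphological analyzer, which is why character-level models are language independent.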
A methodology for semiautomatic taxonomy of concepts extraction from nuclear scientific documents using text mining techniques; Metodologia para extracao semiautomatica de uma taxonomia de conceitos a partir da producao cientifica da area nuclear utilizando tecnicas de mineracao de textos

Energy Technology Data Exchange (ETDEWEB)

Braga, Fabiane dos Reis

2013-07-01

This thesis presents a text mining method for the semi-automatic extraction of a taxonomy of concepts from a textual corpus of scientific papers related to the nuclear area. Text classification is a natural human practice and a crucial task when working with large repositories. Document clustering provides a logical and understandable framework that facilitates organization, browsing and searching. Most clustering algorithms use the bag-of-words model to represent the content of a document. This model produces high-dimensional data, ignores the fact that different words can have the same meaning, and does not consider the relationships between words, assuming that they are independent of each other. The methodology combines a concept-based document representation model with a hierarchical document clustering method based on the co-occurrence frequency of concepts, and a technique for labeling clusters with their most representative concepts, with the objective of producing a taxonomy of concepts that may reflect the structure of the knowledge domain. It is hoped that this work will contribute to the conceptual mapping of scientific production in the nuclear area and thus support the management of research activities in this area. (author)
Born Broken: Fonts and Information Loss in Legacy Digital Documents

Directory of Open Access Journals (Sweden)

Geoffrey Brown

2011-03-01

For millions of legacy documents, correct rendering depends upon resources such as fonts that are not generally embedded within the document structure. Yet there is a significant risk of information loss due to missing or incorrectly substituted fonts. Large document collections depend on thousands of unique fonts not available on a common desktop workstation, which typically has between 100 and 200 fonts. Silent substitution of fonts, performed by applications such as Microsoft Office, can yield poorly rendered documents. In this paper we use a collection of 230,000 Word documents to assess the difficulty of matching font requirements with a database of fonts. We describe the identifying information contained in common font formats, font requirements stored in Word documents, the API provided by Windows to support font requests by applications, the documented substitution algorithms used by Windows when requested fonts are not available, and the ways in which support software might be used to control font substitution in a preservation environment.
Automatic extraction of property norm-like data from large text corpora.

Science.gov (United States)

Kelly, Colin; Devereux, Barry; Korhonen, Anna

2014-01-01

Traditional methods for deriving property-based representations of concepts from text have focused on either extracting only a subset of possible relation types, such as hyponymy/hypernymy (e.g., car is-a vehicle) or meronymy/metonymy (e.g., car has wheels), or unspecified relations (e.g., car--petrol). We propose a system for the challenging task of automatic, large-scale acquisition of unconstrained, human-like property norms from large text corpora, and discuss the theoretical implications of such a system. We employ syntactic, semantic, and encyclopedic information to guide our extraction, yielding concept-relation-feature triples (e.g., car be fast, car require petrol, car cause pollution), which approximate property-based conceptual representations. Our novel method extracts candidate triples from parsed corpora (Wikipedia and the British National Corpus) using syntactically and grammatically motivated rules, then reweights triples with a linear combination of their frequency and four statistical metrics. We assess our system output in three ways: lexical comparison with norms derived from human-generated property norm data, direct evaluation by four human judges, and a semantic distance comparison with both WordNet similarity data and human-judged concept similarity ratings. Our system offers a viable and performant method of plausible triple extraction: Our lexical comparison shows comparable performance to the current state-of-the-art, while subsequent evaluations exhibit the human-like character of our generated properties.
Information Gain Based Dimensionality Selection for Classifying Text Documents

Energy Technology Data Exchange (ETDEWEB)

Dumidu Wijayasekara; Milos Manic; Miles McQueen

2013-06-01

Selecting the optimal dimensions for various knowledge extraction applications is an essential component of data mining. Dimensionality selection techniques are used in classification applications to increase classification accuracy and reduce computational complexity. In text classification, where the dimensionality of the dataset is extremely high, dimensionality selection is even more important. This paper presents a novel genetic-algorithm-based methodology for dimensionality selection in text mining applications that uses information gain. The presented methodology uses the information gain of each dimension to change the mutation probability of chromosomes dynamically. Since the information gain is calculated a priori, the computational complexity is not affected. The presented method was tested on a specific text classification problem and compared with conventional genetic-algorithm-based dimensionality selection. The results show an improvement of 3% in true positives and 1.6% in true negatives over conventional dimensionality selection methods.
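The information gain of a term measures how much knowing whether the term occurs reduces uncertainty about the class: IG = H(C) - H(C | term). A minimal sketch for one binary (present/absent) term feature follows. Treating each dimension as term presence is an assumption, and because the abstract does not say how IG values are mapped onto mutation probabilities, that step is deliberately omitted.

```python
import math

def entropy(counts):
    """Shannon entropy in bits of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def information_gain(term_present, labels):
    """IG(class; term) = H(class) - H(class | term present or absent).

    term_present: list of booleans, one per document.
    labels: list of class labels, one per document.
    """
    classes = sorted(set(labels))
    h_class = entropy([labels.count(c) for c in classes])
    h_cond = 0.0
    for value in (True, False):
        subset = [y for x, y in zip(term_present, labels) if x == value]
        if subset:
            weight = len(subset) / len(labels)
            h_cond += weight * entropy([subset.count(c) for c in classes])
    return h_class - h_cond
```

A term that perfectly separates two balanced classes has IG = 1 bit; a term distributed independently of the class has IG = 0. Because IG depends only on the training data, it can be computed once before the genetic algorithm runs, consistent with the abstract's point that complexity is unaffected.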
Pedoinformatics Approach to Soil Text Analytics

Science.gov (United States)

Furey, J.; Seiter, J.; Davis, A.

2017-12-01

The several extant schemes for the classification of soils rely on differing criteria, but the major soil science taxonomies, including the United States Department of Agriculture (USDA) and the international harmonized World Reference Base for Soil Resources systems, are based principally on inferred pedogenic properties. These taxonomies largely result from compiled individual observations of soil morphologies within soil profiles, and the vast majority of this pedologic information is contained in qualitative text descriptions. We present text mining analyses of hundreds of gigabytes of parsed text and other data in the digitally available USDA soil taxonomy documentation, the Soil Survey Geographic (SSURGO) database, and the National Cooperative Soil Survey (NCSS) soil characterization database. These analyses implemented IPython calls to Gensim modules for topic modelling, with latent semantic indexing completed down to the paragraphs of the lowest taxon level (soil series). Via a custom extension of the Natural Language Toolkit (NLTK), approximately one percent of the USDA soil series descriptions were used to train a classifier for the remainder of the documents, essentially by treating soil science words as comprising a novel language. While location-specific descriptors at the soil series level are amenable to geomatics methods, unsupervised clustering of the occurrence of other soil science words did not closely follow the usual hierarchy of soil taxa. We present preliminary phrasal analyses that may account for some of these effects.
Experiences with Text Mining Large Collections of Unstructured Systems Development Artifacts at JPL

Science.gov (United States)

Port, Dan; Nikora, Allen; Hihn, Jairus; Huang, LiGuo

2011-01-01

Repositories of systems engineering artifacts at NASA's Jet Propulsion Laboratory (JPL) are often so large and poorly structured that they have outgrown our capability to process their contents manually to extract useful information. Sophisticated text mining methods and tools seem to offer a quick, low-effort way to automate our limited manual efforts. Our experience exploring such methods, mainly in three areas at JPL (historical risk analysis, defect identification based on requirements analysis, and over-time analysis of system anomalies), has shown that obtaining useful results requires substantial unanticipated effort, from preprocessing the data to transforming the output for practical applications. We have not observed any quick 'wins' or realized any benefit from short-term effort avoidance through automation in this area. Surprisingly, we have realized a number of unexpected long-term benefits from the process of applying text mining to our repositories. This paper elaborates on some of these benefits and on the important lessons we learned from preparing and applying text mining to large, unstructured system artifacts at JPL, with the aim of benefiting future text mining (TM) applications in similar problem domains and, we hope, in broader application areas.
A method for extracting design rationale knowledge based on Text Mining

Directory of Open Access Journals (Sweden)

Liu Jihong

2017-01-01

Capturing design rationale (DR) knowledge and presenting it to designers in a suitable form is of great significance for design reuse and design innovation. Since design rationale research began in the 1970s, many teams have developed their own design rationale systems. However, DR acquisition systems are not yet intelligent enough and still require designers to perform many operations. In addition, existing design documents contain a large amount of DR knowledge that has not been well mined. Therefore, a method and system are needed to better extract DR knowledge from design documents. We have proposed a DRKH (design rationale knowledge hierarchy) model for DR representation. The DRKH model has three layers: the design intent layer, the design decision layer and the design basis layer. In this paper, we use a text mining method to extract DR from design documents and construct the DR model. Finally, a welding robot design specification is taken as an example to demonstrate the system interface.
Semantic Annotation of Unstructured Documents Using Concepts Similarity

Directory of Open Access Journals (Sweden)

Fernando Pech

2017-01-01

There is a large amount of information in the form of unstructured documents, which poses challenges for information storage, search, and retrieval. This situation has given rise to several information search approaches, some of which take into account the contextual meaning of the terms specified in the query. Semantic annotation techniques can help retrieve and extract information from unstructured documents. We propose a semantic annotation strategy for unstructured documents as part of a semantic search engine. In this proposal, ontologies are used to determine the context of the entities specified in the query. Our strategy for extracting the context is focused on concept similarity. Each relevant term of the document is associated with an instance in the ontology. The similarity between each of the explicit relationships is measured through the combination of two types of associations: the association between each pair of concepts and the calculation of the weight of the relationships.
Multilingual access to full text databases; Acces multilingue aux bases de donnees en texte integral

Energy Technology Data Exchange (ETDEWEB)

Fluhr, C; Radwan, K [Institut National des Sciences et Techniques Nucleaires (INSTN), Centre d'Etudes de Saclay, 91 - Gif-sur-Yvette (France)]

1990-05-01

Many full-text databases are available in only one language, while others contain documents in several languages. Even if users can understand the language of the documents in the database, it may be easier for them to express their needs in their own language. For databases containing documents in different languages, it is simpler to formulate the query in one language only and retrieve documents in several languages. This paper presents the development and first experiments of multilingual search for the French-English pair, applied to text data in the nuclear field and based on the SPIRIT system. After reviewing the general problems of searching full-text databases with queries formulated in natural language, we present the methods used to reformulate queries and show how they can be extended to multilingual search. The first results on nuclear-field data (AFCEN norms and INIS abstracts) are presented. 4 refs.
The LawsAndFamilies questionnaire on legal family formats for same-sex and/or different-sex couples : Text of the questions and of the accompanying guidance document.

NARCIS (Netherlands)

Waaldijk, C.; Lorenzo, Villaverde J.M.; Nikolina, N.; Zago, G.

2016-01-01

This Working Paper of the research project FamiliesAndSocieties contains the text of the LawsAndFamilies questionnaire, plus the text of the guidance document provided to legal experts answering this questionnaire. These texts are preceded by a brief introduction to the background, aims and
Technical Support Document: Development of the Advanced Energy Design Guide for Large Hospitals - 50% Energy Savings

Energy Technology Data Exchange (ETDEWEB)

Bonnema, E.; Leach, M.; Pless, S.

2013-06-01

This Technical Support Document describes the process and methodology for the development of the Advanced Energy Design Guide for Large Hospitals: Achieving 50% Energy Savings Toward a Net Zero Energy Building (AEDG-LH) ASHRAE et al. (2011b). The AEDG-LH is intended to provide recommendations for achieving 50% whole-building energy savings in large hospitals over levels achieved by following Standard 90.1-2004. The AEDG-LH was created for a 'standard' mid- to large-size hospital, typically at least 100,000 ft², but the strategies apply to all sizes and classifications of new construction hospital buildings. Its primary focus is new construction, but recommendations may be applicable to facilities undergoing total renovation, and in part to many other hospital renovation, addition, remodeling, and modernization projects (including changes to one or more systems in existing buildings).
Imaginary Documentary: reflecting upon contemporary documental photography Documentário Imaginário: reflexões sobre a fotografia documental contemporânea

Directory of Open Access Journals (Sweden)

Kátia Hallak Lombardi

2008-01-01

This article develops the idea of an Imaginary Documentary, a possible new inflection in the practices of contemporary documentary photography. The text builds its theoretical foundations by bringing reflections on documentary photography closer to Gilbert Durand's concept of the imaginary and André Malraux's notion of the Imaginary Museum. Photographers who are part of the history of documentary photography are the chosen objects through which the potential of the Imaginary Documentary is assessed.
Investigation into Text Classification With Kernel Based Schemes

Science.gov (United States)

2010-03-01

... TDMs: Term-Document Matrices; TMG: Text to Matrix Generator; TN: True Negative; TP: True Positive; VSM: Vector Space Model. ... are represented as a term-document matrix, common evaluation metrics, and the software package Text to Matrix Generator (TMG). The classifier ... This chapter introduces the indexing capabilities of the Text to Matrix Generator (TMG) Toolbox. Specific attention is placed on the ...
Utah Text Retrieval Project

Energy Technology Data Exchange (ETDEWEB)

Hollaar, L A

1983-10-01

The Utah Text Retrieval project seeks well-engineered solutions to the implementation of large, inexpensive, rapid text information retrieval systems. The project has three major components. Perhaps the best known is the work on the specialized processors, particularly search engines, necessary to achieve the desired performance and cost. The other two concern the user interface to the system and the system's internal structure. The work on user interface development is not only concentrating on the syntax and semantics of the query language, but also on the overall environment the system presents to the user. Environmental enhancements include convenient ways to browse through retrieved documents, access to other information retrieval systems through gateways supporting a common command interface, and interfaces to word processing systems. The system's internal structure is based on a high-level data communications protocol linking the user interface, index processor, search processor, and other system modules. This allows them to be easily distributed in a multi- or specialized-processor configuration. It also allows new modules, such as a knowledge-based query reformulator, to be added. 15 references.

Modeling statistical properties of written text.

Directory of Open Access Journals (Sweden)

M Angeles Serrano

Written text is one of the fundamental manifestations of human language, and the study of its universal regularities can give clues about how our brains process information and how we, as a society, organize and share it. Among these regularities, only Zipf's law has been explored in depth. Other basic properties, such as the existence of bursts of rare words in specific documents, have only been studied independently of each other and mainly by descriptive models. As a consequence, there is a lack of understanding of linguistic processes as complex emergent phenomena. Beyond Zipf's law for word frequencies, here we focus on burstiness, Heaps' law describing the sublinear growth of vocabulary size with the length of a document, and the topicality of document collections, which encode correlations within and across documents absent in random null models. We introduce and validate a generative model that explains the simultaneous emergence of all these patterns from simple rules. As a result, we find a connection between the bursty nature of rare words and the topical organization of texts and identify dynamic word ranking and memory across documents as key mechanisms explaining the non-trivial organization of written text. Our research can have broad implications and practical applications in computer science, cognitive science and linguistics.
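Heaps' law states that the vocabulary size V of a document grows sublinearly with its length n, V(n) ≈ K·n^β with 0 < β < 1. A minimal sketch for measuring this on a tokenized text follows; estimating β as the least-squares slope in log-log space is a common convention assumed here, and this is not the paper's generative model.

```python
import math

def vocabulary_growth(tokens):
    """Vocabulary size V(n) after the first n tokens, for n = 1..len(tokens)."""
    seen, sizes = set(), []
    for tok in tokens:
        seen.add(tok)
        sizes.append(len(seen))
    return sizes

def heaps_exponent(sizes):
    """Least-squares slope of log V(n) against log n (beta in V ~ K * n**beta).

    Needs at least two points.
    """
    xs = [math.log(n) for n in range(1, len(sizes) + 1)]
    ys = [math.log(v) for v in sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

A text in which every token is new gives β = 1; real text, where words repeat, typically gives a clearly smaller exponent.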
The BioLexicon: a large-scale terminological resource for biomedical text mining

Directory of Open Access Journals (Sweden)

Thompson Paul

2011-10-01

Background: Due to the rapidly expanding body of biomedical literature, biologists require increasingly sophisticated and efficient systems to help them search for relevant information. Such systems should account for the multiple written variants used to represent biomedical concepts, and allow the user to search for specific pieces of knowledge (or events) involving these concepts, e.g., protein-protein interactions. Such functionality requires access to detailed information about words used in the biomedical literature. Existing databases and ontologies often have a specific focus and are oriented towards human use. Consequently, biological knowledge is dispersed amongst many resources, which often do not attempt to account for the large and frequently changing set of variants that appear in the literature. Additionally, such resources typically do not provide information about how terms relate to each other in texts to describe events. Results: This article provides an overview of the design, construction and evaluation of a large-scale lexical and conceptual resource for the biomedical domain, the BioLexicon. The resource can be exploited by text mining tools at several levels, e.g., part-of-speech tagging, recognition of biomedical entities, and the extraction of events in which they are involved. As such, the BioLexicon must account for real usage of words in biomedical texts. In particular, the BioLexicon gathers together different types of terms from several existing data resources into a single, unified repository, and augments them with new term variants automatically extracted from biomedical literature. Extraction of events is facilitated through the inclusion of biologically pertinent verbs (around which events are typically organized), together with information about typical patterns of grammatical and semantic behaviour, which are acquired from domain-specific texts. In order to foster interoperability, the BioLexicon is
Basic Test Framework for the Evaluation of Text Line Segmentation and Text Parameter Extraction

Directory of Open Access Journals (Sweden)

Darko Brodić

2010-05-01

Text line segmentation is an essential stage in off-line optical character recognition (OCR) systems. It is key because inaccurately segmented text lines lead to OCR failure. Text line segmentation of handwritten documents is a complex and diverse problem, complicated by the nature of handwriting; hence it is a leading challenge in handwritten document image processing. Because the measurement and evaluation of text segmentation algorithm quality are inconsistent, a basic set of measurement methods is required. Currently, no such set is commonly accepted, and each algorithm is evaluated in a custom way. In this paper, a basic test framework for evaluating text feature extraction algorithms is proposed. It consists of a few experiments primarily linked to text line segmentation, skew rate and reference text line evaluation. Although the experiments are mutually independent, their results are strongly cross-linked. The framework's main advantages are its suitability for different scripts and languages and its adaptability. Thus, the paper presents an efficient evaluation method for text analysis algorithms.
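The framework evaluates segmentation algorithms rather than prescribing one. For context only, a common baseline for segmenting text lines in a binarized, deskewed page is the horizontal projection profile, sketched below. The image representation (a list of pixel rows with 1 = ink), the `min_ink` threshold and the function name are assumptions; this is not an algorithm evaluated in the paper.

```python
def segment_lines(image, min_ink=1):
    """Split a binary page image into text lines via its horizontal projection.

    image: list of rows, each a list of 0/1 pixels (1 = ink).
    Returns (start_row, end_row) pairs, end exclusive, for each run of rows
    whose ink count is at least min_ink.
    """
    profile = [sum(row) for row in image]  # ink pixels per row
    lines, start = [], None
    for r, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = r  # entering a text line
        elif ink < min_ink and start is not None:
            lines.append((start, r))  # leaving a text line
            start = None
    if start is not None:  # page ends inside a line
        lines.append((start, len(profile)))
    return lines
```

This works well on clean, printed, horizontal text but breaks down when handwritten lines are skewed, curved or touching, since the gaps between lines no longer show up as empty rows; that is one reason handwritten line segmentation remains hard.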
Utilizing Multi-Field Text Features for Efficient Email Spam Filtering

Directory of Open Access Journals (Sweden)

Wuying Liu

2012-06-01

Large-scale spam emails cause a serious waste of time and resources. This paper investigates the text features of email documents and the feature noise among multi-field texts, and observes a power-law distribution of feature strings within each text field. Based on this observation, we propose an efficient filtering approach comprising a compound weight method and a lightweight field text classification algorithm. The compound weight method considers both the historical classifying ability of each field classifier and the classifying contribution of each text field in the email currently being classified. The lightweight field text classification algorithm simply calculates the arithmetic average of multiple conditional probabilities predicted from feature strings, using a string-frequency index that stores the labeled emails. Owing to the power-law distribution, the string-frequency index structure is compressible by random sampling and can greatly reduce storage space. Experimental results on the TREC spam track show that the proposed approach completes the filtering task at low space cost and high speed, and its overall performance (1-ROCA) exceeds that of the best participant in the trec07p evaluation.
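The two components described above can be sketched in a few lines. Each field has its own classifier that averages P(spam | string) over the field's feature strings, and the fields are combined with a weighted average. The concrete compound weight used here (historical accuracy of the field classifier times the fraction of the field's strings found in the index) is a hypothetical reading of the abstract, not the authors' formula. Random-sampling compression of the index is not shown.

```python
from collections import defaultdict


class FieldClassifier:
    """Per-field classifier backed by a string-frequency index (hypothetical sketch)."""

    def __init__(self):
        # feature string -> [ham_count, spam_count]
        self.index = defaultdict(lambda: [0, 0])

    def train(self, strings, is_spam):
        # Count each distinct feature string once per labeled email.
        for s in set(strings):
            self.index[s][int(is_spam)] += 1

    def predict(self, strings):
        """Return (spam probability, coverage) for one field of an email."""
        probs = []
        for s in strings:
            if s in self.index:  # membership test does not create entries
                ham, spam = self.index[s]
                probs.append(spam / (ham + spam))
        if not probs:
            return 0.5, 0.0  # no evidence: neutral score, zero contribution
        # Arithmetic average of the per-string conditional probabilities.
        return sum(probs) / len(probs), len(probs) / len(strings)


def classify(field_classifiers, email_fields, field_accuracy):
    """Combine field scores with a hypothetical compound weight.

    weight = historical accuracy of the field classifier
             * fraction of the field's strings found in its index
    """
    num = den = 0.0
    for field, strings in email_fields.items():
        score, coverage = field_classifiers[field].predict(strings)
        weight = field_accuracy[field] * coverage
        num += weight * score
        den += weight
    return num / den if den else 0.5
```

Because each field keeps only string counts, training and prediction are simple dictionary operations. This is in the spirit of the "lightweight" classifier described in the abstract.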
Using color management in color document processing

Science.gov (United States)

Nehab, Smadar

1995-04-01

Color Management Systems have been used for several years in Desktop Publishing (DTP) environments. Although this development has not yet matured, we are already experiencing the next generation of the color imaging revolution: Device Independent Color for the small office/home office (SOHO) environment. Though there are still open technical issues with device-independent color matching, they are not the focus of this paper. This paper discusses two new and crucial aspects of using color management in color document processing: (1) the management of color objects and their associated color rendering methods, and (2) a proposal for a precedence order and handshaking protocol among the various software components involved in color document processing. As color peripherals become affordable to the SOHO market, color management also becomes a prerequisite for common document authoring applications such as word processors. The first color management solutions were oriented towards DTP environments, whose requirements were largely different. For example, DTP documents are image-centric, whereas SOHO documents are centered on text and charts. To achieve optimal reproduction on low-cost SOHO peripherals, it is critical that different color rendering methods be used for the different document object types. The first challenge in using color management for color document processing is the association of rendering methods with object types. As a result of an evolutionary process, color matching solutions are now available as application software, as driver-embedded software and as operating system extensions. Consequently, document processing faces a new challenge: selecting the correct color matching solution while avoiding duplicate color corrections.
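The two aspects discussed above can be illustrated with a small sketch. Each object type is associated with a rendering intent, and a precedence order ensures that exactly one component color-matches each object. The specific mapping (images to perceptual, charts to saturation, text to relative colorimetric) and the application/driver/OS order are assumptions for illustration, not the paper's proposal. The intent names follow the ICC rendering intents.

```python
from enum import Enum


class Intent(Enum):
    """A subset of the ICC rendering intents."""
    PERCEPTUAL = "perceptual"
    RELATIVE_COLORIMETRIC = "relative colorimetric"
    SATURATION = "saturation"


# Hypothetical default association of document object types with intents.
DEFAULT_INTENTS = {
    "image": Intent.PERCEPTUAL,
    "chart": Intent.SATURATION,
    "text": Intent.RELATIVE_COLORIMETRIC,
}

# Hypothetical precedence order among components able to do color matching.
PRECEDENCE = ["application", "driver", "os"]


def run_pipeline(objects, capable):
    """Let each capable component, in precedence order, correct only objects
    that no earlier component has corrected, so no object is corrected twice."""
    for component in PRECEDENCE:
        if component not in capable:
            continue
        for obj in objects:
            if obj["corrected_by"] is None:
                obj["intent"] = DEFAULT_INTENTS.get(obj["type"], Intent.PERCEPTUAL)
                obj["corrected_by"] = component
    return objects
```

The `corrected_by` field stands in for the handshake. A component that sees it already set leaves the object alone, which is one simple way to avoid duplicate color corrections.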
Document image analysis: A primer

Indian Academy of Sciences (India)

R. Narasimhan

1996

(1) Typical documents in today's office are computer-generated, but even so, inevitably by different computers and ... different sizes, from a business card to a large engineering drawing. Document analysis ... Whether global or adaptive ...
Multilingual access to full text databases

International Nuclear Information System (INIS)

Fluhr, C.; Radwan, K.

1990-05-01

Many full-text databases are available in only one language; others may contain documents in several languages. Even if users are able to understand the language of the documents in the database, it may be easier for them to express their needs in their own language. For databases containing documents in different languages, it is simpler to formulate the query in one language only and to retrieve documents in different languages. This paper presents the developments and first experiments in multilingual search, applied to the French-English pair, for text data in the nuclear field, based on the SPIRIT system. After recalling the general problems of searching full-text databases with queries formulated in natural language, we present the methods used to reformulate the queries and show how they can be expanded for multilingual search. The first results on data in the nuclear field are presented (AFCEN norms and INIS abstracts). 4 refs
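Dictionary-based query expansion for cross-language search can be sketched as follows. Each query term becomes a group containing the term and its translations, and a document matches if every group is represented. The toy French-English dictionary and the AND-of-ORs matching rule are hypothetical. This is not the SPIRIT system's reformulation method, which the abstract does not detail.

```python
# Hypothetical toy French->English dictionary; a real system needs a large lexicon.
FR_EN = {
    "sûreté": ["safety", "security"],
    "réacteur": ["reactor"],
    "norme": ["standard", "norm"],
}


def expand_query(terms, dictionary):
    """Replace each query term by a group: the term plus its translations."""
    return [[term] + dictionary.get(term.lower(), []) for term in terms]


def matches(doc_tokens, expanded):
    """A document matches if every group has at least one variant in it (AND of ORs)."""
    tokens = {t.lower() for t in doc_tokens}
    return all(any(v.lower() in tokens for v in group) for group in expanded)
```

With this scheme, a French query such as "sûreté réacteur" retrieves both French documents and English documents containing "reactor" together with "safety" or "security".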
Overview of Historical Earthquake Document Database in Japan and Future Development

Science.gov (United States)

Nishiyama, A.; Satake, K.

2014-12-01

In Japan, damage and disasters from historical large earthquakes have been documented and preserved. Compilation of historical earthquake documents started in the early 20th century, and 33 volumes of historical document source books (about 27,000 pages) have been published. However, researchers have not used these source books effectively, because they are contaminated with low-reliability historical records and are difficult to search by keyword, character or date. To overcome these problems and to promote historical earthquake studies in Japan, construction of a text database started in the 21st century. For historical earthquakes from the beginning of the 7th century to the early 17th century, the "Online Database of Historical Documents in Japanese Earthquakes and Eruptions in the Ancient and Medieval Ages" (Ishibashi, 2009) has already been constructed. Its compilers investigated the source books or original texts of historical literature, emended the descriptions, and assigned a reliability to each historical document on the basis of when it was written. Another database compiled the historical documents for seven damaging earthquakes that occurred along the Sea of Japan coast in Honshu, central Japan, in the Edo period (from the beginning of the 17th century to the middle of the 19th century), and constructed a text database and a seismic intensity database. These are now publicized on the web (written only in Japanese). However, only about 9% of the earthquake source books have been digitized so far. Therefore, we plan to digitize all of the remaining historical documents in a research program that started in 2014. The specification of the database will be similar to those of the previous ones. We also plan to combine this database with a liquefaction-trace database, which will be constructed by another research program, by adding the location information described in historical documents. The constructed database would be used to estimate the distributions of seismic intensities and tsunami
Text mining for the biocuration workflow

Science.gov (United States)

Hirschman, Lynette; Burns, Gully A. P. C; Krallinger, Martin; Arighi, Cecilia; Cohen, K. Bretonnel; Valencia, Alfonso; Wu, Cathy H.; Chatr-Aryamontri, Andrew; Dowell, Karen G.; Huala, Eva; Lourenço, Anália; Nash, Robert; Veuthey, Anne-Lise; Wiegers, Thomas; Winter, Andrew G.

2012-01-01

Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on ‘Text Mining for the BioCuration Workflow’ at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed much variability. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provides a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community. PMID:22513129
Segmentation of complex document

Directory of Open Access Journals (Sweden)

Souad Oudjemia

2014-06-01

In this paper we present a method for segmenting document images with complex structure. The technique is based on the GLCM (Grey Level Co-occurrence Matrix) and segments this type of document into three regions, namely 'graphics', 'background' and 'text'. Briefly, the method divides the document image into blocks, whose size was chosen after a series of tests, and then computes the co-occurrence matrix of each block to extract five textural parameters: energy, entropy, sum entropy, difference entropy and standard deviation. These parameters are then used to classify the image into the three regions with the k-means algorithm; the final segmentation step groups connected pixels. Two performance measurements were made for both the graphics and text zones: we obtained a classification rate of 98.3% and a misclassification rate of 1.79%.
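The per-block feature extraction can be sketched with NumPy. The following choices are assumptions not stated in the source: grey levels are quantized to 8 bins, only the horizontal (0, 1) pixel offset is used, the matrix is not symmetrized, energy is the angular second moment (sum of squared entries), and "standard deviation" is taken as the GLCM standard deviation of the row index. The paper's block size and exact definitions may differ.

```python
import numpy as np


def glcm(block, levels=8):
    """Normalized grey-level co-occurrence matrix for the (0, 1) offset."""
    q = (block.astype(np.float64) * levels / 256).astype(int)  # quantize 0..255
    q = np.clip(q, 0, levels - 1)
    a = q[:, :-1].ravel()  # left pixel of each horizontal pair
    b = q[:, 1:].ravel()   # right neighbour
    m = np.zeros((levels, levels))
    np.add.at(m, (a, b), 1)  # count co-occurring level pairs
    return m / m.sum()


def _entropy(p):
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def glcm_features(block, levels=8):
    """Five texture features of one image block (definitions as assumed above)."""
    P = glcm(block, levels)
    i, j = np.indices(P.shape)
    # Distributions of i + j and |i - j| used by sum and difference entropy.
    p_sum = np.bincount((i + j).ravel(), weights=P.ravel(), minlength=2 * levels - 1)
    p_diff = np.bincount(np.abs(i - j).ravel(), weights=P.ravel(), minlength=levels)
    mu = (i * P).sum()
    return {
        "energy": float((P ** 2).sum()),
        "entropy": _entropy(P),
        "sum_entropy": _entropy(p_sum),
        "difference_entropy": _entropy(p_diff),
        "std": float(np.sqrt(((i - mu) ** 2 * P).sum())),
    }
```

Stacking these five values for every block gives a feature matrix. That matrix can then be clustered into three groups with k-means (k = 3), as the abstract describes.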
Document representations for classification of short web-page descriptions

Directory of Open Access Journals (Sweden)

Radovanović Miloš

2008-01-01

Motivated by the application of text categorization to the classification of Web search results, this paper describes an extensive experimental study of the impact of bag-of-words document representations on the performance of five major classifiers: Naïve Bayes, SVM, Voted Perceptron, kNN and C4.5. The texts, short Web-page descriptions sorted into a large hierarchy of topics, are taken from the dmoz Open Directory Web-page ontology, and the classifiers are trained to automatically determine the topics that may be relevant to a previously unseen Web page. Different transformations of the input data (stemming, normalization, logtf and idf), together with dimensionality reduction, are found to have a statistically significant improving or degrading effect on classification performance as measured by classical metrics: accuracy, precision, recall, F1 and F2. The emphasis of the study is not on determining the best document representation for each classifier, but rather on describing the effect of each individual transformation on classification, together with their mutual relationships.
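The logtf, idf and normalization transformations can be toggled independently in a small bag-of-words builder. This is a sketch under common textbook definitions, which are assumed here: log tf = 1 + ln(tf), idf = ln(N / df), and L2 normalization. Whitespace tokenization is also assumed. Stemming and dimensionality reduction are omitted.

```python
import math
from collections import Counter


def bow_vectors(docs, use_logtf=True, use_idf=True, normalize=True):
    """Sparse bag-of-words vectors (dicts) with optional transformations."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        vec = {}
        for term, count in Counter(toks).items():
            w = 1 + math.log(count) if use_logtf else float(count)
            if use_idf:
                w *= math.log(n / df[term])  # 0 for terms in every document
            vec[term] = w
        if normalize:
            norm = math.sqrt(sum(v * v for v in vec.values()))
            if norm > 0:
                vec = {t: v / norm for t, v in vec.items()}
        vectors.append(vec)
    return vectors
```

Flipping each flag separately mirrors the study design of measuring the effect of each individual transformation.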
Text Skimming: The Process and Effectiveness of Foraging through Text under Time Pressure

Science.gov (United States)

Duggan, Geoffrey B.; Payne, Stephen J.

2009-01-01

Is skim reading effective? How do readers allocate their attention selectively? The authors report 3 experiments that use expository texts and allow readers only enough time to read half of each document. Experiment 1 found that, relative to reading half the text, skimming improved memory for important ideas from a text but did not improve memory…
Large-scale extraction of gene interactions from full-text literature using DeepDive.

Science.gov (United States)

Mallory, Emily K; Zhang, Ce; Ré, Christopher; Altman, Russ B

2016-01-01

A complete repository of gene-gene interactions is key for understanding cellular processes, human disease and drug response. These gene-gene interactions include both protein-protein interactions and transcription factor interactions. The majority of known interactions are found in the biomedical literature. Interaction databases, such as BioGRID and ChEA, annotate these gene-gene interactions; however, curation becomes difficult as the literature grows exponentially. DeepDive is a trained system for extracting information from a variety of sources, including text. In this work, we used DeepDive to extract both protein-protein and transcription factor interactions from over 100,000 full-text PLOS articles. We built an extractor for gene-gene interactions that identified candidate gene-gene relations within an input sentence. For each candidate relation, DeepDive computed a probability that the relation was a correct interaction. We evaluated this system against the Database of Interacting Proteins and against randomly curated extractions. Our system achieved 76% precision and 49% recall in extracting direct and indirect interactions involving gene symbols co-occurring in a sentence. For randomly curated extractions, the system achieved between 62% and 83% precision based on direct or indirect interactions, as well as sentence-level and document-level precision. Overall, our system extracted 3356 unique gene pairs using 724 features from over 100,000 full-text articles. Application source code is publicly available at https://github.com/edoughty/deepdive_genegene_app. Contact: russ.altman@stanford.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
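As described, the first step is generating candidate gene-gene relations from sentences in which gene symbols co-occur; DeepDive then assigns each candidate a probability. The sketch covers only candidate generation. It assumes a hypothetical in-memory set of gene symbols and simple regex tokenization. It is neither the DeepDive API nor the authors' extractor.

```python
import itertools
import re


def candidate_pairs(sentence, gene_symbols):
    """Unordered pairs of known gene symbols co-occurring in one sentence."""
    tokens = re.findall(r"[A-Za-z0-9-]+", sentence)
    found = []
    for tok in tokens:
        if tok in gene_symbols and tok not in found:  # keep first-mention order
            found.append(tok)
    # Sort each pair so (A, B) and (B, A) are the same candidate.
    return [tuple(sorted(p)) for p in itertools.combinations(found, 2)]
```

In a full system, each candidate would be turned into features, such as the words between the two symbols, and scored by a probabilistic model.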
The Feasibility of Using Large-Scale Text Mining to Detect Adverse Childhood Experiences in a VA-Treated Population.

Science.gov (United States)

Hammond, Kenric W; Ben-Ari, Alon Y; Laundry, Ryan J; Boyko, Edward J; Samore, Matthew H

2015-12-01

Free text in electronic health records resists large-scale analysis. Text records facts of interest not found in encoded data, and text mining enables their retrieval and quantification. The U.S. Department of Veterans Affairs (VA) clinical data repository affords an opportunity to apply text-mining methodology to study clinical questions in large populations. To assess the feasibility of text mining, investigation of the relationship between exposure to adverse childhood experiences (ACEs) and recorded diagnoses was conducted among all VA-treated Gulf war veterans, utilizing all progress notes recorded from 2000-2011. Text processing extracted ACE exposures recorded among 44.7 million clinical notes belonging to 243,973 veterans. The relationship of ACE exposure to adult illnesses was analyzed using logistic regression. Bias considerations were assessed. ACE score was strongly associated with suicide attempts and serious mental disorders (ORs = 1.84 to 1.97), and less so with behaviorally mediated and somatic conditions (ORs = 1.02 to 1.36) per unit. Bias adjustments did not remove persistent associations between ACE score and most illnesses. Text mining to detect ACE exposure in a large population was feasible. Analysis of the relationship between ACE score and adult health conditions yielded patterns of association consistent with prior research. Copyright © 2015 International Society for Traumatic Stress Studies.
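The "per unit" odds ratios above are exponentiated logistic-regression coefficients. Assume the ACE score enters the model as a single linear term alongside other covariates (the exact model specification is not given here). Then a difference of Δ points in ACE score multiplies the odds by the per-unit OR raised to the power Δ:

```latex
\log\frac{P(\text{illness})}{1 - P(\text{illness})} = \beta_0 + \beta_1\,\mathrm{ACE} + \dots,
\qquad
\mathrm{OR}_{\text{per unit}} = e^{\beta_1},
\qquad
\mathrm{OR}_{\Delta} = e^{\beta_1 \Delta} = \left(\mathrm{OR}_{\text{per unit}}\right)^{\Delta}
```

For example, a per-unit OR of 1.84 corresponds to β₁ = ln 1.84 ≈ 0.61. Under this model, a 3-point difference in ACE score would correspond to an OR of 1.84³ ≈ 6.23.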
The BioLexicon: a large-scale terminological resource for biomedical text mining

Science.gov (United States)

2011-01-01

Background Due to the rapidly expanding body of biomedical literature, biologists require increasingly sophisticated and efficient systems to help them to search for relevant information. Such systems should account for the multiple written variants used to represent biomedical concepts, and allow the user to search for specific pieces of knowledge (or events) involving these concepts, e.g., protein-protein interactions. Such functionality requires access to detailed information about words used in the biomedical literature. Existing databases and ontologies often have a specific focus and are oriented towards human use. Consequently, biological knowledge is dispersed amongst many resources, which often do not attempt to account for the large and frequently changing set of variants that appear in the literature. Additionally, such resources typically do not provide information about how terms relate to each other in texts to describe events. Results This article provides an overview of the design, construction and evaluation of a large-scale lexical and conceptual resource for the biomedical domain, the BioLexicon. The resource can be exploited by text mining tools at several levels, e.g., part-of-speech tagging, recognition of biomedical entities, and the extraction of events in which they are involved. As such, the BioLexicon must account for real usage of words in biomedical texts. In particular, the BioLexicon gathers together different types of terms from several existing data resources into a single, unified repository, and augments them with new term variants automatically extracted from biomedical literature. Extraction of events is facilitated through the inclusion of biologically pertinent verbs (around which events are typically organized) together with information about typical patterns of grammatical and semantic behaviour, which are acquired from domain-specific texts. In order to foster interoperability, the BioLexicon is modelled using the Lexical
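Accounting for the multiple written variants of a concept usually means mapping each variant to a shared entry. The minimal sketch below makes two assumptions. The normalization rule (lowercasing and removing spaces, hyphens and underscores) is hypothetical, and so are entry IDs such as `GENE:IL2`. The BioLexicon's actual data model is not described in this excerpt.

```python
import re


def normalize(term):
    """Hypothetical normalization: lowercase, drop spaces, hyphens, underscores."""
    return re.sub(r"[\s\-_]", "", term.lower())


class VariantLexicon:
    """Maps normalized term variants to a canonical entry ID."""

    def __init__(self):
        self._index = {}

    def add(self, canonical_id, variants):
        for v in variants:
            self._index[normalize(v)] = canonical_id

    def lookup(self, mention):
        """Return the entry ID for a text mention, or None if unknown."""
        return self._index.get(normalize(mention))
```

Automatically extracted variants, as the abstract describes, would simply be added to the same index as existing database terms.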
Combining Position Weight Matrices and Document-Term Matrix for Efficient Extraction of Associations of Methylated Genes and Diseases from Free Text

KAUST Repository

Bin Raies, Arwa; Mansour, Hicham; Incitti, Roberto; Bajic, Vladimir B.

2013-01-01

implicated in particular diseases. The large volume of the electronic text makes it difficult and impractical to search for this information manually. Methodology: We developed a novel text mining methodology based on a new concept of position weight matrices
Indian Language Document Analysis and Understanding

Indian Academy of Sciences (India)

documents would contain text of more than one script (for example, English, Hindi and the ... O'Gorman and Govindaraju provides a good overview on document image ... word level in bilingual documents containing Roman and Tamil scripts.
On the use of the singular value decomposition for text retrieval

Energy Technology Data Exchange (ETDEWEB)

Husbands, P.; Simon, H.D.; Ding, C.

2000-12-04

The use of the Singular Value Decomposition (SVD) has been proposed for text retrieval in several recent works. This technique uses the SVD to project very high-dimensional document and query vectors into a low-dimensional space. It is hoped that this new space reveals the underlying structure of the collection, thus enhancing retrieval performance. Theoretical results have provided some evidence for this claim, and experiments have, to some extent, confirmed it. However, these studies have mostly used small test collections and simplified document models. In this work we investigate the use of the SVD on large document collections. We show that, if interpreted as a mechanism for representing the terms of the collection, this technique alone is insufficient for dealing with the variability in term occurrence. Section 2 introduces the text retrieval concepts necessary for our work. A short description of our experimental architecture is presented in Section 3. Section 4 describes how term occurrence variability affects the SVD and then shows how the decomposition influences retrieval performance. A possible way of improving SVD-based techniques is presented in Section 5, and Section 6 concludes.
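The projection described above is the core of latent semantic indexing. Documents and the query are projected with the first k left singular vectors U_k of the term-document matrix A, and documents are ranked by cosine similarity in that space. Note that U_kᵀA = Σ_kV_kᵀ. This is a generic NumPy sketch, not the authors' experimental system. The choice of k and the plain cosine ranking are assumptions.

```python
import numpy as np


def lsi_rank(term_doc, query, k):
    """Rank documents by cosine similarity in a rank-k SVD space.

    term_doc: (terms x documents) matrix; query: vector over the same terms.
    """
    U, _, _ = np.linalg.svd(term_doc, full_matrices=False)
    Uk = U[:, :k]                      # first k left singular vectors
    doc_vecs = (Uk.T @ term_doc).T     # one k-dim row per document
    q_vec = Uk.T @ query               # query projected the same way
    denom = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec) + 1e-12
    sims = doc_vecs @ q_vec / denom
    return np.argsort(-sims), sims
```

Because documents and queries are projected with the same U_k, the arbitrary signs of singular vectors cancel out in the cosine. The paper argues that this projection alone does not handle term-occurrence variability in large collections.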
Large mandibular central odontogenic fibroma documented over 20 years: A case report

Directory of Open Access Journals (Sweden)

Patrick Bandura

Introduction: Central odontogenic fibroma (COF) is a rare, benign, slow-growing intraosseous odontogenic tumor, and accounts for 0.1% of all odontogenic tumors. It is often confused with other entities, such as keratocysts, ameloblastomas, and odontogenic myxomas. Complete enucleation followed by curettage is the treatment of choice for COF to ensure the lowest possible chance of recurrence. Case presentation: We report the case of a young Caucasian woman with COF that went undiagnosed for several years despite repeated radiologic examinations. Finally, a massive tumor was surgically removed and the wound was curetted. The specimen was histologically confirmed to be a COF. The patient remains under regular follow-up, and thus far there have been no clinical or radiologic signs of recurrence. Discussion: This rare case of COF, which was documented over a period of 20 years, has helped us to describe the features of this tumor. It also confirms that adequate surgical treatment can lead to impressive bone regeneration in healthy individuals, as evident from the radiologic findings acquired before, during, and after enucleation of the COF in our patient. Our findings also confirm the view that COF has a favorable prognosis regardless of its final size. Conclusion: Early diagnosis is key to successful treatment of COF. The slow but steady increase in the size of a COF with no accompanying symptoms has not been reported previously. To our knowledge, this is the only documented case of a COF that has been under continuous radiologic observation for over 20 years. Keywords: Case report, Central odontogenic fibroma, Long-term, Bone deformation, Follow-up, Tumor enucleation

Menzerath-Altmann law for distinct word distribution analysis in a large text

Science.gov (United States)

Eroglu, Sertac

2013-06-01

The empirical law uncovered by Menzerath and formulated by Altmann, known as the Menzerath-Altmann law (henceforth the MA law), reveals the statistical distribution behavior of human language at various organizational levels. Building on previous studies of organizational regularities in language, we propose that the distribution of distinct (or different) words in a large text can be described effectively by the MA law. The validity of the proposition is demonstrated by examining two text corpora written in languages that do not belong to the same language family (English and Turkish). The results show not only that distinct word distribution behavior can be accurately predicted by the MA law, but also that this result appears to be language-independent. This result is important for quantitative linguistic studies and may also be significant for other naturally occurring organizations that display analogous organizational behavior. We also demonstrate that the MA law is a special case of the probability function of the generalized gamma distribution.
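The MA law is commonly written as y(x) = a·x^b·e^(-cx). Taking logarithms gives ln y = ln a + b·ln x - c·x, which is linear in (ln a, b, c). A minimal fitting sketch is therefore ordinary least squares on log-transformed data, assuming strictly positive x and y. The paper's exact variables and fitting procedure are not given here.

The generalized gamma density is proportional to x^(d-1)·e^(-(x/s)^p). Setting p = 1 gives x^(d-1)·e^(-x/s), which is the MA form with b = d - 1 and c = 1/s. This is one way to see the special-case relationship; the authors' derivation may differ.

```python
import numpy as np


def fit_menzerath_altmann(x, y):
    """Fit y = a * x**b * exp(-c * x) by least squares on ln y.

    Requires strictly positive x and y. Returns (a, b, c).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # ln y = ln a + b ln x - c x  ->  design matrix columns [1, ln x, -x]
    X = np.column_stack([np.ones_like(x), np.log(x), -x])
    coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
    ln_a, b, c = coef
    return float(np.exp(ln_a)), float(b), float(c)
```

Fitting in log space weights relative rather than absolute errors. Nonlinear least squares on y itself is a common alternative when that matters.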
Nuclear power plants documentation system

International Nuclear Information System (INIS)

Schwartz, E.L.

1991-01-01

Because the number and variety of documents needed for the entire design of an NPP is very large, an overall and detailed identification, filing and retrieval system must be implemented. This applies even more to the FINAL QUALITY DOCUMENTATION of the plant, as stipulated by IAEA Safety Codes and related guides. For this purpose, a DOCUMENTATION MANUAL was developed that describes this documentation system in detail. Here we present the goals and results that we expect to reach for the Angra 2 and 3 Project. (author)
Computer-Assisted Search Of Large Textual Data Bases

Science.gov (United States)

Driscoll, James R.

1995-01-01

"QA" denotes high-speed computer system for searching diverse collections of documents including (but not limited to) technical reference manuals, legal documents, medical documents, news releases, and patents. Incorporates previously available and emerging information-retrieval technology to help user intelligently and rapidly locate information found in large textual data bases. Technology includes provision for inquiries in natural language; statistical ranking of retrieved information; artificial-intelligence implementation of semantics, in which "surface level" knowledge found in text used to improve ranking of retrieved information; and relevance feedback, in which user's judgements of relevance of some retrieved documents used automatically to modify search for further information.
Storing XML Documents in Databases

NARCIS (Netherlands)

A.R. Schmidt; S. Manegold (Stefan); M.L. Kersten (Martin); L.C. Rivero; J.H. Doorn; V.E. Ferraggine

2005-01-01

The authors introduce concepts for loading large amounts of XML documents into databases where the documents are stored and maintained. The goal is to make XML databases as unobtrusive in multi-tier systems as possible and at the same time provide as many services defined by the XML
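One common way to load XML into a relational store is to shred each element into a row of an edge table (id, parent, tag, text). The sketch uses Python's standard `sqlite3` and `xml.etree.ElementTree` modules. It ignores attributes and tail text, and it is a generic illustration rather than the authors' storage scheme.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred(xml_text, conn):
    """Store every element of an XML document as a row; return the root's id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS node "
        "(id INTEGER PRIMARY KEY, parent INTEGER, tag TEXT, text TEXT)"
    )

    def visit(elem, parent_id):
        text = (elem.text or "").strip() or None
        cur = conn.execute(
            "INSERT INTO node (parent, tag, text) VALUES (?, ?, ?)",
            (parent_id, elem.tag, text),
        )
        node_id = cur.lastrowid  # id assigned by SQLite
        for child in elem:
            visit(child, node_id)
        return node_id

    root_id = visit(ET.fromstring(xml_text), None)
    conn.commit()
    return root_id
```

Children come back in document order when selected by `parent` and ordered by `id`, because rows are inserted in a depth-first traversal.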
The Pelindaba text and its previous

International Nuclear Information System (INIS)

Adeniji, O.

1996-01-01

The main body of the Treaty, the preamble, articles 1-22, and the map are reproduced in this issue in the section ''Documentation Relating to Disarmament and International Security''. The complete text, including annexes and protocols, is contained in document A/50/426
GPU Based N-Gram String Matching Algorithm with Score Table Approach for String Searching in Many Documents

Science.gov (United States)

Srinivasa, K. G.; Shree Devi, B. N.

2017-10-01

String searching in documents has become a tedious task with the evolution of Big Data. The generation of large data sets demands high-performance search algorithms in areas such as text mining, information retrieval and many others. The popularity of GPUs for general-purpose computing has been increasing across various applications, so it is of great interest to exploit the thread features of a GPU to provide a high-performance search algorithm. This paper proposes a new, optimized N-gram approach to string search in a number of lengthy documents, together with its GPU implementation. The algorithm exploits GPGPUs to search for strings in many documents, employing character-level N-gram matching with a parallel Score Table approach and search implemented with the CUDA API. The Score Table, used to store the frequencies of N-grams in a document, makes the search independent of the document's length and allows faster access to the frequency values, thus decreasing the search complexity. The extensive thread features of a GPU are exploited to enable parallel pre-processing of trigrams in a document for Score Table creation and parallel search in a huge number of documents, speeding up the whole search process even for large pattern sizes. Experiments were carried out on many documents of varied length, with search strings from the standard Lorem Ipsum text, on an NVIDIA GeForce GT 540M GPU with 96 cores. The results show that the parallel approach to Score Table creation and searching achieves a good speedup over the same approach executed serially.
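The Score Table idea can be sketched serially in Python. Each document's character trigrams are counted once, and a pattern is scored by the fraction of its trigrams present in the table. Scoring cost therefore depends on pattern length rather than document length. The scoring rule is an assumption, since the abstract does not give one. The per-document loop is the part a CUDA implementation would parallelize.

```python
from collections import Counter


def score_table(text, n=3):
    """Frequency table of character n-grams (trigrams by default)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def match_score(table, pattern, n=3):
    """Fraction of the pattern's n-grams present in a document's table."""
    grams = [pattern[i:i + n] for i in range(len(pattern) - n + 1)]
    if not grams:
        return 0.0
    hits = sum(1 for g in grams if table[g] > 0)  # missing keys count as 0
    return hits / len(grams)


def search(docs, pattern, n=3):
    """Rank documents by match score (highest first); ties keep input order."""
    tables = [score_table(d, n) for d in docs]  # built once, reusable per query
    scores = [match_score(t, pattern, n) for t in tables]
    order = sorted(range(len(docs)), key=lambda i: -scores[i])
    return order, scores
```

The tables are built once per document and can be reused for every later query. That reuse is where the Score Table saves work over rescanning each document.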
Testing System Encryption-Decryption Method to RSA Security Documents

International Nuclear Information System (INIS)

Supriyono

2008-01-01

A document-protection model was tested as one of the instruments, especially for text documents. The principle of document protection was that the system should be able to protect documents during storage and transfer. First, the text document was encrypted, so that it could not be read because its text was transformed into random letters. The randomized text was then restored by decryption so that the document owner could read it. In this research, the method adopted was RSA, which uses complex mathematical calculations and is equipped with an initial protection key (either a private key or a public key), making it more difficult for hackers to attack. The system was developed using Borland Delphi 7. The results indicated that the system was capable of saving and transferring documents in encrypted form, via both the internet and an intranet, and of restoring them to their original form by decryption. The research also tested the encryption and decryption processes for documents of various memory sizes. (author)
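The RSA encrypt/decrypt round trip can be illustrated with deliberately tiny primes (p = 61, q = 53, e = 17, a common classroom example). This toy sketch encrypts each UTF-8 byte separately and uses no padding, so it is insecure and only demonstrates the principle. It is not the authors' Delphi implementation. Real systems use a vetted library with large keys and OAEP padding. The modular inverse `pow(e, -1, phi)` requires Python 3.8 or later.

```python
def make_toy_keys(p=61, q=53, e=17):
    """Return ((e, n), (d, n)) for textbook RSA with small primes."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse of e modulo phi
    return (e, n), (d, n)


def encrypt(text, public):
    """Encrypt each UTF-8 byte m as m**e mod n (toy scheme, no padding)."""
    e, n = public
    return [pow(b, e, n) for b in text.encode("utf-8")]


def decrypt(cipher, private):
    """Recover each byte as c**d mod n and decode the UTF-8 text."""
    d, n = private
    return bytes(pow(c, d, n) for c in cipher).decode("utf-8")
```

With these parameters n = 3233 and d = 2753. The byte for "A" (65) encrypts to 2790.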
Securing XML Documents

Directory of Open Access Journals (Sweden)

Charles Shoniregun

2004-11-01

XML (extensible markup language) is becoming the current standard for establishing interoperability on the Web. XML data are self-descriptive and syntax-extensible; this makes XML very suitable for the representation and exchange of semi-structured data, and allows users to define new elements for their specific applications. As a result, the number of documents on the Web incorporating this standard is continuously increasing. Processing XML documents may require a traversal of the entire document structure, so the cost can be very high. A strong demand for efficient and effective XML processing has posed a new challenge for the database world. This paper discusses a fast and efficient indexing technique for XML documents and introduces the XML graph numbering scheme, which can be used for indexing and securing the graph structure of XML documents. This technique provides an efficient method to speed up XML data processing. Furthermore, the paper explores a classification of existing methods and their impact on query processing and indexing.
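One widely used way to number XML nodes for structural indexing is pre/post-order labelling. Node a is an ancestor of node b exactly when pre(a) < pre(b) and post(a) > post(b), so ancestor queries need no tree traversal. This generic scheme is shown for illustration only. It is not necessarily the paper's "XML graph numbering scheme".

```python
import xml.etree.ElementTree as ET


def number_nodes(root):
    """Map each element to its (preorder, postorder) rank."""
    labels = {}
    counter = {"pre": 0, "post": 0}

    def visit(elem):
        pre = counter["pre"]
        counter["pre"] += 1
        for child in elem:
            visit(child)
        labels[elem] = (pre, counter["post"])  # post rank assigned after children
        counter["post"] += 1

    visit(root)
    return labels


def is_ancestor(labels, a, b):
    """True if element a is a proper ancestor of element b."""
    pre_a, post_a = labels[a]
    pre_b, post_b = labels[b]
    return pre_a < pre_b and post_a > post_b
```

Once stored, the two integers per node let a database answer ancestor/descendant queries with simple range comparisons.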
A document processing pipeline for annotating chemical entities in scientific documents.

Science.gov (United States)

Campos, David; Matos, Sérgio; Oliveira, José L

2015-01-01

The recognition of drugs and chemical entities in text is a very important task within the field of biomedical information extraction, given the rapid growth in the amount of published texts (scientific papers, patents, patient records) and the relevance of these and other related concepts. If done effectively, this could allow exploiting such textual resources to automatically extract or infer relevant information, such as drug profiles, relations and similarities between drugs, or associations between drugs and potential drug targets. The objective of this work was to develop and validate a document processing and information extraction pipeline for the identification of chemical entity mentions in text. We used the BioCreative IV CHEMDNER task data to train and evaluate a machine-learning based entity recognition system. Using a combination of two conditional random field models, a selected set of features, and a post-processing stage, we achieved F-measure results of 87.48% in the chemical entity mention recognition task and 87.75% in the chemical document indexing task. We present a machine learning-based solution for automatic recognition of chemical and drug names in scientific documents. The proposed approach applies a rich feature set, including linguistic, orthographic, morphological, dictionary matching and local context features. Post-processing modules are also integrated, performing parentheses correction, abbreviation resolution and filtering erroneous mentions using an exclusion list derived from the training data. The developed methods were implemented as a document annotation tool and web service, freely available at http://bioinformatics.ua.pt/becas-chemicals/.
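The post-processing stage can be sketched as simple filters over recognized mentions. The bracket rule and the exclusion list are hypothetical simplifications. The CRF models and abbreviation resolution are not shown, and this is not the published pipeline's code.

```python
def fix_parentheses(mention):
    """Strip unmatched brackets at the mention boundaries (simplified rule)."""
    while mention.startswith("(") and mention.count("(") > mention.count(")"):
        mention = mention[1:]
    while mention.endswith(")") and mention.count(")") > mention.count("("):
        mention = mention[:-1]
    return mention


def postprocess(mentions, exclusion_list):
    """Apply parentheses correction, then drop mentions on the exclusion list."""
    cleaned = []
    for m in mentions:
        m = fix_parentheses(m.strip())
        if m and m.lower() not in exclusion_list:
            cleaned.append(m)
    return cleaned
```

Balanced brackets inside systematic names, as in "2-(4-isobutylphenyl)propanoic acid", are left untouched. Only a surplus bracket at either end of the mention is removed.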
Improving collaborative documentation in CMS

International Nuclear Information System (INIS)

Lassila-Perini, Kati; Salmi, Leena

2010-01-01

Complete and up-to-date documentation is essential for efficient data analysis in a large and complex collaboration like CMS. Good documentation reduces the time spent in problem solving for users and software developers. The scientists in our research environment do not necessarily have the interests or skills of professional technical writers. This results in inconsistencies in the documentation. To improve the quality, we have started a multidisciplinary project involving CMS user support and expertise in technical communication from the University of Turku, Finland. In this paper, we present possible approaches to study the usability of the documentation, for instance, usability tests conducted recently for the CMS software and computing user documentation.
Document Type Profiles in Nature, Science, and PNAS: Journal and Country Level

Directory of Open Access Journals (Sweden)

Jielan Ding

2016-09-01

Purpose: In this contribution, we aim to detect the document type profiles of the three prestigious journals Nature, Science, and Proceedings of the National Academy of Sciences of the United States (PNAS) at two levels: journal and country. Design/methodology/approach: Using relative values based on fractional counting, we investigate the distribution of publications across document types at both the journal and the country level, and we use (cosine) document type profile similarity values to compare pairs of publication years within countries. Findings: Nature and Science mainly publish Editorial Material, Article, News Item and Letter, whereas the publications of PNAS are heavily concentrated on Article. The shares of Article for Nature and Science decrease slightly from 1999 to 2014, while the corresponding shares of Editorial Material increase. Most studied countries focus on Article and Letter in Nature, but on Letter in Science and PNAS. The document type profiles of some of the studied countries change to a relatively large extent over publication years. Research limitations: The main limitation of this research concerns the Web of Science classification of publications into document types. Since the analysis is based on Web of Science document types, and this classification is not free from errors, the accuracy of the analysis might be affected. Practical implications: Results show that Nature and Science are quite diversified with regard to document types. In bibliometric assessments where publications in Nature and Science play a role, document types other than Article and Review might therefore be taken into account. Originality/value: Results highlight the importance of document types other than Article and Review in Nature and Science. Large differences are also found when comparing the country document type profiles of the three journals with the corresponding profiles in all Web of
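For readers unfamiliar with the method, here is a minimal sketch of fractional counting and cosine profile similarity. It assumes each publication is given as a `(document_type, countries)` pair and that a paper with n distinct countries credits 1/n to each. This illustrates the general technique only; the record layout and function names are hypothetical, and the study's exact counting rules (e.g., by address rather than by country) may differ.

```python
import math
from collections import defaultdict

def type_profile(publications, country):
    """Share of each document type in a country's output, using fractional counting."""
    counts = defaultdict(float)
    for doc_type, countries in publications:
        distinct = set(countries)
        if country in distinct:
            counts[doc_type] += 1 / len(distinct)   # each co-author country gets 1/n
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}

def cosine(p, q):
    """Cosine similarity of two sparse profiles (dicts of type -> share)."""
    dot = sum(share * q.get(t, 0.0) for t, share in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0
```

Comparing a country's profiles for two years then reduces to `cosine(type_profile(pubs_1999, c), type_profile(pubs_2014, c))`, where values near 1 indicate a stable profile.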
Let Documents Talk to Each Other: A Computer Model for Connection of Short Documents.

Science.gov (United States)

Chen, Z.

1993-01-01

Discusses the integration of scientific texts through the connection of documents and describes a computer model that can connect short documents. Information retrieval and artificial intelligence are discussed; a prototype system of the model is explained; and the model is compared to other computer models. (17 references) (LRW)
Benchmarking infrastructure for mutation text mining.

Science.gov (United States)

Klein, Artjom; Riazanov, Alexandre; Hindle, Matthew M; Baker, Christopher Jo

2014-02-25

Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents that can support mutation grounding and mutation impact extraction experiments. We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption.
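The infrastructure computes metrics with SPARQL over RDF annotations. The Python sketch below only illustrates the set arithmetic that such a metric query performs, assuming exact matching of hypothetical `(document_id, mutation)` annotation tuples; it is not one of the project's SPARQL queries.

```python
def evaluate(gold, predicted):
    """Exact-match precision, recall, and F1 over sets of annotation tuples."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

gold = {("d1", "E545K"), ("d1", "H1047R"), ("d2", "V600E")}
predicted = {("d1", "E545K"), ("d2", "V600E"), ("d2", "G12D")}
print(evaluate(gold, predicted))  # two of three correct on each side
```

Relaxed matching (for example, counting overlapping spans or normalized mutation names as hits) would change only how the two sets are compared.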
Benchmarking infrastructure for mutation text mining

Science.gov (United States)

2014-01-01

Background Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. Results We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents that can support mutation grounding and mutation impact extraction experiments. Conclusion We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption. PMID:24568600
Cultural diversity: blind spot in medical curriculum documents, a document analysis.

Science.gov (United States)

Paternotte, Emma; Fokkema, Joanne P I; van Loon, Karsten A; van Dulmen, Sandra; Scheele, Fedde

2014-08-22

Cultural diversity among patients presents specific challenges to physicians. Therefore, cultural diversity training is needed in medical education. In cases where strategic curriculum documents form the basis of medical training, it is expected that the topic of cultural diversity is included in these documents, especially if they have been recently updated. The aim of this study was to assess the current formal status of cultural diversity training in the Netherlands, which is a multi-ethnic country with recently updated medical curriculum documents. In February and March 2013, a document analysis was performed of strategic curriculum documents for undergraduate and postgraduate medical education in the Netherlands. All text phrases that referred to cultural diversity were extracted from these documents. Subsequently, these phrases were sorted into objectives, training methods or evaluation tools to assess how they contributed to adequate curriculum design. Of a total of 52 documents, 33 documents contained phrases with information about cultural diversity training. Cultural diversity aspects were more prominently described in the curriculum documents for undergraduate education than in those for postgraduate education. The most specific information about cultural diversity was found in the blueprint for undergraduate medical education. In the postgraduate curriculum documents, attention to cultural diversity differed among specialties and was mainly superficial. Cultural diversity is an underrepresented topic in the Dutch documents that form the basis for actual medical training, although the documents have been updated recently. Attention to the topic is thus warranted. This situation does not fit the demand of a multi-ethnic society for doctors with cultural diversity competences. Multi-ethnic countries should be critical of the content of the bases for their medical educational curricula.
Multi-font printed Mongolian document recognition system

Science.gov (United States)

Peng, Liangrui; Liu, Changsong; Ding, Xiaoqing; Wang, Hua; Jin, Jianming

2009-01-01

Mongolian is one of the major ethnic languages in China. A large number of printed Mongolian documents need to be digitized for digital libraries and various other applications. Traditional Mongolian script has a unique writing style and multi-font variations, which pose challenges for Mongolian OCR research. Because traditional Mongolian script has particular characteristics (for example, one character may be part of another character), we define the character set for recognition according to the segmented components, and a rule-based post-processing module combines the components into characters. For character recognition, a method based on visual directional features and multi-level classifiers is presented. For character segmentation, a scheme finds segmentation points by analyzing the properties of the projection profile and of connected components. Because Mongolian font types fall into two major groups, the segmentation parameters are adjusted for each group, and a font-type classification method for the two groups is introduced. For recognition of Mongolian text mixed with Chinese and English, language identification and the relevant character recognition kernels are integrated. Experiments show that the presented methods are effective: the text recognition rate is 96.9% on test samples from practical documents with multiple font types and mixed scripts.
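The segmentation scheme is described only at a high level. As a rough sketch of projection-profile analysis (not the authors' method, which also uses connected components and font-group-specific parameters), the code below sums ink pixels per row of a binarized, vertically written text column and cuts wherever the profile falls to a threshold. Because letters in traditional Mongolian are joined along a vertical stem, a nonzero `threshold` of roughly the stem width would likely be needed in practice; that value is an assumption.

```python
def row_projection(image):
    """Count ink pixels (1s) in each row of a binary image given as a list of rows."""
    return [sum(row) for row in image]

def segment_runs(profile, threshold=0):
    """Return (start, end) row ranges where the profile exceeds the threshold."""
    segments, start = [], None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i                          # a segment begins
        elif value <= threshold and start is not None:
            segments.append((start, i))        # a gap closes the current segment
            start = None
    if start is not None:
        segments.append((start, len(profile)))
    return segments

column = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
]
print(segment_runs(row_projection(column)))  # [(0, 2), (3, 5)]
```

Raising `threshold` to 1 in this toy example also splits off the last row, which mimics treating a thin connecting stroke as a cut point.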
Domain-independent information extraction in unstructured text

Energy Technology Data Exchange (ETDEWEB)

Irwin, N.H. [Sandia National Labs., Albuquerque, NM (United States). Software Surety Dept.

1996-09-01

Extracting information from unstructured text has become an important research area in recent years due to the large amount of text now electronically available. This status report describes the findings and work done during the second year of a two-year Laboratory Directed Research and Development Project. Building on the first year's work of identifying important entities, this report details techniques used to group words into semantic categories and to output templates containing selected document content. Using word profiles and category clustering derived during a training run, the time-consuming knowledge-building task can be avoided. Although the output is still less complete than that of systems with domain-specific knowledge bases, the results look promising. The two approaches are compatible and could complement each other within the same system. Domain-independent approaches remain appealing because a system that adapts and learns will soon outpace a system with any amount of a priori knowledge.
Are PDF Documents Accessible?

Directory of Open Access Journals (Sweden)

Mireia Ribera Turró

2008-09-01

Adobe PDF is one of the most widely used formats in scientific communications and in administrative documents. In its latest versions it has incorporated structural tags and improvements that increase its level of accessibility. This article reviews the concept of accessibility in the reading of digital documents and evaluates the accessibility of PDF according to the most widely established standards.
Application of Text Analytics to Extract and Analyze Material–Application Pairs from a Large Scientific Corpus

Directory of Open Access Journals (Sweden)

Nikhil Kalathil

2018-01-01

When assessing the importance of materials (or other components) to a given set of applications, machine analysis of a very large corpus of scientific abstracts can provide an analyst with a base of insights to develop further. Text analytics reduces the time required to conduct an evaluation while allowing analysts to experiment with many different hypotheses. Because the scope and quantity of metadata analyzed can, and should, be large, any divergence between what a human analyst determines and what the text analysis shows prompts the analyst to reassess preliminary findings. In this work, we successfully extracted material–application pairs and ranked them by importance. This method provides a novel way to map scientific advances in a particular material to the application for which it is used. Approximately 438,000 titles and abstracts of scientific papers published from 1992 to 2011 were used to examine 16 materials. This analysis used co-clustering text analysis to associate individual materials with specific clean energy applications, evaluate the importance of materials to specific applications, and assess their importance to clean energy overall. Our analysis reproduced the judgments of experts in assigning material importance to applications. The validated methods were then used to map the replacement of one material with another in a specific application (batteries).
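The study used co-clustering, which is beyond a short example. As a much simpler, hedged stand-in, the sketch below counts how often a material term and an application term appear in the same abstract and ranks the pairs. The substring matching and the term lists are assumptions made for illustration only.

```python
from collections import Counter
from itertools import product

def rank_pairs(abstracts, materials, applications):
    """Rank (material, application) pairs by how many abstracts mention both."""
    counts = Counter()
    for text in abstracts:
        lowered = text.lower()
        found_materials = [m for m in materials if m.lower() in lowered]
        found_apps = [a for a in applications if a.lower() in lowered]
        counts.update(product(found_materials, found_apps))  # one count per co-mention
    return counts.most_common()

abstracts = [
    "Lithium cathodes for batteries",
    "Indium films in solar cells",
    "Lithium anodes improve batteries",
]
print(rank_pairs(abstracts, ["lithium", "indium"], ["batteries", "solar cells"])[0])
```

Raw co-occurrence counts favor frequently studied materials; co-clustering, as used in the paper, instead groups materials and applications jointly, which is one reason the two approaches can rank pairs differently.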
A Chinese text classification system based on Naive Bayes algorithm

Directory of Open Access Journals (Sweden)

Cui Wei

2016-01-01

In this paper, addressing the characteristics of Chinese text classification, we use ICTCLAS (the Chinese lexical analysis system of the Chinese Academy of Sciences) for document segmentation, clean the data and filter out stop words, and apply the information gain and document frequency feature selection algorithms to select document features. On this basis, we implement a text classifier based on the Naive Bayes algorithm and use the Chinese corpus of Fudan University to run experiments on and analyze the system.
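A minimal multinomial Naive Bayes classifier with Laplace smoothing is sketched below. It assumes documents have already been segmented into word lists (the role ICTCLAS plays in the paper). Stop-word removal and a minimum document-frequency cutoff (`min_df`) stand in for the feature selection step; information gain is not implemented here, and the class and parameter names are illustrative.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes over pre-segmented word lists (illustrative)."""

    def __init__(self, min_df=1, stop_words=()):
        self.min_df = min_df                    # keep words seen in >= min_df documents
        self.stop_words = set(stop_words)

    def fit(self, docs, labels):
        docs = [[w for w in d if w not in self.stop_words] for d in docs]
        df = Counter(w for d in docs for w in set(d))   # document frequency per word
        self.vocab = {w for w, n in df.items() if n >= self.min_df}
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for d, y in zip(docs, labels):
            self.word_counts[y].update(w for w in d if w in self.vocab)
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for y, n_y in self.class_counts.items():
            total = sum(self.word_counts[y].values())
            score = math.log(n_y / n_docs)          # log prior
            for w in doc:
                if w in self.vocab:                 # filtered or unseen words are ignored
                    score += math.log((self.word_counts[y][w] + 1) / (total + v))
            if score > best_score:
                best, best_score = y, score
        return best

docs = [["股票", "市场", "上涨", "的"], ["球队", "比赛", "胜利"], ["股票", "投资"], ["比赛", "进球", "的"]]
labels = ["finance", "sports", "finance", "sports"]
clf = NaiveBayes(stop_words=["的"]).fit(docs, labels)
print(clf.predict(["股票", "上涨"]))  # finance
```

Working in log space avoids numerical underflow when documents are long, and add-one smoothing keeps a single unseen class/word combination from zeroing out a class.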

Empirical Studies On Machine Learning Based Text Classification Algorithms

OpenAIRE

Shweta C. Dharmadhikari; Maya Ingle; Parag Kulkarni

2011-01-01

Automatic classification of text documents has become an important research issue nowadays. Proper classification of text documents requires information retrieval, machine learning, and natural language processing (NLP) techniques. Our aim is to focus on important approaches to automatic text classification based on machine learning techniques, viz. supervised, unsupervised, and semi-supervised. In this paper we present a review of various text classification approaches under the machine learning paradigm...
Triangular clustering in document networks

Energy Technology Data Exchange (ETDEWEB)

Cheng Xueqi; Ren Fuxin [Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190 (China); Zhou Shi [Department of Computer Science, University College London, Malet Place, London WC1E 6BT (United Kingdom); Hu Maobin [School of Engineering Science, University of Science and Technology of China, Hefei 230026 (China)], E-mail: cxq@ict.ac.cn, E-mail: renfuxin@software.ict.ac.cn, E-mail: s.zhou@adastral.ucl.ac.uk, E-mail: humaobin@ustc.edu.cn

2009-03-15

Document networks have the characteristic that a document node, e.g. a webpage or an article, carries meaningful content. Properties of document networks are affected not only by the topological connectivity between nodes but also, strongly, by the semantic relation between the contents of the nodes. We observed that document networks have a large number of triangles and a high clustering coefficient. There is also a strong correlation between the probability that a triangle forms and the content similarity among the three nodes involved. We propose the degree-similarity product (DSP) model, which reproduces these properties well. The model achieves this by using a preferential attachment mechanism that favours linkage between nodes that are both popular and similar. This work is a step towards a better understanding of the structure and evolution of document networks.
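As a hedged sketch of the two quantities involved: the local clustering coefficient below is the standard definition, while `dsp_weights` assumes, based only on the model's name, that a new node attaches to an existing node with probability proportional to that node's degree times its content similarity to the newcomer. The exact DSP formulation in the paper may differ, and the graph and similarity values are made up.

```python
from itertools import combinations

def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbours that are themselves linked."""
    neighbours = adj[v]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))

def dsp_weights(adj, similarity):
    """Attachment probabilities proportional to degree x similarity (assumed form)."""
    scores = {v: len(adj[v]) * similarity[v] for v in adj}
    total = sum(scores.values())
    return {v: s / total for v, s in scores.items()}

# Toy undirected network: a triangle a-b-c plus a pendant node d attached to a.
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(clustering_coefficient(adj, "a"))  # one linked pair out of three
```

A growth simulation could then draw link targets with `random.choices(list(w), weights=list(w.values()))`, where `w` is the dictionary returned by `dsp_weights`.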
Tank waste remediation system functions and requirements document

Energy Technology Data Exchange (ETDEWEB)

Carpenter, K.E

1996-10-03

This is the Tank Waste Remediation System (TWRS) Functions and Requirements Document derived from the TWRS Technical Baseline. The document consists of several text sections that provide the purpose, scope, background information, and an explanation of how this document assists the application of Systems Engineering to the TWRS. The primary functions identified in the TWRS Functions and Requirements Document are shown in Figure 4.1 (Section 4.0). Currently, this document is part of the overall effort to develop the TWRS Functional Requirements Baseline, and it contains the functions and requirements needed to properly define the top three TWRS function levels. TWRS Technical Baseline information (RDD-100 database) included in the appendices of the attached document contains the TWRS functions, requirements, and architecture necessary to define the TWRS Functional Requirements Baseline. Document organization and user directions are provided in the introductory text. This document will continue to be modified during the TWRS life-cycle.
Tank waste remediation system functions and requirements document

International Nuclear Information System (INIS)

Carpenter, K.E

1996-01-01

This is the Tank Waste Remediation System (TWRS) Functions and Requirements Document derived from the TWRS Technical Baseline. The document consists of several text sections that provide the purpose, scope, background information, and an explanation of how this document assists the application of Systems Engineering to the TWRS. The primary functions identified in the TWRS Functions and Requirements Document are shown in Figure 4.1 (Section 4.0). Currently, this document is part of the overall effort to develop the TWRS Functional Requirements Baseline, and it contains the functions and requirements needed to properly define the top three TWRS function levels. TWRS Technical Baseline information (RDD-100 database) included in the appendices of the attached document contains the TWRS functions, requirements, and architecture necessary to define the TWRS Functional Requirements Baseline. Document organization and user directions are provided in the introductory text. This document will continue to be modified during the TWRS life-cycle.
Applications for electronic documents

International Nuclear Information System (INIS)

Beitel, G.A.

1995-01-01

This paper discusses the application of electronic media to documents, specifically Safety Analysis Reports (SARs), prepared for Environmental Restoration and Waste Management (ER&WM) programs being conducted for the Department of Energy (DOE) at the Idaho National Engineering Laboratory (INEL). Efforts are underway to upgrade our document system to an electronic format. To satisfy external requirements (DOE, State, and Federal), ER&WM programs generate a complement of internal requirements documents, including a SAR and Technical Safety Requirements, along with procedures and training materials. Of particular interest are the volume of information and the difficulty of handling it. A recently prepared ER&WM SAR consists of 1,000 pages of text and graphics; supporting references add 10,000 pages. Other programmatic requirements documents consist of an estimated 5,000 pages plus references.
Transcription of Spanish Historical Handwritten Documents with Deep Neural Networks

Directory of Open Access Journals (Sweden)

Emilio Granell

2018-01-01

The digitization of historical handwritten document images is important for the preservation of cultural heritage. Moreover, the transcription of text images obtained from digitization is necessary to provide efficient access to the content of these documents. Handwritten Text Recognition (HTR) has become an important research topic in the areas of image and computational language processing; it allows us to obtain transcriptions from text images. State-of-the-art HTR systems are, however, far from perfect. One difficulty is that they have to cope with image noise and handwriting variability. Another difficulty is the presence of a large number of Out-Of-Vocabulary (OOV) words in ancient historical texts. A solution to this problem is to use external lexical resources, but such resources might be scarce or unavailable given the nature and the age of such documents. This work proposes a solution that avoids this limitation. It consists of combining a powerful optical recognition system that copes with image noise and variability with a language model based on sub-lexical units that models OOV words. Such a language modeling approach reduces the size of the lexicon while increasing the lexicon coverage. Experiments are first conducted on the publicly available Rodrigo dataset, which contains the digitization of an ancient Spanish manuscript, with a recognizer based on Hidden Markov Models (HMMs). They show that sub-lexical units outperform word units in terms of Word Error Rate (WER), Character Error Rate (CER), and OOV word accuracy rate. This approach is then applied to deep net classifiers, namely Bi-directional Long Short-Term Memory networks (BLSTMs) and Convolutional Recurrent Neural Nets (CRNNs). Results show that CRNNs outperform HMMs and BLSTMs, reaching the lowest WER and CER for this image dataset and significantly improving OOV recognition.
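WER and CER are both normalized edit distances, over words and over characters respectively. The sketch below assumes the common definition (Levenshtein distance divided by the reference length, with spaces counted as characters for CER); evaluation toolkits sometimes differ in such details, so it is illustrative rather than the paper's scoring script.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,             # deletion
                            curr[j - 1] + 1,         # insertion
                            prev[j - 1] + (r != h))) # substitution (or match)
        prev = curr
    return prev[-1]

def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    return edit_distance(ref, hypothesis.split()) / len(ref)

def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)

print(wer("en el nombre de dios", "en el nonbre de dios"))  # 0.2
```

A single misrecognized letter costs one full word in WER but only one character in CER, which is why the two metrics are reported together.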
SEMANTIC METADATA FOR HETEROGENEOUS SPATIAL PLANNING DOCUMENTS

Directory of Open Access Journals (Sweden)

A. Iwaniak

2016-09-01

Spatial planning documents contain information about the principles and rights of land use in different zones of a local authority. They are the basis for administrative decision making in support of sustainable development. In Poland these documents are published on the Web according to a prescribed non-extendable XML schema, designed for optimum presentation to humans in HTML web pages. There is no document standard, and limited functionality exists for adding references to external resources. The text in these documents is discoverable and searchable by general-purpose web search engines, but the semantics of the content cannot be discovered or queried. The spatial information in these documents is geographically referenced but not machine-readable. Major manual efforts are required to integrate such heterogeneous spatial planning documents from various local authorities for analysis, scenario planning and decision support. This article presents results of an implementation using machine-readable semantic metadata to identify relationships among regulations in the text, spatial objects in the drawings and links to external resources. A spatial planning ontology was used to annotate different sections of spatial planning documents with semantic metadata in the Resource Description Framework in Attributes (RDFa). The semantic interpretation of the content, links between document elements and links to external resources were embedded in XHTML pages. An example and use case from the spatial planning domain in Poland is presented to evaluate its efficiency and applicability. The solution enables the automated integration of spatial planning documents from multiple local authorities to assist decision makers with understanding and interpreting spatial planning information. The approach is equally applicable to legal documents from other countries and domains, such as cultural heritage and environmental management.
Document reconstruction by layout analysis of snippets

Science.gov (United States)

Kleber, Florian; Diem, Markus; Sablatnig, Robert

2010-02-01

Document analysis is used to analyze entire forms (e.g. intelligent form analysis, table detection) or to describe the layout/structure of a document. Skew detection of scanned documents is also performed to support OCR algorithms that are sensitive to skew. In this paper, document analysis is applied to snippets of torn documents to calculate features for the reconstruction. Documents can be destroyed either intentionally, to make the printed content unavailable (e.g. tax fraud investigation, business crime), or through time-induced degradation of ancient documents (e.g. bad storage conditions). Current reconstruction methods for manually torn documents rely on shape, inpainting and texture synthesis techniques. This paper shows how document analysis of snippets can support the matching algorithm by providing additional features: a rotational analysis, a color analysis and a line detection. As future work, we plan to extend the feature set with the paper type (blank, checked, lined), the type of writing (handwritten vs. machine printed) and the text layout of a snippet (text size, line spacing). Preliminary results show that these pre-processing steps can be performed reliably on a real dataset consisting of 690 snippets.
Using ontology network structure in text mining.

Science.gov (United States)

Berndt, Donald J; McCart, James A; Luther, Stephen L

2010-11-13

Statistical text mining treats documents as bags of words, with a focus on term frequencies within documents and across document collections. Unlike natural language processing (NLP) techniques that rely on an engineered vocabulary or a full-featured ontology, statistical approaches do not make use of domain-specific knowledge. The freedom from biases can be an advantage, but at the cost of ignoring potentially valuable knowledge. The approach proposed here investigates a hybrid strategy based on computing graph measures of term importance over an entire ontology and injecting the measures into the statistical text mining process. As a starting point, we adapt existing search engine algorithms such as PageRank and HITS to determine term importance within an ontology graph. The graph-theoretic approach is evaluated using a smoking data set from the i2b2 National Center for Biomedical Computing, cast as a simple binary classification task for categorizing smoking-related documents, demonstrating consistent improvements in accuracy.
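To make graph-based term importance concrete, here is a minimal power-iteration PageRank over a small ontology graph. It assumes edges point from a concept to its parent (is-a) and redistributes the rank of dangling nodes uniformly; the paper's graph construction, parameters, and the way the scores are injected into the text mining process are not shown, and the concept names are hypothetical.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; graph maps each node to a list of its out-links."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {v: 1 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1 - damping) / n for v in nodes}     # random-jump share
        for v in nodes:
            targets = graph.get(v, ())
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                for t in nodes:                         # dangling node: spread evenly
                    new[t] += damping * rank[v] / n
        rank = new
    return rank

ontology = {
    "smoker": ["tobacco use"],
    "cigarette": ["tobacco use"],
    "pipe": ["tobacco use"],
    "tobacco use": ["behavior"],
}
scores = pagerank(ontology)
```

Under these assumptions, general concepts that many specific terms point to accumulate the most rank; reversing the edge direction would favor specific concepts instead, which is one of the design decisions such a hybrid method has to make.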
Knowledge based word-concept model estimation and refinement for biomedical text mining.

Science.gov (United States)

Jimeno Yepes, Antonio; Berlanga, Rafael

2015-02-01

Text mining of scientific literature has been essential for setting up large public biomedical databases, which are widely used by the research community. In the biomedical domain, the existence of a large number of terminological resources and knowledge bases (KBs) has enabled a myriad of machine learning methods for different text mining tasks. Unfortunately, KBs were devised for human interpretation rather than for text mining, so the performance of KB-based methods is usually lower than that of supervised machine learning methods. The disadvantage of supervised methods, though, is that they require labeled training data and are therefore not well suited to large-scale biomedical text mining systems. KB-based methods do not have this limitation. In this paper, we describe a novel method to generate word-concept probabilities from a KB, which can serve as a basis for several text mining tasks. This method takes into account not only the underlying patterns within the descriptions contained in the KB but also those in texts available from large unlabeled corpora such as MEDLINE. The parameters of the model were estimated without training data. Patterns from MEDLINE were built using MetaMap for entity recognition and related using co-occurrences. The word-concept probabilities were evaluated on the task of word sense disambiguation (WSD). The results showed that our method obtained a higher degree of accuracy than other state-of-the-art approaches when evaluated on the MSH WSD data set. We also evaluated our method on the task of document ranking using MEDLINE citations. These results also showed an increase in performance over existing baseline retrieval approaches. Copyright © 2014 Elsevier Inc. All rights reserved.
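The paper's model combines KB descriptions with MEDLINE co-occurrence patterns. The simplified sketch below covers only the first ingredient, as an assumption: it estimates smoothed word-given-concept probabilities from KB description text and picks the candidate concept that maximizes the log-likelihood of the context. The KB entries and concept identifiers are hypothetical.

```python
import math
from collections import Counter

def word_concept_probs(kb):
    """kb maps concept id -> description text; returns a function for log P(word | concept)."""
    counts = {c: Counter(desc.lower().split()) for c, desc in kb.items()}
    vocab = {w for cnt in counts.values() for w in cnt}
    totals = {c: sum(cnt.values()) for c, cnt in counts.items()}

    def log_prob(word, concept):
        # Add-one smoothing; the extra +1 reserves probability mass for unseen words.
        return math.log((counts[concept][word] + 1) / (totals[concept] + len(vocab) + 1))

    return log_prob

def disambiguate(context, candidates, log_prob):
    """Pick the candidate concept whose word distribution best explains the context."""
    words = context.lower().split()
    return max(candidates, key=lambda c: sum(log_prob(w, c) for w in words))

kb = {
    "C_cold_temperature": "low temperature weather chill",
    "C_common_cold": "viral infection nose throat cough",
}
log_prob = word_concept_probs(kb)
print(disambiguate("patient with cough and sore throat", list(kb), log_prob))  # C_common_cold
```

Adding co-occurrence statistics from an unlabeled corpus, as the paper does, would enlarge each concept's word distribution beyond the few words of its KB description.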
Using Bitmap Indexing Technology for Combined Numerical and Text Queries

Energy Technology Data Exchange (ETDEWEB)

Stockinger, Kurt; Cieslewicz, John; Wu, Kesheng; Rotem, Doron; Shoshani, Arie

2006-10-16

In this paper, we describe a strategy of using compressed bitmap indices to speed up queries on both numerical data and text documents. By using an efficient compression algorithm, these compressed bitmap indices are compact even for indices with millions of distinct terms. Moreover, bitmap indices can be used very efficiently to answer Boolean queries over text documents involving multiple query terms. Existing inverted indices for text searches are usually inefficient for corpora with a very large number of terms as well as for queries involving a large number of hits. We demonstrate that our compressed bitmap index technology overcomes both of these shortcomings. In a performance comparison against a commonly used database system, our indices answer queries 30 times faster on average. To provide full SQL support, we integrated our indexing software, called FastBit, with MonetDB. The integrated system MonetDB/FastBit not only provides efficient searches on a single table, as FastBit does, but also answers join queries efficiently. Furthermore, MonetDB/FastBit provides a very efficient mechanism for retrieving result records.
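FastBit relies on compressed bitmaps; the sketch below deliberately leaves compression out and uses Python integers as uncompressed bitmaps, simply to show why Boolean queries over multiple terms reduce to bitwise AND/OR. The function names are illustrative and are not FastBit's API.

```python
from collections import defaultdict

def build_bitmap_index(docs):
    """Map each term to an integer whose bit i is set if document i contains it."""
    index = defaultdict(int)
    for doc_id, text in enumerate(docs):
        for term in set(text.lower().split()):
            index[term] |= 1 << doc_id
    return index

def query_and(index, *terms):
    """Documents containing all terms: bitwise AND of the term bitmaps."""
    result = -1                       # all bits set, the identity for AND
    for term in terms:
        result &= index.get(term, 0)  # a missing term matches nothing
    return result

def query_or(index, *terms):
    """Documents containing any term: bitwise OR of the term bitmaps."""
    result = 0
    for term in terms:
        result |= index.get(term, 0)
    return result

def matching_docs(bitmap, n_docs):
    """Decode a bitmap into a sorted list of document ids."""
    return [i for i in range(n_docs) if (bitmap >> i) & 1]

docs = ["fast bitmap index", "inverted index for text", "bitmap compression"]
index = build_bitmap_index(docs)
print(matching_docs(query_and(index, "bitmap", "index"), len(docs)))  # [0]
```

Compression schemes used by bitmap indexes encode long runs of zero bits compactly, which is what keeps indices with millions of distinct terms small while still allowing the same bitwise operations.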
Probing the topological properties of complex networks modeling short written texts.

Directory of Open Access Journals (Sweden)

Diego R Amancio

In recent years, graph theory has been widely employed to probe several language properties. More specifically, the so-called word adjacency model has proven useful for tackling several practical problems, especially those relying on textual stylistic analysis. The most common approach to treating texts as networks has simply considered either large pieces of text or entire books. This approach has certainly worked well (many informative discoveries have been made this way), but it raises an uncomfortable question: could there be important topological patterns in small pieces of text? To address this problem, the topological properties of subtexts sampled from entire books were probed. Statistical analyses performed on a dataset comprising 50 novels revealed that most of the traditional topological measurements are stable for short subtexts. When the performance of the authorship recognition task was analyzed, it was found that proper sampling yields a discriminability similar to the one found with full texts. Surprisingly, support vector machine classification based on the characterization of short texts outperformed classification based on entire books. These findings suggest that a local topological analysis of large documents might improve their global characterization. Most importantly, it was verified, as a proof of principle, that short texts can be analyzed with the methods and concepts of complex networks. As a consequence, the techniques described here can be extended in a straightforward fashion to analyze texts as time-varying complex networks.
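As a minimal sketch of the word adjacency model, the code below links consecutive words of a text into an undirected network, computes the average clustering coefficient (one of the traditional topological measurements), and splits a text into non-overlapping subtexts of a fixed size. The lowercasing, whitespace tokenization, and lack of stop-word removal or lemmatization are simplifying assumptions; the study's preprocessing and sampling may differ.

```python
from collections import defaultdict
from itertools import combinations

def adjacency_network(text):
    """Undirected word adjacency network: link each word to the word that follows it."""
    words = text.lower().split()
    graph = defaultdict(set)
    for a, b in zip(words, words[1:]):
        if a != b:                     # ignore self-loops from repeated words
            graph[a].add(b)
            graph[b].add(a)
    return graph

def average_clustering(graph):
    """Mean local clustering coefficient over all nodes (0 for degree < 2)."""
    values = []
    for node, neighbours in graph.items():
        k = len(neighbours)
        if k < 2:
            values.append(0.0)
            continue
        links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
        values.append(2 * links / (k * (k - 1)))
    return sum(values) / len(values)

def subtexts(text, size):
    """Split a text into consecutive, non-overlapping chunks of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words) - size + 1, size)]

text = "the cat saw the dog and the dog saw the cat"
print(average_clustering(adjacency_network(text)))
```

Measuring the same statistic on each chunk returned by `subtexts` and comparing it with the full-text value is the kind of stability check the abstract describes.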
Relating interesting quantitative time series patterns with text events and text features

Science.gov (United States)

Wanner, Franz; Schreck, Tobias; Jentner, Wolfgang; Sharalieva, Lyubka; Keim, Daniel A.

2013-12-01

In many application areas, the key to successful data analysis is the integrated analysis of heterogeneous data. One example is the financial domain, where time-dependent and highly frequent quantitative data (e.g., trading volume and price information) and textual data (e.g., economic and political news reports) need to be considered jointly. Data analysis tools need to support an integrated analysis, which allows studying the relationships between textual news documents and quantitative properties of the stock market price series. In this paper, we describe a workflow and tool that allows a flexible formation of hypotheses about text features and their combinations, which reflect quantitative phenomena observed in stock data. To support such an analysis, we combine the analysis steps of frequent quantitative and text-oriented data using an existing a-priori method. First, based on heuristics, we extract interesting intervals and patterns in large time series data. The visual analysis supports the analyst in exploring parameter combinations and their results. The identified time series patterns are then input for the second analysis step, in which all identified intervals of interest are analyzed for frequent patterns co-occurring with financial news. An a-priori method supports the discovery of such sequential temporal patterns. Then, various text features, such as the degree of sentence nesting, noun phrase complexity, and vocabulary richness, are extracted from the news to obtain meta patterns. Meta patterns are defined by a specific combination of text features that differ significantly from the text features of the remaining news data. Our approach combines a portfolio of visualization and analysis techniques, including time, cluster, and sequence visualization and analysis functionality. We provide two case studies showing the effectiveness of our combined quantitative and textual analysis workflow. The workflow can also be generalized to other
Growing electronic documents created by researchers

Directory of Open Access Journals (Sweden)

Monika Weiss

2017-05-01

In the contemporary world, technology is an indispensable element in both the personal and professional spheres. Although we do not attach significance to it in our everyday lives, technological development has engulfed us and keeps reminding us of that. In the face of dynamically growing digitization, a new form of document has appeared: the electronic document. The study concerns the growing electronic documentation among researchers working at the Nicolaus Copernicus University in Toruń. The analysis of surveys and interviews led to the thesis that researchers use electronic documents more frequently than analog documentation. The flexibility and accessibility of this type of document become a problem for personal papers that will be archived in the future, possibly for the most part in the form of electronic documentation.
Language Identification of Kannada, Hindi and English Text Words Through Visual Discriminating Features

Directory of Open Access Journals (Sweden)

M.C. Padma

2008-06-01

In a multilingual country like India, a document may contain text words in more than one language. In a multilingual environment, a multilingual optical character recognition (OCR) system is needed to read multilingual documents. It is therefore necessary to identify the different language regions of a document before feeding it to the OCR system for each individual language. The objective of this paper is to propose a procedure based on visual clues to identify the Kannada, Hindi, and English text portions of an Indian multilingual document.
Text-mining analysis of mHealth research

Science.gov (United States)

Zengul, Ferhat; Oner, Nurettin; Delen, Dursun

2017-01-01

In recent years, because of advancements in communication and networking technologies, mobile technologies have been developing at an unprecedented rate. Research on mHealth, the use of mobile technologies in medicine, has surged in parallel with these technological advancements. Although there have been several attempts to review mHealth research through manual processes such as systematic reviews, the sheer number of studies published in recent years makes this task very challenging. Recent developments in machine learning and text mining offer potential solutions to this challenge by allowing large volumes of text to be analyzed through semi-automated processes. The objective of this study is to analyze the evolution of mHealth research by utilizing text-mining and natural language processing (NLP) analyses. The study sample included abstracts of 5,644 mHealth research articles, which were gathered from five academic search engines using search terms such as mobile health and mHealth. The analysis used the Text Explorer module of JMP Pro 13 and an iterative semi-automated process involving tokenizing, phrasing, and terming. After the document-term matrix (DTM) was developed, analyses such as singular value decomposition (SVD), topic analysis, and hierarchical document clustering were performed, along with the topic-informed document clustering approach. The results were presented in the form of word clouds and trend analyses. There were several major findings regarding research clusters and trends. First, our results confirmed the time-dependent nature of terminology use in mHealth research. For example, between earlier and recent years the terminology changed from “mobile phone” to “smartphone” and from “applications” to “apps”. Second, ten clusters for mHealth research were identified, including (I) Clinical Research on Lifestyle Management, (II) Community Health, (III) Literature Review, (IV) Medical
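The document-term matrix step can be sketched in plain Python, assuming simple lower-case alphanumeric tokenization. This is a hypothetical illustration only: it does no phrasing, terming, or stop-word removal, and it is not JMP Pro's Text Explorer.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def document_term_matrix(docs):
    """Return (vocabulary, matrix) where matrix[d][t] counts term vocabulary[t] in document d."""
    tokenized = [tokenize(doc) for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    column = {term: j for j, term in enumerate(vocab)}
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for tok in toks:
            row[column[tok]] += 1
        matrix.append(row)
    return vocab, matrix


vocab, dtm = document_term_matrix(["Mobile phone apps", "Smartphone apps, apps"])
print(vocab)  # ['apps', 'mobile', 'phone', 'smartphone']
print(dtm)    # [[1, 1, 1, 0], [2, 0, 0, 1]]
```

A matrix like this is the usual input to SVD; with NumPy, for instance, `numpy.linalg.svd(dtm, full_matrices=False)` returns the factors used for a low-rank view of the corpus.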
Text-mining analysis of mHealth research.

Science.gov (United States)

Ozaydin, Bunyamin; Zengul, Ferhat; Oner, Nurettin; Delen, Dursun

2017-01-01

In recent years, because of advancements in communication and networking technologies, mobile technologies have been developing at an unprecedented rate. Research on mHealth, the use of mobile technologies in medicine, has surged in parallel with these technological advancements. Although there have been several attempts to review mHealth research through manual processes such as systematic reviews, the sheer number of studies published in recent years makes this task very challenging. Recent developments in machine learning and text mining offer potential solutions to this challenge by allowing large volumes of text to be analyzed through semi-automated processes. The objective of this study is to analyze the evolution of mHealth research by utilizing text-mining and natural language processing (NLP) analyses. The study sample included abstracts of 5,644 mHealth research articles, which were gathered from five academic search engines using search terms such as mobile health and mHealth. The analysis used the Text Explorer module of JMP Pro 13 and an iterative semi-automated process involving tokenizing, phrasing, and terming. After the document-term matrix (DTM) was developed, analyses such as singular value decomposition (SVD), topic analysis, and hierarchical document clustering were performed, along with the topic-informed document clustering approach. The results were presented in the form of word clouds and trend analyses. There were several major findings regarding research clusters and trends. First, our results confirmed the time-dependent nature of terminology use in mHealth research. For example, between earlier and recent years the terminology changed from "mobile phone" to "smartphone" and from "applications" to "apps". Second, ten clusters for mHealth research were identified, including (I) Clinical Research on Lifestyle Management, (II) Community Health, (III) Literature Review, (IV) Medical Interventions
FTP: Full-Text Publishing?

Science.gov (United States)

Jul, Erik

1992-01-01

Describes the use of file transfer protocol (FTP) on the INTERNET computer network and considers its use as an electronic publishing system. The differing electronic formats of text files are discussed; the preparation and access of documents are described; and problems are addressed, including a lack of consistency. (LRW)
Large-scale automatic extraction of side effects associated with targeted anticancer drugs from full-text oncological articles.

Science.gov (United States)

Xu, Rong; Wang, QuanQiu

2015-06-01

Targeted anticancer drugs such as imatinib, trastuzumab, and erlotinib have dramatically improved treatment outcomes in cancer patients; however, these innovative agents are often associated with unexpected side effects. The pathophysiological mechanisms underlying these side effects are not well understood. A comprehensive knowledge base of side effects associated with targeted anticancer drugs has the potential to illuminate the complex pathways underlying the toxicities induced by these innovative drugs. While knowledge of side-effect associations for targeted drugs exists in multiple heterogeneous data sources, published full-text oncological articles represent an important source of pivotal, investigational, and even failed trials in a variety of patient populations. In this study, we present an automatic process to extract targeted anticancer drug-associated side effects (drug-SE pairs) from a large number of high-profile full-text oncological articles. We downloaded 13,855 full-text articles from the Journal of Clinical Oncology (JCO) published between 1983 and 2013. We developed text classification, relationship extraction, signal filtering, and signal prioritization algorithms to extract drug-SE pairs from the downloaded articles. We extracted a total of 26,264 drug-SE pairs with an average precision of 0.405, a recall of 0.899, and an F1 score of 0.465. We show that side effect knowledge from JCO articles is largely complementary to that from US Food and Drug Administration (FDA) drug labels. Through integrative correlation analysis, we show that targeted drug-associated side effects positively correlate with their gene targets and disease indications. In conclusion, this unique database, built from a large number of high-profile oncological articles, could facilitate the development of computational models to understand toxic effects associated with targeted anticancer drugs. Copyright © 2015 Elsevier Inc. All rights reserved.
Document organization by means of graphs

Directory of Open Access Journals (Sweden)

Santa Vallejo Figueroa

2016-12-01

Nowadays, documents are the main way to represent information and knowledge in several domains. Users continually store documents on hard disks or online media according to a personal organization based on topics, but a single document can contain one or more topics. This makes documents hard to access when they are needed. Current search engines are based on the file name or the content, and the desired term or terms must match the content exactly. In this paper, a method for organizing documents by means of graphs is proposed, taking into account the topics of the documents. To this end, a graph is generated for each document, taking into account synonyms, semantically related terms, hyponyms, and hypernyms of the nouns and verbs the document contains. The proposal has been compared against Google Desktop and LogicalDoc, with interesting results.

A Comparative Analysis of Information Hiding Techniques for Copyright Protection of Text Documents

Directory of Open Access Journals (Sweden)

Milad Taleby Ahvanooey

2018-01-01

With the ceaseless use of the web and other online services, copying, sharing, and transmitting digital media over the Internet have become remarkably simple. Since text is one of the main available data sources and the most widely used digital medium on the Internet, a significant part of websites, books, articles, daily papers, and so on is plain text. Therefore, copyright protection of plain text remains an open issue that must be improved in order to provide proof of ownership and obtain the desired accuracy. During the last decade, digital watermarking and steganography techniques have been used as alternatives to prevent tampering, distortion, and media forgery, and to protect both copyright and authentication. This paper presents a comparative analysis of information hiding techniques, especially those that focus on modifying the structure and content of digital texts. The characteristics of various text watermarking and text steganography techniques are highlighted, along with their applications. In addition, various types of attacks are described and their effects analyzed in order to highlight the advantages and weaknesses of current techniques. Finally, some guidelines and directions for future work are suggested.
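One commonly discussed family of format-based text watermarking techniques hides data in invisible Unicode characters. The sketch below is a hypothetical illustration, not a scheme taken from this paper: it assumes U+200B (zero-width space) encodes bit 0 and U+200C (zero-width non-joiner) encodes bit 1, placed after the cover text's first word. It also shows why such marks are weak against attacks: stripping the invisible characters removes the watermark without changing the visible text.

```python
ZERO_BIT = "\u200b"  # zero-width space encodes bit 0 (assumed convention)
ONE_BIT = "\u200c"   # zero-width non-joiner encodes bit 1 (assumed convention)


def embed_watermark(cover, mark):
    """Hide the UTF-8 bytes of `mark` as invisible characters after the cover's first word."""
    bits = "".join(f"{byte:08b}" for byte in mark.encode("utf-8"))
    hidden = "".join(ONE_BIT if bit == "1" else ZERO_BIT for bit in bits)
    head, sep, tail = cover.partition(" ")
    return head + hidden + sep + tail


def extract_watermark(text):
    """Collect the invisible characters, regroup them into bytes, and decode."""
    bits = "".join("1" if ch == ONE_BIT else "0" for ch in text if ch in (ZERO_BIT, ONE_BIT))
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits) - 7, 8))
    return data.decode("utf-8")


def strip_invisible(text):
    """A trivial attack: removing the zero-width characters erases the watermark."""
    return "".join(ch for ch in text if ch not in (ZERO_BIT, ONE_BIT))


marked = embed_watermark("Plain text cover", "(c)Ann")
print(marked == "Plain text cover")  # False, although the two usually render identically
print(extract_watermark(marked))     # (c)Ann
```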
Areva - 2011 Reference document; Areva - Document de reference 2011

Energy Technology Data Exchange (ETDEWEB)

NONE

2011-07-01

After identifying the person responsible for this document and the statutory auditors, and providing some financial information, this document gives an overview of the risk factors facing the company: legal risks, industrial and environmental risks, operational risks, risks related to large projects, and market and liquidity risks. Then, after recalling the history and evolution of the company and the evolution of its investments over the last five years, it gives an overview of Areva's activities in the nuclear energy and renewable energy markets, its clients and suppliers, its strategy, and the activities of its various departments. Other information is also provided: the company's organization chart, property holdings (plants, equipment), an analysis of its financial situation, its research and development policy, the current context, profit forecasts or estimates, management organization and operation
Robust keyword retrieval method for OCRed text

Science.gov (United States)

Fujii, Yusaku; Takebe, Hiroaki; Tanaka, Hiroshi; Hotta, Yoshinobu

2011-01-01

Document management systems have become important because of the growing popularity of electronic filing of documents and of scanning books, magazines, manuals, etc., with a scanner or a digital camera for storage or reading on a PC or an electronic book. Text information acquired by optical character recognition (OCR) is usually added to electronic documents for document retrieval. Since texts generated by OCR generally include character recognition errors, robust retrieval methods have been introduced to overcome this problem. In this paper, we propose a retrieval method that is robust against both character segmentation and recognition errors. In the proposed method, allowing noise characters to be inserted into, and characters to be dropped from, the keyword during retrieval provides robustness against character segmentation errors, and substituting each keyword character with the OCR recognition candidates for that character, or with any other character, provides robustness against character recognition errors. The recall rate of the proposed method was 15% higher than that of the conventional method. However, the precision rate was 64% lower.
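The idea of tolerating inserted, dropped, and substituted characters can be illustrated with edit-distance matching (Sellers' dynamic-programming algorithm for approximate substring search), extended so that a substitution is free whenever the keyword character is among the OCR candidates for that position. This is a minimal sketch under those assumptions, not the authors' implementation; the candidate-set input format and the function names are hypothetical.

```python
def approximate_find(keyword, ocr_candidates, max_errors):
    """Return the end positions in the OCR text where `keyword` matches with at most
    `max_errors` edits (Sellers' algorithm).

    ocr_candidates: one set of candidate characters per OCR position (hypothetical format).
    A substitution costs nothing when the keyword character is among that position's candidates.
    """
    m = len(keyword)
    prev = list(range(m + 1))  # column for the empty text prefix
    ends = []
    for j, candidates in enumerate(ocr_candidates):
        cur = [0] * (m + 1)  # cur[0] = 0: a match may start anywhere in the text
        for i in range(1, m + 1):
            sub_cost = 0 if keyword[i - 1] in candidates else 1
            cur[i] = min(
                prev[i - 1] + sub_cost,  # match, or substitution of a misrecognized character
                prev[i] + 1,             # noise character inserted into the OCR text
                cur[i - 1] + 1,          # keyword character dropped from the OCR text
            )
        if cur[m] <= max_errors:
            ends.append(j)
        prev = cur
    return ends


def plain(text):
    """Wrap a string as single-candidate OCR output."""
    return [{ch} for ch in text]


ocr = plain("a tcxt b")
ocr[3] = {"c", "e"}  # the recognizer's second candidate for position 3 is 'e'
print(approximate_find("text", ocr, 0))  # [5]
```

Consistent with the trade-off the abstract reports, raising `max_errors` finds more true occurrences (higher recall) but also admits more false matches (lower precision).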
Changing Landscapes in Documentation Efforts : Civil Society Documentation of Serious Human Rights Violations

NARCIS (Netherlands)

Mc Gonigle, B.N.

2017-01-01

Wittingly or unwittingly, civil society actors have long been faced with the task of documenting serious human rights violations. Thirty years ago, such efforts were largely organised by grassroots movements, often with little support or funding from international actors. Sharing information and
Storing XML Documents in Databases

OpenAIRE

Schmidt, A.R.; Manegold, Stefan; Kersten, Martin; Rivero, L.C.; Doorn, J.H.; Ferraggine, V.E.

2005-01-01

The authors introduce concepts for loading large amounts of XML documents into databases where the documents are stored and maintained. The goal is to make XML databases as unobtrusive in multi-tier systems as possible and at the same time provide as many services defined by the XML standards as possible. The ubiquity of XML has sparked great interest in deploying concepts known from Relational Database Management Systems such as declarative query languages, transactions, indexes ...
Electronic Document Management Using Inverted Files System

Directory of Open Access Journals (Sweden)

Suhartono Derwin

2014-03-01

The number of documents is growing rapidly, and these documents exist not only on paper but also in electronic form. This can be seen from a data sample taken by the publisher SpringerLink in 2010, which showed an increase in the number of digital document collections from 2003 to mid-2010. How to manage them well has therefore become an important need. This paper describes a new method for managing documents, called the inverted files system. For electronic documents, the inverted files system is applied so that the documents can be searched over the Internet using a search engine. It can improve both the document search mechanism and the document storage mechanism.
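The inverted-file idea can be sketched in a few lines: each term points to a sorted list of document IDs (a posting list), and a multi-term query intersects those lists. This is a minimal, hypothetical Python illustration assuming simple regex word tokenization; it is not the system described in the paper.

```python
import re
from collections import defaultdict


def terms_of(text):
    """Lower-case word tokens of a text."""
    return re.findall(r"\w+", text.lower())


def build_inverted_file(docs):
    """Map each term to a sorted posting list of the IDs of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in terms_of(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def intersect(a, b):
    """Merge-intersect two sorted posting lists in O(len(a) + len(b)) time."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search(index, query):
    """Return IDs of documents containing every query term (Boolean AND)."""
    terms = terms_of(query)
    if not terms:
        return []
    # Intersect the shortest posting lists first to keep intermediate results small.
    lists = sorted((index.get(term, []) for term in terms), key=len)
    result = lists[0]
    for plist in lists[1:]:
        result = intersect(result, plist)
    return result


docs = {1: "Electronic document management", 2: "Paper document archive", 3: "Electronic archive search"}
index = build_inverted_file(docs)
print(search(index, "electronic archive"))  # [3]
```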
A Hybrid Feature Selection Approach for Arabic Documents Classification

NARCIS (Netherlands)

Habib, Mena Badieh; Sarhan, Ahmed A. E.; Salem, Abdel-Badeeh M.; Fayed, Zaki T.; Gharib, Tarek F.

Text Categorization (classification) is the process of classifying documents into a predefined set of categories based on their content. Text categorization algorithms usually represent documents as bags of words and consequently have to deal with a huge number of features. Feature selection tries to
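The abstract is truncated before the hybrid method is described, so the following is only a generic illustration of one common feature-selection criterion for bag-of-words text categorization: the chi-square statistic between term presence and category membership. It is an assumption chosen for illustration, not the authors' hybrid approach, and the function names are hypothetical.

```python
def chi_square(docs, labels, term, category):
    """Chi-square statistic for the 2x2 table of term presence vs. category membership.

    docs: list of token sets (bag-of-words presence); labels: the category of each doc.
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_category = label == category
        if has_term and in_category:
            a += 1  # term present, document in category
        elif has_term:
            b += 1  # term present, document outside category
        elif in_category:
            c += 1  # term absent, document in category
        else:
            d += 1  # term absent, document outside category
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denominator if denominator else 0.0


def top_k_features(docs, labels, category, k):
    """Rank the vocabulary by chi-square score (ties broken alphabetically) and keep k terms."""
    vocab = sorted({term for tokens in docs for term in tokens})
    ranked = sorted(vocab, key=lambda t: (-chi_square(docs, labels, t, category), t))
    return ranked[:k]


docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "party"}, {"vote", "team"}]
labels = ["sport", "sport", "politics", "politics"]
print(top_k_features(docs, labels, "sport", 2))  # ['goal', 'vote']
```

Because the statistic is symmetric, a term that is strongly associated with the absence of a category ("vote" for "sport" here) scores as highly as one associated with its presence; both are useful for discriminating the category.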
The Texts of the Agency's Relationship Agreements with Specialized Agencies; Textes des Accords Conclus Entre l'Agence et des Institutions Specialisees

Energy Technology Data Exchange (ETDEWEB)

NONE

1960-09-27

The texts of the relationship agreements which the Agency has concluded with the specialized agencies listed below, together with the respective protocols authenticating them, are reproduced in this document in the order in which the agreements entered into force, for the information of all Members of the Agency. [French] This document reproduces the texts of the agreements that the Agency has concluded with the specialized agencies listed below, as well as those of the protocols validating the said agreements. These texts are presented, for information, to all Members of the Agency in the chronological order of the agreements' entry into force.
NOSQL FOR STORAGE AND RETRIEVAL OF LARGE LIDAR DATA COLLECTIONS

Directory of Open Access Journals (Sweden)

J. Boehm

2015-08-01

Developments in LiDAR technology over the past decades have made LiDAR a mature and widely accepted source of geospatial information. This in turn has led to an enormous growth in data volume. The central idea behind file-centric storage of LiDAR point clouds is the observation that large collections of LiDAR data are typically delivered as large collections of files, rather than as single files of terabyte size. This split of the dataset, commonly referred to as tiling, was usually done to accommodate a specific processing pipeline. It therefore makes sense to preserve this split. A document-oriented NoSQL database can easily emulate this data partitioning by representing each tile (file) in a separate document. The document stores the metadata of the tile. The actual files are stored in a distributed file system emulated by the NoSQL database. We demonstrate the use of MongoDB, a highly scalable document-oriented NoSQL database, for storing large LiDAR files. MongoDB, like any NoSQL database, allows queries on the attributes of a document. In particular, MongoDB also supports spatial queries. Hence, we can perform spatial queries on the bounding boxes of the LiDAR tiles. The speed of inserting and retrieving files in a cloud-based database is compared with native file system and cloud storage transfer speeds.
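MongoDB provides geospatial query operators (for example `$geoIntersects`) for this kind of lookup. To show only the underlying logic without a database, the sketch below emulates a bounding-box query over tile metadata documents in plain Python. The `TileDocument` schema and the helper names are hypothetical, not the paper's data model.

```python
from dataclasses import dataclass


@dataclass
class TileDocument:
    """Metadata document for one LiDAR tile file (hypothetical schema)."""
    filename: str
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def bbox_intersects(tile, qmin_x, qmin_y, qmax_x, qmax_y):
    """Two axis-aligned boxes intersect unless one lies entirely beside the other."""
    return not (tile.max_x < qmin_x or tile.min_x > qmax_x
                or tile.max_y < qmin_y or tile.min_y > qmax_y)


def query_tiles(tiles, qmin_x, qmin_y, qmax_x, qmax_y):
    """Return the filenames of tiles whose bounding box touches the query window."""
    return [t.filename for t in tiles
            if bbox_intersects(t, qmin_x, qmin_y, qmax_x, qmax_y)]


tiles = [
    TileDocument("tile_0_0.las", 0, 0, 100, 100),
    TileDocument("tile_1_0.las", 100, 0, 200, 100),
    TileDocument("tile_0_1.las", 0, 100, 100, 200),
]
print(query_tiles(tiles, 150, 50, 160, 60))  # ['tile_1_0.las']
```

With the strict comparisons used here, tiles that merely share an edge with the query window count as intersecting, which is usually the safer choice when the goal is to fetch every file that might contain relevant points.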
SparkText: Biomedical Text Mining on Big Data Framework.

Science.gov (United States)

Ye, Zhan; Tafti, Ahmad P; He, Karen Y; Wang, Kai; He, Max M

Many new biomedical research articles are published every day, accumulating rich information, such as genetic variants, genes, diseases, and treatments. Rapid yet accurate text mining on large-scale scientific literature can discover novel knowledge to better understand human diseases and to improve the quality of disease diagnosis, prevention, and treatment. In this study, we designed and developed an efficient text mining framework called SparkText on a Big Data infrastructure, which is composed of Apache Spark data streaming and machine learning methods, combined with a Cassandra NoSQL database. To demonstrate its performance for classifying cancer types, we extracted information (e.g., breast, prostate, and lung cancers) from tens of thousands of articles downloaded from PubMed, and then employed Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression to build prediction models to mine the articles. The accuracy of predicting a cancer type by SVM using the 29,437 full-text articles was 93.81%. While competing text-mining tools took more than 11 hours, SparkText mined the dataset in approximately 6 minutes. This study demonstrates the potential for mining large-scale scientific articles on a Big Data infrastructure, with real-time updates from new articles published daily. SparkText can be extended to other areas of biomedical research.
Academic Journal Embargoes and Full Text Databases.

Science.gov (United States)

Brooks, Sam

2003-01-01

Documents the reasons for embargoes of academic journals in full text databases (i.e., publisher-imposed delays on the availability of full text content) and provides insight regarding common misconceptions. Tables present data on selected journals covering a cross-section of subjects and publishers and comparing two full text business databases.…
CNEA's quality system documentation

International Nuclear Information System (INIS)

Mazzini, M.M.; Garonis, O.H.

1998-01-01

Full text: To obtain an effective and coherent documentation system suitable for CNEA's Quality Management Program, we decided to organize CNEA's quality documentation into five levels: (a) Level 1, quality manual; (b) Level 2, procedures; (c) Level 3, quality plans; (d) Level 4, instructions; (e) Level 5, records and other documents. The objective of this work is to present a standardization of the documentation of CNEA's quality system for facilities, laboratories, services, and R and D activities. Considering the diversity of criteria and formats used by different departments to prepare documentation, and since each of them ultimately includes the same quality management policy, we proposed developing a system to improve the documentation, avoiding unnecessary time and costs. This will allow each sector to focus on its specific documentation. The quality manuals of the atomic centers comply with rule 3.6.1 of the Nuclear Regulatory Authority and with Safety Series 50-C/SG-Q of the International Atomic Energy Agency. They are designed by groups of competent and highly trained people from different departments. The normative procedures are prepared with the same methodology as the quality manuals. The quality plans, which describe the organizational structure of the working group and the appropriate documentation, will assess the quality manuals of facilities, laboratories, services, and research and development activities of the atomic centers. Responsibility for approving the normative documentation is assigned to the management in charge of administering economic and human resources, in order to fulfill the institutional objectives. Another improvement, aimed at eliminating unnecessary processes, is the inclusion of all of the quality system's normative documentation in the CNEA intranet. (author) [es
SparkText: Biomedical Text Mining on Big Data Framework.

Directory of Open Access Journals (Sweden)

Zhan Ye

Many new biomedical research articles are published every day, accumulating rich information, such as genetic variants, genes, diseases, and treatments. Rapid yet accurate text mining on large-scale scientific literature can discover novel knowledge to better understand human diseases and to improve the quality of disease diagnosis, prevention, and treatment. In this study, we designed and developed an efficient text mining framework called SparkText on a Big Data infrastructure, which is composed of Apache Spark data streaming and machine learning methods, combined with a Cassandra NoSQL database. To demonstrate its performance for classifying cancer types, we extracted information (e.g., breast, prostate, and lung cancers) from tens of thousands of articles downloaded from PubMed, and then employed Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression to build prediction models to mine the articles. The accuracy of predicting a cancer type by SVM using the 29,437 full-text articles was 93.81%. While competing text-mining tools took more than 11 hours, SparkText mined the dataset in approximately 6 minutes. This study demonstrates the potential for mining large-scale scientific articles on a Big Data infrastructure, with real-time updates from new articles published daily. SparkText can be extended to other areas of biomedical research.
Representation of Social History Factors Across Age Groups: A Topic Analysis of Free-Text Social Documentation.

Science.gov (United States)

Lindemann, Elizabeth A; Chen, Elizabeth S; Wang, Yan; Skube, Steven J; Melton, Genevieve B

2017-01-01

As individuals age, there is potential for dramatic changes in the social and behavioral determinants that affect health status and outcomes. The importance of these determinants has been increasingly recognized in clinical decision-making. We sought to characterize how social and behavioral health determinants vary in different demographic groups using a previously established schema of 28 social history types through both manual analysis and automated topic analysis of social documentation in the electronic health record across the population of an entire integrated healthcare system. Our manual analysis generated 8,335 annotations over 1,400 documents, representing 24 (86%) social history types. In contrast, automated topic analysis generated 22 (79%) social history types. A comparative evaluation demonstrated both similarities and differences in coverage between the manual and topic analyses. Our findings validate the widespread nature of social and behavioral determinants that affect health status over populations of individuals over their lifespan.
An Intelligent System For Arabic Text Categorization

NARCIS (Netherlands)

Syiam, M.M.; Tolba, Mohamed F.; Fayed, Z.T.; Abdel-Wahab, Mohamed S.; Ghoniemy, Said A.; Habib, Mena Badieh

Text Categorization (classification) is the process of classifying documents into a predefined set of categories based on their content. In this paper, an intelligent Arabic text categorization system is presented. Machine learning algorithms are used in this system. Many algorithms for stemming and
Documentation of torture victims, assessment of the start procedure for medico-legal documentation.

Science.gov (United States)

Mandel, Lene; Worm, Lise

2007-01-01

A Pilot Study was performed at the Rehabilitation and Research Centre for Torture Victims (RCT) in Copenhagen in order to explore the possibilities for adding a medico-legal documentation component to the rehabilitation of torture victims already taking place. It describes the process and results of implementing medico-legal documentation in a rehabilitative setting. A modified version of the Guidelines in the Istanbul Protocol was developed on the basis of the review of literature and current practices described in "Documentation of torture victims, implementation of medico-legal protocols". The modified guidelines were tested on five clients. The aim was twofold: 1) to assess the clients' attitude towards adding a documentation component to the rehabilitation process, and 2) to assess the practical circumstances of implementing the Istanbul Protocol in the everyday life of a rehabilitation centre. Results show that all five clients were positive towards the project and found comfort in being able to contribute to the fight against impunity. The Pilot Study also demonstrated that a large part of the medico-legal documentation was already obtained in the rehabilitation process. However, it was not accessible due to a lack of systematization and of a data registration system. There are thus important synergies in collecting data for rehabilitation and documentation, but a joint database system is necessary to realize these synergies.
Informational system. Documents management

Directory of Open Access Journals (Sweden)

Vladut Iacob

2009-12-01

Increased productivity and reduced operational costs in a company can be achieved by adopting a document management solution. Such an application enables the structured and efficient management and transmission of information within the organization.
Integration of the ecosystem services concept in planning documents from six municipalities in southwestern Sweden

Directory of Open Access Journals (Sweden)

Amanda C. Nordin

2017-09-01

The ecosystem services (ES) concept refers to benefits that humanity receives from nature. Investigating how this concept has been embraced within urban planning is important when assessing the awareness of human dependence on natural functions and the potential for the ES concept to increase this awareness. We analyzed planning documents from three small and three large municipalities in southern Sweden to see how explicitly the ES concept was addressed and which individual services were mentioned. We found that five of the municipalities mentioned the ES concept explicitly and the remaining municipality addressed it implicitly. Comprehensive and green plans referred to the ES concept more explicitly than did plans that focused on a single issue. We used 23 individual ES as a reference; each was mentioned in at least one document, but those concerning habitat and recreation were mentioned most frequently. Individual ES were generally described at an elaborate level. No major differences were identified between large and small municipalities except that large ones mentioned more individual ES. Our study demonstrates that municipalities in southern Sweden have started to integrate the ES concept into their planning documents. However, there is great potential to increase and concretize the awareness of ES.
Text Character Extraction Implementation from Captured Handwritten Image to Text Conversion Using Template Matching Technique

Directory of Open Access Journals (Sweden)

Barate Seema

2016-01-01

Images contain various types of useful information that should be extracted whenever required. Various algorithms and methods have been proposed to extract text from a given image, allowing the user to access the text in any image. Variations in text size, style, orientation, and alignment, as well as low image contrast and composite backgrounds, make text extraction difficult. An application that extracts and recognizes such text accurately in real time can be applied to many important tasks, such as document analysis, vehicle license plate extraction, and text-based image indexing, and many such applications have become reality in recent years. To overcome the above problems, we develop an application that converts the image into text using algorithms such as bounding box, HSV model, blob analysis, template matching, and template generation.
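Template matching, one of the steps listed above, can be sketched as sliding a small grey-level template over an image and scoring each position by the sum of absolute differences (SAD); recognizing a segmented character then means picking the stored template with the lowest score. This is a minimal pure-Python illustration with images as 2-D lists; the SAD score and the helper names are assumptions, not the paper's implementation.

```python
def match_template(image, template):
    """Slide `template` over `image` (2-D lists of grey levels) and return
    ((row, col), score) for the top-left corner with the smallest sum of absolute differences."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best_score, best_pos = None, None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            score = sum(abs(image[r + i][c + j] - template[i][j])
                        for i in range(th) for j in range(tw))
            if best_score is None or score < best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score


def recognize_character(glyph, templates):
    """Return the label whose same-sized template has the lowest SAD to the segmented glyph."""
    return min(templates, key=lambda label: match_template(glyph, templates[label])[1])


templates = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}
print(recognize_character([[1, 0, 0], [1, 0, 0], [1, 1, 1]], templates))  # L
```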
Data mining of text as a tool in authorship attribution

Science.gov (United States)

Visa, Ari J. E.; Toivonen, Jarmo; Autio, Sami; Maekinen, Jarno; Back, Barbro; Vanharanta, Hannu

2001-03-01

It is common for text documents to be characterized and classified by keywords assigned by their authors. Visa et al. have developed a new methodology based on prototype matching. The prototype is a document of interest or an extracted part of an interesting text. This prototype is matched against the document database of the monitored document flow. The new methodology is capable of extracting the meaning of a document to a certain degree. Our claim is that the new methodology is also capable of authenticating authorship. To verify this claim, two tests were designed. The test hypothesis was that the words and the word order in sentences could authenticate the author. In the first test, three authors were selected: William Shakespeare, Edgar Allan Poe, and George Bernard Shaw. Three texts by each author were examined. Each text in turn was used as a prototype, and the two nearest matches to the prototype were noted. The second test used the Reuters-21578 financial news database: a group of 25 short financial news reports from five different authors was examined. Our new methodology and the results of the two tests are reported in this paper. In the first test, all cases were successful for Shakespeare and for Poe; for Shaw, one text was confused with Poe. In the second test, the Reuters-21578 financial news reports were attributed to their authors relatively well. We conclude that our text mining methodology appears to be capable of authorship attribution.
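The test procedure (use each text as a prototype and note its two nearest matches) can be sketched as follows. This is a simplified stand-in, not the Visa et al. prototype-matching method: it assumes each text is represented by counts of its words plus its word bigrams (a crude proxy for "words and word order") and compares them with cosine similarity. The toy corpus and the function names are hypothetical.

```python
import re
from collections import Counter
from math import sqrt
from typing import Dict, List


def profile(text: str) -> Counter:
    """Word counts plus word-bigram counts (bigrams capture local word order)."""
    words = re.findall(r"[a-z']+", text.lower())
    feats = Counter(words)
    feats.update(f"{a} {b}" for a, b in zip(words, words[1:]))
    return feats


def cosine(p: Counter, q: Counter) -> float:
    dot = sum(count * q[term] for term, count in p.items())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def nearest(prototype: str, corpus: Dict[str, str], k: int = 2) -> List[str]:
    """Names of the k corpus texts most similar to the prototype."""
    proto = profile(prototype)
    ranked = sorted(corpus, key=lambda name: cosine(proto, profile(corpus[name])),
                    reverse=True)
    return ranked[:k]


corpus = {
    "a": "the quick brown fox jumps",
    "b": "stock prices fell sharply today",
    "c": "the quick brown dog sleeps",
}
print(nearest("the quick brown fox runs", corpus))  # ['a', 'c']
```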

The Temptation of Documentation: Potential and Challenges of Videographic Documentation and Interpretation

Directory of Open Access Journals (Sweden)

Marie Winckler

2014-02-01

Full Text Available Insights into the civic education classroom can be gained through videographic documentation. Videographic material offers, as I argue in this article, great possibilities: through a reconstructive approach, insights into dimensions of civic education such as spatial organisation, symbolic representation, and non-verbal communication may emerge. In this way, a deeper understanding of informal political learning in school can be obtained. These aspects have not yet been considered in depth, as videographic documentation has to date been employed primarily in teacher training contexts and lesson evaluation. The case study I present here was inspired by the documentary method, and both the potential and the limitations of videographic interpretation are discussed in this context. The study also suggests that videographic documentation does not offer insights into the individual and collective integration of experiences in civic education lessons.
Text Generation: The State of the Art and the Literature.

Science.gov (United States)

Mann, William C.; And Others

This report comprises two documents which describe the state of the art of computer generation of natural language text. Both were prepared by a panel of individuals who are active in research on text generation. The first document assesses the techniques now available for use in systems design, covering all of the technical methods by which…
Automatic Indexing of Arabic Texts: State of the Art

Directory of Open Access Journals (Sweden)

Mohamed Salim El Bazzi

2016-11-01

Full Text Available Document indexing is a crucial step in the text mining process. It is used to represent documents by the most relevant descriptors of their contents. Several approaches have been proposed in the literature, particularly for English, but they are unusable for Arabic documents because of the language's specific characteristics and its morphological, grammatical, and lexical complexity. In this paper, we present a review of the state of the art in indexing methods and their contribution to improving the processing of Arabic documents. We also propose a categorization of works according to the most commonly used approaches and methods for indexing textual documents. We adopted a qualitative selection of papers and retained those presenting notable indexing contributions and reporting significant results.
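As a minimal illustration of "representing documents by the most relevant descriptors of their contents", the sketch below selects each document's top-k terms by TF-IDF weight. TF-IDF is only one of the approaches such surveys cover, and the sketch assumes documents are already tokenized; for Arabic, a real indexer would first need stemming or root extraction, which is deliberately omitted here. The toy documents are hypothetical.

```python
from collections import Counter
from math import log
from typing import List


def index_terms(docs: List[List[str]], k: int = 3) -> List[List[str]]:
    """Top-k TF-IDF descriptors for each tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    result = []
    for doc in docs:
        tf = Counter(doc)
        # Relative term frequency times inverse document frequency;
        # a term found in every document gets weight 0.
        scores = {t: (c / len(doc)) * log(n / df[t]) for t, c in tf.items()}
        ranked = sorted(scores, key=lambda t: (-scores[t], t))  # ties: alphabetical
        result.append(ranked[:k])
    return result


docs = [["cat", "cat", "sat", "the"], ["dog", "ran", "the"], ["cat", "ran", "the"]]
print(index_terms(docs, k=1))  # [['sat'], ['dog'], ['cat']]
```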
Assessing semantic similarity of texts - Methods and algorithms

Science.gov (United States)

Rozeva, Anna; Zerkova, Silvia

2017-12-01

Assessing the semantic similarity of texts is an important part of various text-related applications, such as educational systems, information retrieval, and text summarization. This task is performed by sophisticated analysis that implements text-mining techniques. Text mining involves several pre-processing steps that produce a structured, representative model of the documents in a corpus by extracting and selecting the features that characterize their content. Generally, the model is vector-based and enables further analysis with knowledge discovery approaches. Algorithms and measures are used to assess texts at the syntactic and semantic levels. An important text-mining method and similarity measure is latent semantic analysis (LSA). It reduces the dimensionality of the document vector space and better captures the semantics of the text. The mathematical background of LSA, which derives the meaning of the words in a given text by exploring their co-occurrence, is examined. The algorithm for obtaining the vector representation of words and their corresponding latent concepts in a reduced multidimensional space, as well as the similarity calculation, are presented.
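The core LSA computation (a truncated SVD of the term-document matrix followed by cosine similarity in the reduced space) can be sketched in dependency-free Python. This is an illustrative sketch, not the authors' implementation: instead of a library SVD it finds the top-k eigenpairs of AᵀA by power iteration with deflation, which yields the same document coordinates (the rows of V_k S_k) for small matrices. The toy term-document matrix is hypothetical.

```python
import random
from math import sqrt
from typing import List

Matrix = List[List[float]]


def _matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def cosine(u: List[float], v: List[float]) -> float:
    nu, nv = sqrt(sum(x * x for x in u)), sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0


def lsa_doc_vectors(a: Matrix, k: int, iters: int = 300) -> Matrix:
    """Map each document (a column of term-document matrix `a`) to k latent dims.

    Eigenvectors of G = A^T A are the right singular vectors of A and its
    eigenvalues are the squared singular values, so document j gets the
    coordinates (s_1 v_1[j], ..., s_k v_k[j]).
    """
    n = len(a[0])
    g = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    rng = random.Random(0)  # fixed seed: deterministic start vectors
    coords = [[0.0] * k for _ in range(n)]
    for c in range(k):
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):  # power iteration
            w = _matvec(g, v)
            norm = sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # remaining spectrum is zero
            v = [x / norm for x in w]
        lam = max(sum(x * y for x, y in zip(v, _matvec(g, v))), 0.0)  # Rayleigh quotient
        for j in range(n):
            coords[j][c] = sqrt(lam) * v[j]
        # Deflate so the next pass finds the next-largest eigenpair.
        g = [[g[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Rows: cat, pet, fur, stock, market, price; columns: four toy documents.
A = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 1, 0, 0],
     [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1]]
docs2d = lsa_doc_vectors(A, k=2)
print(round(cosine(docs2d[0], docs2d[1]), 3), round(cosine(docs2d[0], docs2d[2]), 3))
```

In the reduced space the two "pet" documents become nearly parallel and a "pet" document is nearly orthogonal to a "market" document, which is the kind of latent grouping LSA is meant to expose.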
SparkText: Biomedical Text Mining on Big Data Framework

Science.gov (United States)

He, Karen Y.; Wang, Kai

2016-01-01

Background Many new biomedical research articles are published every day, accumulating rich information, such as genetic variants, genes, diseases, and treatments. Rapid yet accurate text mining on large-scale scientific literature can discover novel knowledge to better understand human diseases and to improve the quality of disease diagnosis, prevention, and treatment. Results In this study, we designed and developed an efficient text mining framework called SparkText on a Big Data infrastructure, which is composed of Apache Spark data streaming and machine learning methods, combined with a Cassandra NoSQL database. To demonstrate its performance for classifying cancer types, we extracted information (e.g., breast, prostate, and lung cancers) from tens of thousands of articles downloaded from PubMed, and then employed Naïve Bayes, Support Vector Machine (SVM), and Logistic Regression to build prediction models to mine the articles. The accuracy of predicting a cancer type by SVM using the 29,437 full-text articles was 93.81%. While competing text-mining tools took more than 11 hours, SparkText mined the dataset in approximately 6 minutes. Conclusions This study demonstrates the potential for mining large-scale scientific articles on a Big Data infrastructure, with real-time update from new articles published daily. SparkText can be extended to other areas of biomedical research. PMID:27685652
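Of the three classifiers named above, Naïve Bayes is the simplest to show in a few lines. The sketch below is a single-machine multinomial Naïve Bayes with Laplace smoothing written in plain Python; it is not SparkText's Spark-based pipeline, and the toy token lists and cancer-type labels are hypothetical.

```python
from collections import Counter, defaultdict
from math import log
from typing import Dict, List


class NaiveBayes:
    """Multinomial Naive Bayes over token lists, with add-alpha smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha

    def fit(self, docs: List[List[str]], labels: List[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for doc, y in zip(docs, labels):
            self.word_counts[y].update(doc)
        self.vocab = {w for doc in docs for w in doc}
        self.n_docs = len(docs)
        return self

    def predict(self, doc: List[str]) -> str:
        def log_posterior(y: str) -> float:
            total = sum(self.word_counts[y].values())
            score = log(self.class_counts[y] / self.n_docs)  # log prior
            for w in doc:
                if w in self.vocab:  # unseen words are ignored
                    score += log((self.word_counts[y][w] + self.alpha)
                                 / (total + self.alpha * len(self.vocab)))
            return score

        return max(self.class_counts, key=log_posterior)


train = [["brca1", "breast", "tumor"], ["psa", "prostate", "gland"],
         ["egfr", "lung", "smoking"]]
model = NaiveBayes().fit(train, ["breast", "prostate", "lung"])
print(model.predict(["prostate", "psa", "level"]))  # prostate
```

Working in log space avoids numerical underflow when multiplying many small word probabilities over full-text articles.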
Semantic Metadata for Heterogeneous Spatial Planning Documents

Science.gov (United States)

Iwaniak, A.; Kaczmarek, I.; Łukowicz, J.; Strzelecki, M.; Coetzee, S.; Paluszyński, W.

2016-09-01

Spatial planning documents contain information about the principles and rights of land use in different zones of a local authority. They are the basis for administrative decision making in support of sustainable development. In Poland these documents are published on the Web according to a prescribed non-extendable XML schema, designed for optimum presentation to humans in HTML web pages. There is no document standard, and limited functionality exists for adding references to external resources. The text in these documents is discoverable and searchable by general-purpose web search engines, but the semantics of the content cannot be discovered or queried. The spatial information in these documents is geographically referenced but not machine-readable. Major manual efforts are required to integrate such heterogeneous spatial planning documents from various local authorities for analysis, scenario planning and decision support. This article presents results of an implementation using machine-readable semantic metadata to identify relationships among regulations in the text, spatial objects in the drawings and links to external resources. A spatial planning ontology was used to annotate different sections of spatial planning documents with semantic metadata in the Resource Description Framework in Attributes (RDFa). The semantic interpretation of the content, links between document elements and links to external resources were embedded in XHTML pages. An example and use case from the spatial planning domain in Poland is presented to evaluate its efficiency and applicability. The solution enables the automated integration of spatial planning documents from multiple local authorities to assist decision makers with understanding and interpreting spatial planning information. The approach is equally applicable to legal documents from other countries and domains, such as cultural heritage and environmental management.
Inferring Group Processes from Computer-Mediated Affective Text Analysis

Energy Technology Data Exchange (ETDEWEB)

Schryver, Jack C [ORNL]; Begoli, Edmon [ORNL]; Jose, Ajith [Missouri University of Science and Technology]; Griffin, Christopher [Pennsylvania State University]

2011-02-01

Political communications in the form of unstructured text convey rich connotative meaning that can reveal underlying group social processes. Previous research has focused on sentiment analysis at the document level, but we extend this analysis to sub-document levels through a detailed analysis of affective relationships between entities extracted from a document. Rather than pure sentiment analysis, which distinguishes only positive from negative, we explore nuances of affective meaning in 22 affect categories. Starting from seed lists of affect terms, our affect propagation algorithm automatically calculates the affective relationships among extracted entities and displays them in graphical form in our prototype (TEAMSTER). Several useful metrics are defined to infer underlying group processes by aggregating the affective relationships discovered in a text. Our approach has been validated with annotated documents from the MPQA corpus, achieving a performance gain of 74% over comparable random guessers.
Extraction of relations between genes and diseases from text and large-scale data analysis: implications for translational research.

Science.gov (United States)

Bravo, Àlex; Piñero, Janet; Queralt-Rosinach, Núria; Rautschka, Michael; Furlong, Laura I

2015-02-21

Current biomedical research needs to leverage and exploit the large amount of information reported in scientific publications. Automated text mining approaches, in particular those aimed at finding relationships between entities, are key to identifying actionable knowledge in free-text repositories. We present the BeFree system, aimed at identifying relationships between biomedical entities with a special focus on genes and their associated diseases. By exploiting morpho-syntactic information in the text, BeFree is able to identify gene-disease, drug-disease and drug-target associations with state-of-the-art performance. The application of BeFree to real-case scenarios shows its effectiveness in extracting information relevant to translational research. We show the value of the gene-disease associations extracted by BeFree through a number of analyses and integration with other data sources. BeFree succeeds in identifying genes associated with a major cause of morbidity worldwide, depression, which are not present in other public resources. Moreover, large-scale extraction and analysis of gene-disease associations, and integration with current biomedical knowledge, provided interesting insights into the kind of information that can be found in the literature, and raised challenges regarding data prioritization and curation. We found that only a small proportion of the gene-disease associations discovered by using BeFree is collected in expert-curated databases. Thus, there is a pressing need to find alternative strategies to manual curation, in order to review, prioritize and curate text-mining data and incorporate it into domain-specific databases. We present our strategy for data prioritization and discuss its implications for supporting biomedical research and applications. BeFree is a novel text mining system that performs competitively for the identification of gene-disease, drug-disease and drug-target associations. Our analyses show that mining only a
Securing Document Warehouses against Brute Force Query Attacks

Directory of Open Access Journals (Sweden)

Sergey Vladimirovich Zapechnikov

2017-04-01

Full Text Available The paper presents a data management scheme and protocols for securing a document collection against adversarial users who try to abuse their access rights to discover the full content of confidential documents. The configuration of the secure document retrieval system is described, and a suite of protocols among the clients, warehouse server, audit server, and database management server is specified. The scheme makes it infeasible for clients to establish correspondence between the documents relevant to different search queries until a moderator grants access to these documents. The proposed solution provides a higher level of security for document warehouses.
IR and OLAP in XML document warehouses

DEFF Research Database (Denmark)

Perez, Juan Manuel; Pedersen, Torben Bach; Berlanga, Rafael

2005-01-01

In this paper we propose to combine IR and OLAP (On-Line Analytical Processing) technologies to exploit a warehouse of text-rich XML documents. In the system we plan to develop, a multidimensional implementation of a relevance modeling document model will be used for interactively querying...
Text accessibility by people with reduced contrast sensitivity.

Science.gov (United States)

Crossland, Michael D; Rubin, Gary S

2012-09-01

Contrast sensitivity is reduced in people with eye disease, and also in older adults without eye disease. In this article, we compare the contrast of text presented in print and digital formats with contrast sensitivity values for a large cohort of subjects in a population-based study of older adults (the Salisbury Eye Evaluation). Contrast sensitivity values were recorded for 2520 adults aged 65 to 84 years living in Salisbury, Maryland. The proportion of the sample likely to be unable to read text of different formats (electronic books, newsprint, paperback books, laser print, and LED computer monitors) was calculated using published contrast reserve levels required to perform spot reading, to read with fluency, to read with high fluency, and to read under optimal conditions. One percent of this sample had contrast sensitivity less than that required to read newsprint fluently. Text presented on an LED computer monitor had the highest contrast. Ninety-eight percent of the sample had contrast sensitivity sufficient for highly fluent reading of text (at least 160 words/min) on a monitor. However, 29.6% were still unlikely to be able to read this text with optimal fluency. Reduced contrast of print limits text accessibility for many people in the developed world. Presenting text in a high-contrast format, such as black laser print on a white page, would increase the number of people able to access such information. Additionally, making text available in a format that can be presented on an LED computer monitor will increase access to written documents.
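The proportion calculation rests on contrast reserve: the ratio of a text's contrast to the reader's threshold contrast, where contrast sensitivity is the reciprocal of that threshold (charts such as the Pelli-Robson test report it as log10 CS). The sketch below shows the arithmetic only. The text contrast, the log CS values, and the required reserve of 10 are hypothetical placeholders, not the published reserve levels or the Salisbury data used in the study.

```python
from typing import Sequence


def contrast_reserve(text_contrast: float, log_cs: float) -> float:
    """Text contrast divided by the reader's threshold contrast.

    Contrast sensitivity CS = 1 / threshold, and log_cs = log10(CS).
    """
    threshold = 10 ** (-log_cs)
    return text_contrast / threshold


def fraction_unable(text_contrast: float, log_cs_values: Sequence[float],
                    required_reserve: float) -> float:
    """Share of readers whose reserve falls below the required level."""
    short = sum(1 for lcs in log_cs_values
                if contrast_reserve(text_contrast, lcs) < required_reserve)
    return short / len(log_cs_values)


# Hypothetical readers with log CS 1.0, 1.5, 2.0 and a text of contrast 0.9:
# reserves are about 9, 28.5 and 90, so one reader in three falls short of 10.
print(fraction_unable(0.9, [1.0, 1.5, 2.0], required_reserve=10))
```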
Chemical-text hybrid search engines.

Science.gov (United States)

Zhou, Yingyao; Zhou, Bin; Jiang, Shumei; King, Frederick J

2010-01-01

As the amount of chemical literature increases, it is critical that researchers be able to accurately locate documents related to a particular aspect of a given compound. Existing solutions, based on text and chemical search engines alone, suffer from the inclusion of "false negative" and "false positive" results, and cannot accommodate the diverse repertoire of formats currently available for chemical documents. To address these concerns, we developed an approach called Entity-Canonical Keyword Indexing (ECKI), which converts a chemical entity embedded in a data source into its canonical keyword representation before it is indexed by text search engines. We implemented ECKI using Microsoft Office SharePoint Server Search, and the resultant hybrid search engine not only supported complex mixed chemical and keyword queries but also was applied to both intranet and Internet environments. We envision that the adoption of ECKI will empower researchers to pose more complex search questions that were not readily attainable previously and to obtain answers with much improved speed and accuracy.
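The ECKI idea (rewrite every chemical mention into one canonical keyword before an ordinary text index sees it, and apply the same rewrite to queries) can be sketched with a small inverted index. This is a hypothetical stand-in, not the SharePoint-based implementation: the synonym table and the `chem_aspirin` key are invented for illustration, whereas a real system would derive the canonical key from the chemical structure.

```python
import re
from collections import defaultdict
from typing import Dict, Set

# Hypothetical synonym table mapping names to one canonical keyword.
CANONICAL: Dict[str, str] = {
    "acetylsalicylic acid": "chem_aspirin",
    "aspirin": "chem_aspirin",
    "asa": "chem_aspirin",
}


def canonicalize(text: str) -> str:
    """Replace every known chemical name with its canonical keyword."""
    text = text.lower()
    # Longest names first, so multi-word names win over their sub-words.
    for name in sorted(CANONICAL, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(name)}\b", CANONICAL[name], text)
    return text


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Inverted index from canonicalized tokens to document ids."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"\w+", canonicalize(text)):
            index[token].add(doc_id)
    return index


def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """Documents containing every canonicalized query token (mixed queries work)."""
    tokens = re.findall(r"\w+", canonicalize(query))
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


docs = {1: "Acetylsalicylic acid reduces fever.",
        2: "Aspirin inhibits platelets.",
        3: "Ibuprofen reduces fever."}
idx = build_index(docs)
print(search(idx, "ASA"), search(idx, "aspirin fever"))  # {1, 2} {1}
```

Because documents and queries pass through the same `canonicalize` step, any synonym in a query reaches every document that mentions the compound under another name.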
Mining protein function from text using term-based support vector machines

Science.gov (United States)

Rice, Simon B; Nenadic, Goran; Stapley, Benjamin J

2005-01-01

Background Text mining has spurred huge interest in the domain of biology. The goal of the BioCreAtIvE exercise was to evaluate the performance of current text mining systems. We participated in Task 2, which addressed assigning Gene Ontology terms to human proteins and selecting relevant evidence from full-text documents. We approached it as a modified form of the document classification task. We used a supervised machine-learning approach (based on support vector machines) to assign protein function and select passages that support the assignments. As classification features, we used a protein's co-occurring terms that were automatically extracted from documents. Results The results evaluated by curators were modest, and quite variable for different problems: in many cases we obtained relatively good assignment of GO terms to proteins, but the selected supporting text was typically non-relevant (precision spanning from 3% to 50%). The method appears to work best when a substantial set of relevant documents is obtained, while it works poorly on single documents and/or short passages. The initial results suggest that our approach can also mine annotations from text even when an explicit statement relating a protein to a GO term is absent. Conclusion A machine learning approach to mining protein function predictions from text can yield good performance only if sufficient training data is available and a significant amount of supporting data is used for prediction. The most promising results are for combined document retrieval and GO term assignment, which calls for the integration of methods developed in BioCreAtIvE Task 1 and Task 2. PMID:15960835
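A term-based linear SVM of the kind described can be sketched with the Pegasos sub-gradient update, chosen here only because it fits in a few lines of dependency-free Python; the authors' SVM package and settings are not specified in the abstract. Assumptions: term counts are the features, there is no bias term, examples are visited in a fixed cyclic order instead of being sampled at random, and the toy term lists with +1/-1 labels (term assigned or not) are hypothetical.

```python
from collections import Counter
from typing import Dict, List

Weights = Dict[str, float]


def train_linear_svm(docs: List[List[str]], labels: List[int],
                     lam: float = 0.01, epochs: int = 50) -> Weights:
    """Pegasos-style training of a linear SVM on term-count features.

    Step t uses learning rate eta = 1/(lam*t), shrinks w by (1 - eta*lam),
    and adds eta*y*x whenever the example violates the margin (y * w.x < 1).
    """
    w: Weights = {}
    t = 0
    for _ in range(epochs):
        for tokens, y in zip(docs, labels):
            t += 1
            eta = 1.0 / (lam * t)
            x = Counter(tokens)  # term-count feature vector
            margin = y * sum(w.get(f, 0.0) * v for f, v in x.items())
            for f in w:
                w[f] *= 1.0 - eta * lam  # regularization shrinkage
            if margin < 1.0:
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def score(w: Weights, tokens: List[str]) -> float:
    """Signed decision value; positive means the term is predicted to apply."""
    return sum(w.get(f, 0.0) * v for f, v in Counter(tokens).items())


docs = [["kinase", "phosphorylation"], ["kinase", "atp"],
        ["membrane", "transport"], ["channel", "transport"]]
w = train_linear_svm(docs, [1, 1, -1, -1])
print(score(w, ["kinase"]) > 0, score(w, ["transport"]) < 0)  # True True
```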
An Integrated Multimedia Approach to Cultural Heritage e-Documents

NARCIS (Netherlands)

Smeulders, A.W.M.; Hardman, H.L.; Schreiber, G.; Geusebroek, J.M.

2002-01-01

We discuss access to e-documents from three different perspectives beyond the plain keyword web-search of the entire document. The first one is the situation-depending delivery of multimedia documents adapting the preferred form (picture, text, speech) to the available information capacity or need
Constructing Model of Relationship among Behaviors and Injuries to Products Based on Large Scale Text Data on Injuries

Science.gov (United States)

Nomori, Koji; Kitamura, Koji; Motomura, Yoichi; Nishida, Yoshifumi; Yamanaka, Tatsuhiro; Komatsubara, Akinori

In Japan, childhood injury prevention is an urgent issue. Safety measures based on knowledge derived from injury data are essential for preventing childhood injuries. In particular, injury prevention through product modification is very important. Risk assessment is one of the most fundamental methods for designing safe products. Conventional risk assessment has been carried out subjectively because product makers have little data on injuries. This paper deals with evidence-based risk assessment, for which artificial intelligence technologies are strongly needed. This paper describes a new method for foreseeing how products are used, which is the first step of evidence-based risk assessment, and presents a retrieval system for injury data. The system enables a product designer to foresee how children use a product and which types of injuries occur due to the product in everyday environments. The developed system consists of large-scale injury data, text mining technology, and probabilistic modeling technology. Large-scale text data on childhood injuries were collected from medical institutions by an injury surveillance system. Types of behavior toward a product were derived from the injury text data using text mining technology. The relationships among products, types of behavior, types of injuries, and characteristics of children were modeled by a Bayesian network. The fundamental functions of the developed system and examples of new findings obtained with the system are reported in this paper.
The use of CD-ROMs for storage and document delivery at the British Library Document Supply Centre

International Nuclear Information System (INIS)

Bradbury, D.

1990-05-01

The British Library Document Supply Centre (BLDSC) has been at the forefront of international document delivery for 20 years. During the last 5 years it has been very actively involved in the ADONIS Project, through which the full text of some 200 journals in the life sciences has been stored, accessed, and delivered through the medium of CD-ROM. The BLDSC's involvement in this project is described, and the lessons learned and their implications for future international document delivery systems are indicated. (author)
The use of CD-ROMs for storage and document delivery at the British Library Document Supply Centre

Energy Technology Data Exchange (ETDEWEB)

Bradbury, D [British Library Document Supply Centre, Boston Spa (United Kingdom)]

1990-05-01

The British Library Document Supply Centre (BLDSC) has been at the forefront of international document delivery for 20 years. During the last 5 years it has been very actively involved in the ADONIS Project, through which the full text of some 200 journals in the life sciences has been stored, accessed, and delivered through the medium of CD-ROM. The BLDSC's involvement in this project is described, and the lessons learned and their implications for future international document delivery systems are indicated. (author).
Audit of Orthopaedic Surgical Documentation

Directory of Open Access Journals (Sweden)

Fionn Coughlan

2015-01-01

Full Text Available Introduction. The Royal College of Surgeons of England published guidelines in 2008 outlining the information that should be documented for each operation. St. James’s Hospital uses a standard operation sheet for all surgical procedures, and these sheets were examined to assess documentation standards. Objectives. To retrospectively audit handwritten orthopaedic operative notes against established guidelines. Methods. A total of 63 operation notes over seven months were audited for date and time of surgery, surgeon, procedure, elective or emergency indication, operative diagnosis, incision details, signature, closure details, tourniquet time, postoperative instructions, complications, prosthesis, and serial numbers. Results. A consultant performed 71.4% of procedures; however, 85.7% of the operative notes were written by the registrar. The date and time of surgery, name of surgeon, procedure name, and signature were documented in all cases. The operative diagnosis and postoperative instructions were frequently not documented in the designated location. Incision details were included in 81.7% and prosthesis details in only 30%, while the tourniquet time was not documented in any. Conclusion. Completion and documentation of operative procedures were excellent in some areas; improvement is needed in documenting tourniquet time, prosthesis and incision details, and the location of the operative diagnosis and postoperative instructions.
Transitioning Existing Content: inferring organisation-specific documents

Directory of Open Access Journals (Sweden)

Arijit Sengupta

2000-11-01

Full Text Available A definition for a document type within an organization represents an organizational norm about the way the organizational actors represent products and supporting evidence of organizational processes. Generating a good organization-specific document structure is, therefore, important because it can capture a shared understanding among the organizational actors about how certain business processes should be performed. Current tools that generate document type definitions focus on the underlying technology, emphasizing tags created in a single instance document. The tools, thus, fall short of capturing the shared understanding between organizational actors about how a given document type should be represented. We propose a method for inferring organization-specific document structures using multiple instance documents as inputs. The method consists of heuristics that combine individual document definitions, which may have been compiled using standard algorithms. We propose a number of heuristics utilizing artificial intelligence and natural language processing techniques. As the research progresses, the heuristics will be tested on a suite of test cases representing multiple instance documents for different document types. The complete methodology will be implemented as a research prototype.
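To make "combining evidence from multiple instance documents" concrete, here is a deliberately simple heuristic, not one of the authors' proposed AI/NLP heuristics: for every element name, a child element is marked *required* if it appears under every occurrence of that parent across all instances, and *optional* otherwise. The memo instances are hypothetical; the parsing uses the standard-library `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List


def infer_structure(instances: List[str]) -> Dict[str, Dict[str, str]]:
    """Classify each parent's child elements as 'required' or 'optional'."""
    occurrences: Dict[str, int] = defaultdict(int)  # parent tag -> times seen
    child_hits: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for xml_text in instances:
        root = ET.fromstring(xml_text)
        for elem in root.iter():  # every element, including the root
            occurrences[elem.tag] += 1
            for child_tag in {child.tag for child in elem}:  # direct children
                child_hits[elem.tag][child_tag] += 1
    return {
        parent: {child: ("required" if n == occurrences[parent] else "optional")
                 for child, n in sorted(children.items())}
        for parent, children in child_hits.items()
    }


memos = ["<memo><to/><from/><body/></memo>",
         "<memo><to/><from/><cc/><body/></memo>"]
print(infer_structure(memos))
```

A single instance would mark every child as required; it is the second memo that reveals `cc` as optional, which is the advantage of inferring from several instances.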
Text mining by Tsallis entropy

Science.gov (United States)

Jamaati, Maryam; Mehri, Ali

2018-01-01

Long-range correlations between the elements of natural languages enable them to convey very complex information. The complex structure of human language, as a manifestation of natural language, motivates us to apply nonextensive statistical mechanics to text mining. Tsallis entropy appropriately ranks the relevance of terms to a document's subject by taking advantage of their spatial correlation length. We apply this statistical concept as a new, powerful word-ranking metric to extract the keywords of a single document. An experimental evaluation shows the capability of the presented method for keyword extraction. We find that Tsallis entropy has reliable word-ranking performance, on a par with the best previous ranking methods.
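The Tsallis entropy of a distribution p is S_q = (1 − Σᵢ pᵢ^q)/(q − 1), which reduces to Shannon entropy as q → 1. The sketch below uses it to rank words by how unevenly their occurrences are spread over equal-length segments of a single document, on the assumption that topical words cluster while function words spread evenly. This is a simplified stand-in and not the authors' method, which uses the terms' spatial correlation length; the segment count, q = 2, the minimum count, and the toy text are all hypothetical choices.

```python
import re
from collections import Counter
from math import log
from typing import Dict, List, Sequence, Tuple


def tsallis_entropy(probs: Sequence[float], q: float) -> float:
    """S_q = (1 - sum_i p_i^q) / (q - 1); Shannon entropy in the limit q -> 1."""
    if abs(q - 1.0) < 1e-12:
        return -sum(p * log(p) for p in probs if p > 0)
    return (1.0 - sum(p ** q for p in probs if p > 0)) / (q - 1.0)


def rank_keywords(text: str, n_segments: int = 4, q: float = 2.0,
                  min_count: int = 3) -> List[Tuple[str, float]]:
    """Rank words by the Tsallis entropy of their spread over equal segments.

    Low entropy means occurrences are clustered (keyword candidate); words
    seen fewer than min_count times are skipped, since a single occurrence
    trivially has zero entropy.
    """
    words = re.findall(r"[a-z]+", text.lower())
    seg_len = max(1, -(-len(words) // n_segments))  # ceiling division
    spread: Dict[str, Counter] = {}
    for i, w in enumerate(words):
        spread.setdefault(w, Counter())[i // seg_len] += 1
    scored = []
    for w, segs in spread.items():
        total = sum(segs.values())
        if total >= min_count:
            probs = [c / total for c in segs.values()]
            scored.append((w, tsallis_entropy(probs, q)))
    return sorted(scored, key=lambda ws: (ws[1], ws[0]))


text = "the quark quark quark the x y z the x y z the x y z"
print(rank_keywords(text)[0][0], rank_keywords(text)[-1][0])  # quark the
```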

Building Background Knowledge through Reading: Rethinking Text Sets

Science.gov (United States)

Lupo, Sarah M.; Strong, John Z.; Lewis, William; Walpole, Sharon; McKenna, Michael C.

2018-01-01

To increase reading volume and help students access challenging texts, the authors propose a four-dimensional framework for text sets. The quad text set framework is designed around a target text: a challenging content area text, such as a canonical literary work, research article, or historical primary source document. The three remaining…
ASM Based Synthesis of Handwritten Arabic Text Pages

Directory of Open Access Journals (Sweden)

Laslo Dinges

2015-01-01

Full Text Available Document analysis tasks such as text recognition, word spotting, and segmentation are highly dependent on comprehensive and suitable databases for training and validation. However, generating such databases is expensive in terms of labor and time. In fact, there is a lack of such databases, which complicates research and development. This is especially true for Arabic handwriting recognition, which involves different preprocessing, segmentation, and recognition methods, each with its own demands on samples and ground truth. To bypass this problem, we present an efficient system that automatically turns Arabic Unicode text into synthetic images of handwritten documents with detailed ground truth. Active Shape Models (ASMs) based on 28046 online samples were used for character synthesis, and statistical properties were extracted from the IESK-arDB database to simulate baselines and word slant or skew. In the synthesis step, ASM-based representations are composed into words and text pages, smoothed by B-spline interpolation, and rendered considering writing speed and pen characteristics. Finally, we use the synthetic data to validate a segmentation method. An experimental comparison with the IESK-arDB database encourages training and testing document analysis methods on synthetic samples whenever sufficient natural ground-truthed data are not available.
Modeling Documents with Event Model

Directory of Open Access Journals (Sweden)

Longhui Wang

2015-08-01

Full Text Available Deep learning has recently made great breakthroughs in visual and speech processing, mainly because it draws on the hierarchical way in which the brain processes images and speech. In the field of NLP, topic models are one of the important ways of modeling documents. Topic models are built on a generative model that clearly does not match the way humans write. In this paper, we propose Event Model, an unsupervised approach to modeling documents that is based on the language processing mechanism of neurolinguistics. In Event Model, documents are descriptions of concrete or abstract events seen, heard, or sensed by people, and words are objects in the events. Event Model has two stages: word learning and dimensionality reduction. Word learning learns the semantics of words based on deep learning. Dimensionality reduction represents a document as a low-dimensional vector through a linear mode that is completely different from topic models. Event Model achieves state-of-the-art results on document retrieval tasks.
English Metafunction Analysis in Chemistry Text: Characterization of Scientific Text

Directory of Open Access Journals (Sweden)

Ahmad Amin Dalimunte, M.Hum

2013-09-01

Full Text Available The objectives of this research are to identify which Metafunctions are applied in a chemistry text and how they characterize a scientific text. The study was conducted using content analysis. The data for this research were a twelve-paragraph chemistry text, collected using a documentary technique. The document was read and analyzed to identify the Metafunctions. The data were analyzed through the following procedures: identifying the types of process, counting the number of processes, categorizing and counting the cohesion devices, classifying the types of modulation and determining modality values, and finally counting the number of sentences and clauses and scoring the grammatical intricacy index. The findings show that Material processes are used most (71 of 100), and circumstances of spatial location (26 of 56) are more dominant than the others. Modality (5) is used sparingly to avoid subjectivity. Impersonality is implied through limited use of reference, either pronouns (7) or demonstratives (7); conjunctions (60) are applied to develop ideas; and the total number of clauses (109) is much greater than the total number of sentences (40), which results in a high grammatical intricacy index. The Metafunctions found indicate that the chemistry text fulfills the characteristics of a scientific or academic text, which truly reflects it as a natural science text.
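Assuming the common Hallidayan definition of grammatical intricacy as the number of clauses per sentence (clause complex), the counts reported above give the following index. This worked arithmetic is an illustration and is not a value quoted in the abstract:

```latex
\mathrm{GI} = \frac{N_{\text{clauses}}}{N_{\text{sentences}}} = \frac{109}{40} = 2.725
```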
A Text Steganographic System Based on Word Length Entropy Rate

Directory of Open Access Journals (Sweden)

Francis Xavier Kofi Akotoye

2017-10-01

Full Text Available The widespread adoption of electronic distribution of material is accompanied by illicit copying and distribution. This is why individuals, businesses, and governments have come to consider how to protect their work, prevent such illicit activities, and trace the distribution of a document. In this context, a great deal of attention is being focused on steganography. Implementing steganography in a text document is not easy, because a text document has very few places in which to embed hidden data, and any minute change to text objects can easily be noticed, attracting attention from possible hackers. This study investigates the possibility of embedding data in a text document by employing the entropy rate of the constituent characters of words that are at least four characters long. The scheme embeds bits according to the alphabetic structure of the words: the respective characters were compared with their neighbouring characters, and if the first character was alphabetically lower than the succeeding character according to their ASCII codes, a 0 bit was embedded; otherwise, a 1 was embedded after the characters had been transposed. Before embedding, the secret message was encrypted with a secret key to add a layer of security, and a pseudorandom number generated from the word counts of the text was then used to select the starting point of the embedding process. The embedding capacity of the scheme was relatively high compared with the space encoding and semantic methods.
Investigating the statistical properties of user-generated documents

OpenAIRE

Inches, Giacomo; Carman, Mark J.; Crestani, Fabio

2011-01-01

The importance of the Internet as a communication medium is reflected in the large number of documents generated every day by users of the various services that take place online. In this work we aim to analyze the properties of these online user-generated documents for some of the established services over the Internet (Kongregate, Twitter, Myspace and Slashdot) and to compare them with a consolidated collection of standard information retrieval documents (from the Wall Street...
Investigating the Statistical Properties of User-Generated Documents

OpenAIRE

Inches, Giacomo; Carman, Mark James

2011-01-01

The importance of the Internet as a communication medium is reflected in the large number of documents generated every day by users of the various services that take place online. In this work we aim to analyze the properties of these online user-generated documents for some of the established services over the Internet (Kongregate, Twitter, Myspace and Slashdot) and to compare them with a consolidated collection of standard information retrieval documents (from the Wall Street Journal...
Signature detection and matching for document image retrieval.

Science.gov (United States)

Zhu, Guangyu; Zheng, Yefeng; Doermann, David; Jaeger, Stefan

2009-11-01

As one of the most pervasive methods of individual identification and document authentication, signatures present convincing evidence and provide an important form of indexing for effective document image processing and retrieval in a broad range of applications. However, detection and segmentation of free-form objects such as signatures from a cluttered background is currently an open document analysis problem. In this paper, we focus on two fundamental problems in signature-based document image retrieval. First, we propose a novel multiscale approach to jointly detecting and segmenting signatures from document images. Rather than focusing on local features that typically have large variations, our approach captures the structural saliency using a signature production model and computes the dynamic curvature of 2D contour fragments over multiple scales. This detection framework is general and computationally tractable. Second, we treat the problem of signature retrieval in the unconstrained setting of translation-, scale-, and rotation-invariant nonrigid shape matching. We propose two novel measures of shape dissimilarity based on anisotropic scaling and registration residual error and present a supervised learning framework for combining complementary shape information from different dissimilarity metrics using LDA. We quantitatively study state-of-the-art shape representations, shape matching algorithms, measures of dissimilarity, and the use of multiple instances as queries in document image retrieval. We further demonstrate our matching techniques in offline signature verification. Extensive experiments using large real-world collections of English and Arabic machine-printed and handwritten documents demonstrate the excellent performance of our approaches.
La Documentation photographique

Directory of Open Access Journals (Sweden)

Magali Hamm

2009-03-01

Full Text Available La Documentation photographique, a magazine for teachers and students of history and geography, places the image at the heart of its editorial line. To follow current developments in geography, the collection presents an increasingly diverse iconography: maps and photographs, but also cartoons, newspaper front pages and advertisements, all considered geographical documents in their own right. An image can act as a synthesis; conversely, it can show the different facets of an object; often it makes geographical phenomena tangible. Combined with other documents, images help teachers introduce their students to complex geographical reasoning. But to learn how to read images, it is essential to contextualize them, comment on them, and question their relationship to reality.
Use of an advanced document system in post-refuelling updating of nuclear power plant documentation

International Nuclear Information System (INIS)

Puech Suanzes, P.; Cortes Soler, M.

1993-01-01

This paper discusses the results of the extensive use of an advanced document system to update documentation that was prepared by traditional methods and affected by changes made in the period between two plant refuellings. The implementation of a system for capturing, retrieving and storing drawings on optical discs is part of a plan to modernize production and management tools and thus achieve better control of document configuration. These processes are consequently optimized in that: 1. The deterioration of drawings is halted by providing all users with an identical, updated, legible and reliable medium. 2. The time required to update documentation is reduced. Given the large number of drawings, the implementation method should balance cost and time effectively. The document management tools control optical disc storage so that, from the moment a drawing resides in the system, any modification to it is made through the system utilities, thus ensuring quality and shortening schedules. The system described was used to update the electrical drawings of the Almaraz Nuclear Power Plant. Changes made during the eighth refuelling of Unit I were incorporated, and the time needed to issue the updated drawings was reduced by one month. (author)
Text Summarization Using FrameNet-Based Semantic Graph Model

Directory of Open Access Journals (Sweden)

Xu Han

2016-01-01

Full Text Available Text summarization generates a condensed version of an original document. The major issues in text summarization are eliminating redundant information, identifying important differences among documents, and recovering the informative content. This paper proposes a FrameNet-based Semantic Graph Model (FSGM), which exploits the semantic information of sentences. FSGM treats sentences as vertices and the semantic relationships between them as edges. It uses FrameNet and word embeddings to calculate the similarity of sentences. The method assigns weights to both sentence nodes and edges. Finally, it proposes an improved method to rank these sentences, considering both internal and external information. The experimental results show that the model is feasible and effective for text summarization.
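The graph-ranking idea (sentences as vertices, similarity-weighted edges) can be illustrated with a minimal TextRank-style sketch in Python. It is not FSGM: it swaps FrameNet and word embeddings for a plain bag-of-words cosine similarity, and it uses standard weighted PageRank instead of the paper's improved ranking method. The damping factor of 0.85, the iteration count, and all function names are assumptions.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    # Lowercased word counts: a stand-in for FrameNet/embedding features.
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Return sentence indices ordered from most to least central."""
    vectors = [bag_of_words(s) for s in sentences]
    n = len(vectors)
    # Edge weights: pairwise similarity between distinct sentences.
    weight = [[cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
              for i in range(n)]
    out_strength = [sum(row) for row in weight]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each node receives score from its neighbours in proportion to edge weight.
        scores = [
            (1 - damping) / n
            + damping * sum(weight[j][i] / out_strength[j] * scores[j]
                            for j in range(n) if out_strength[j] > 0)
            for i in range(n)
        ]
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


sentences = [
    "The cat sat on the mat.",
    "A cat and a dog played on the mat.",
    "The dog slept.",
    "Stock prices fell sharply today.",
]
print(rank_sentences(sentences))
```

An extractive summary would then keep the top-ranked sentences; a sentence with no similarity to the rest, like the last one here, sinks to the bottom.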
Documentation of Cultural Heritage Objects

Directory of Open Access Journals (Sweden)

Jon Grobovšek

2013-09-01

Full Text Available EXTENDED ABSTRACT: The first important phase of documenting cultural heritage objects is to understand which objects need to be documented. The entire documentation process is determined by the characteristics and scope of the cultural heritage object. The next question to consider is the expected outcome of the documentation process and the purpose for which it will be used. These two essential guidelines determine each stage of the documentation workflow: the choice of the most appropriate data capturing technology and data processing method, how detailed the documentation should be, what problems may occur, what the expected outcome is, what it will be used for, and the plan for storing data and results. Cultural heritage objects require diverse data capturing and data processing methods. It is important that even the first stages of raw data capture are oriented towards the applicability of the results. Selecting the appropriate working method can facilitate data processing and the preparation of final documentation. Documenting paintings requires different data capturing methods than documenting buildings or building areas. The purpose of documentation can also be to preserve contemporary cultural heritage for posterity or to provide the basis for future projects and activities on threatened objects. Documentation procedures should be adapted to our needs and capabilities. Captured but unprocessed data are lost unless accompanied by additional analyses and interpretations. Information on tools, procedures and outcomes must be included in the documentation. A thorough analysis of unprocessed but accessible documentation, if adequately stored and accompanied by additional information, enables us to gather useful data. In this way it is possible to upgrade the existing documentation and to avoid data duplication or unintentionally misleading users. The documentation should be archived safely and in a way to meet
Text mining from ontology learning to automated text processing applications

CERN Document Server

Biemann, Chris

2014-01-01

This book comprises a set of articles that specify the methodology of text mining, describe the creation of lexical resources in the framework of text mining and use text mining for various tasks in natural language processing (NLP). The analysis of large amounts of textual data is a prerequisite for building lexical resources such as dictionaries and ontologies and also has direct applications in automated text processing in fields such as history, healthcare and mobile applications, just to name a few. This volume gives an update in terms of the recent gains in text mining methods and reflects
Text localization using standard deviation analysis of structure elements and support vector machines

Directory of Open Access Journals (Sweden)

Zagoris Konstantinos

2011-01-01

Full Text Available Abstract A text localization technique is required to successfully exploit document images such as technical articles and letters. The proposed method detects and extracts text areas from document images. Initially, a connected components analysis technique detects blocks of foreground objects. Then, a descriptor consisting of a set of suitable document structure elements is extracted from the blocks. This is achieved by incorporating an algorithm called Standard Deviation Analysis of Structure Elements (SDASE), which maximizes the separability between the blocks. Another feature of SDASE is that its length adapts to the requirements of the application. Finally, the descriptor of each block is used as input to a trained support vector machine that classifies the block as text or non-text. The proposed technique is also capable of adapting to the text structure of the documents. Experimental results on benchmark databases demonstrate the effectiveness of the proposed method.
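Only the first step of this pipeline, connected components analysis, is simple enough to sketch here. The minimal Python version below assumes a binary image given as a list of rows (1 = foreground) and 8-connectivity, and returns one bounding box per block; the SDASE descriptor and the SVM classifier are not reproduced, and the function name is hypothetical.

```python
from collections import deque


def connected_components(img):
    """Label 8-connected foreground regions of a binary image and return
    their bounding boxes as (top, left, bottom, right), inclusive."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if not img[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


image = [
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
print(connected_components(image))  # [(0, 0, 1, 1), (1, 3, 2, 4)]
```

Each returned box corresponds to one block that, in the paper's method, would then be described and classified as text or non-text.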
Requirements for the data transfer during the examination of design documentation

Directory of Open Access Journals (Sweden)

Karakozova Irina

2017-01-01

Full Text Available When design documents are transferred to the examination office, the number of incompatible electronic documents increases dramatically. The article discusses a way to solve the problem of transferring the text and graphic data of design documentation for state and non-state expert examination, as well as for verification of estimates and requirements management. Methods for recognizing system elements, and requirements for transferring text and graphic design documents, are provided. The need to use classification and coding of the various elements of information systems (structures, objects, resources, requirements, contracts, etc.) in data transfer systems is indicated separately. The authors have developed a sequence for document processing and data transmission during the examination, and propose a language for describing the construction of the facility that takes into account the classification criteria of structures and construction works.
Compression of Probabilistic XML documents

NARCIS (Netherlands)

Veldman, Irma

2009-01-01

Probabilistic XML (PXML) files resulting from data integration can become extremely large, which is undesirable. For XML there are several techniques available to compress documents, and since probabilistic XML is in fact (a special form of) XML, it might benefit from these methods even more. In
Text Classification and Distributional features techniques in Datamining and Warehousing

OpenAIRE

Bethu, Srikanth; Babu, G Charless; Vinoda, J; Priyadarshini, E; rao, M Raghavendra

2013-01-01

Text categorization is traditionally done using term frequency and inverse document frequency. This type of method is not very good, because some words that are not important may appear in the document; the term frequency of these unimportant words may increase, and the document may be classified in the wrong category. To reduce the error of classifying documents in the wrong category, distributional features are introduced. In the distributional features, the distribution of the words in ...
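For reference, the baseline weighting criticized above can be sketched in a few lines of Python. This sketch assumes whitespace tokenization, raw term counts for term frequency, and idf = log(N / df); the distributional features the paper proposes are not shown, and the function name is hypothetical. A word that appears in every document gets zero weight, but an unimportant word that occurs often in some documents (and not in all) can still receive a high weight, which is the weakness the abstract describes.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document, using raw term
    frequency multiplied by idf = log(N / df)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights


w = tf_idf(["the cat sat", "the dog sat", "the dog barked"])
print(w[0]["the"])  # 0.0, because "the" occurs in every document
```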
New Challenges of the Documentation in Media

Directory of Open Access Journals (Sweden)

Antonio García Jiménez

2015-07-01

Full Text Available This special issue, presented by index.comunicación, focuses on media-related information and documentation. This field is undergoing constant and profound changes, especially visible in documentation processes. The situation is characterized by tablets, smartphones and applications, by the almost complete digitization of traditional documents, and by the crisis of the press business model, which entails changes in journalists’ tasks and in their relationship with Documentation. The papers included in this special issue focus on some of the concerns in this domain: the growing autonomy of journalists in accessing information sources, the role of press offices as documentation sources, searching for information on the web, the situation of media blogs, the viability of information architecture elements in smart TV, and the development of social TV and its connection to Documentation.
LOCAL BINARIZATION FOR DOCUMENT IMAGES CAPTURED BY CAMERAS WITH DECISION TREE

Directory of Open Access Journals (Sweden)

Naser Jawas

2012-07-01

Full Text Available Character recognition in a document image captured by a digital camera requires a good binary image as input for separating the text from the background. Global binarization methods do not provide such good separation because of uneven lighting levels in images captured by cameras. Local binarization methods overcome this problem but require a method to partition the large image into local windows properly. In this paper, we propose a local binarization method with dynamic image partitioning that uses an integral image, and a decision tree for the binarization decision. The integral image is used to estimate the number of text lines in the document image, and this number is used to divide the document into local windows. The decision tree determines the threshold for every local window. The results show that the proposed method separates the text from the background better than global thresholding, with the best OCR result on the binarized image being 99.4%.
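The integral-image step can be sketched as follows. This minimal Python version makes two substitutions that differ from the paper: it uses a fixed square window rather than windows derived from the estimated number of text lines, and it replaces the paper's decision tree with a simple rule that marks a pixel as text when it is darker than the local mean by a fraction `t` (a Bradley-style mean threshold). The window size, `t`, and all function names are hypothetical.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x (one-pixel zero border)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def window_sum(ii, top, left, bottom, right):
    """Sum of img rows top..bottom-1 and columns left..right-1 in O(1)."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def binarize(img, win=3, t=0.15):
    """Return 1 for text (dark) pixels and 0 for background."""
    h, w = len(img), len(img[0])
    ii = integral_image(img)
    half = win // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clip the window at the image borders.
            top, bottom = max(0, y - half), min(h, y + half + 1)
            left, right = max(0, x - half), min(w, x + half + 1)
            area = (bottom - top) * (right - left)
            mean = window_sum(ii, top, left, bottom, right) / area
            out[y][x] = 1 if img[y][x] < mean * (1 - t) else 0
    return out


gray = [[200, 200, 200], [200, 50, 200], [200, 200, 200]]
print(binarize(gray))  # [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
```

Because every window sum costs four lookups, the local mean stays cheap even for large windows, which is why integral images suit local binarization.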
Designing Documents for People to Use

Directory of Open Access Journals (Sweden)

David Sless

Full Text Available This article reports on the work of the Communication Research Institute (CRI), an international research center specializing in communication and information design. With the support of government, regulators, industry bodies, and business, and with the participation of people and their advocates, CRI has worked on over 200 public document design projects since it began as a small unit in 1985. CRI investigates practical methods and achievable standards for designing digital and paper public documents, including forms; workplace procedural notices; bills, letters, and emails sent by organizations; labels and instructions that accompany products and services; and legal and financial documents and contracts. CRI has written model grammars for the document types it designs, and the cumulative data from CRI projects has led to a set of systematic methods for designing public-use documents to a high standard. Through research, design, publishing, and advocacy, CRI works to measurably improve the ordinary documents we all have to use. Keywords: Information design, Design methods, Design standards, Communication design, Design diagnostic testing, Design research

Browsing, Discovery and Search in Large Distributed Databases of Complex and Scanned Documents

National Research Council Canada - National Science Library

Croft, W

1998-01-01

.... These techniques should be particularly relevant to the patent domain where it is important to find relationships between documents and where the patent or trademark may be based on a visual design...
The Role of Text Mining in Export Control

Energy Technology Data Exchange (ETDEWEB)

Tae, Jae-woong; Son, Choul-woong; Shin, Dong-hoon [Korea Institute of Nuclear Nonproliferation and Control, Daejeon (Korea, Republic of)]

2015-10-15

The Korean government provides classification services to exporters. Technology such as documents and drawings is easy to copy, and new technology is easily derived from existing technology. The diversity of technology makes classification difficult because the boundary between strategic and nonstrategic technology is unclear and ambiguous. Reviewers should give sufficient consideration to previous classification cases. However, the growing number of classification cases hinders consistent classification, which made new, innovative and effective approaches necessary. IXCRS (Intelligent Export Control Review System) was proposed to meet these demands. IXCRS consists of an expert system, a semantic search system, a full-text retrieval system, an image retrieval system and a document retrieval system. The aim of the present paper is to examine the document retrieval system based on text mining and to discuss how to use it. This study has demonstrated how text mining techniques can be applied to export control. The document retrieval system helps reviewers handle previous classification cases effectively. In particular, similarity data are highly likely to help specify classification criteria. However, analysis of the system revealed a number of problems that remain to be explored, such as a multilanguage problem and an inclusion relationship problem. Further research should be directed toward solving these problems and applying more data mining techniques, so that the system can serve as a useful tool for export control.
The Role of Text Mining in Export Control

International Nuclear Information System (INIS)

Tae, Jae-woong; Son, Choul-woong; Shin, Dong-hoon

2015-01-01

The Korean government provides classification services to exporters. Technology such as documents and drawings is easy to copy, and new technology is easily derived from existing technology. The diversity of technology makes classification difficult because the boundary between strategic and nonstrategic technology is unclear and ambiguous. Reviewers should give sufficient consideration to previous classification cases. However, the growing number of classification cases hinders consistent classification, which made new, innovative and effective approaches necessary. IXCRS (Intelligent Export Control Review System) was proposed to meet these demands. IXCRS consists of an expert system, a semantic search system, a full-text retrieval system, an image retrieval system and a document retrieval system. The aim of the present paper is to examine the document retrieval system based on text mining and to discuss how to use it. This study has demonstrated how text mining techniques can be applied to export control. The document retrieval system helps reviewers handle previous classification cases effectively. In particular, similarity data are highly likely to help specify classification criteria. However, analysis of the system revealed a number of problems that remain to be explored, such as a multilanguage problem and an inclusion relationship problem. Further research should be directed toward solving these problems and applying more data mining techniques, so that the system can serve as a useful tool for export control.
Document clustering methods, document cluster label disambiguation methods, document clustering apparatuses, and articles of manufacture

Science.gov (United States)

Sanfilippo, Antonio [Richland, WA]; Calapristi, Augustin J [West Richland, WA]; Crow, Vernon L [Richland, WA]; Hetzler, Elizabeth G [Kennewick, WA]; Turner, Alan E [Kennewick, WA]

2009-12-22

Document clustering methods, document cluster label disambiguation methods, document clustering apparatuses, and articles of manufacture are described. In one aspect, a document clustering method includes providing a document set comprising a plurality of documents, providing a cluster comprising a subset of the documents of the document set, using a plurality of terms of the documents, providing a cluster label indicative of subject matter content of the documents of the cluster, wherein the cluster label comprises a plurality of word senses, and selecting one of the word senses of the cluster label.
Integrated management systems and workflow-based electronic document management: An empirical study

Directory of Open Access Journals (Sweden)

Hang Thu Pho

2014-01-01

Full Text Available Purpose: Many global organizations have aligned their strategy and operations via the ISO-based framework of an integrated management system (IMS), which allows them to merge quality, environment, health and safety management systems. In this context, a robust electronic document management system (EDMS) is essential, especially in global enterprises where a large number of documents generated by processes flow through different work cultures. However, there is no "one-size-fits-all" design for an EDMS, because it depends on an organization's needs, size and resource allocation. This article discusses the interrelation between EDMS and IMS in order to suggest a best practice. Design/methodology/approach: This article is methodologically based on a qualitative, interpretivist, longitudinal empirical study in a wind turbine factory. Findings and Originality/value: Work on IMS improvement and effectiveness has overlooked EDMS as a key factor in establishing appropriate technological support for IMS processes. Proper application of EDMS can further contribute to organizational learning, precision of documentation and cross-organisational collaboration. Research limitations/implications: Theorising on IMS needs a stronger perspective on the technological limitations and potentials of basing IMS on EDMS. Practical implications: IMS are complex systems involving a large number of administrative functions. EDMS provides a formal representation with automation potential that both heightens and secures document trustworthiness. Social implications: IMS tends to stay with professionals, e.g. line managers and QA/QC/QMS professionals. The EDMS line of discussion suggests a broader inclusion. Originality/value: Researching IMS as a technological implementation provides a better platform for aligning IMS with other business processes and brings IMS closer to the operational activities within the enterprise.
Using the Characteristics of Documents, Users and Tasks to Predict the Situational Relevance of Health Web Documents

Directory of Open Access Journals (Sweden)

Melinda Oroszlányová

2017-09-01

Full Text Available Relevance is usually estimated by search engines using document content, disregarding the user behind the search and the characteristics of the task. In this work, we look at relevance as framed in a situational context, calling it situational relevance, and analyze whether it is possible to predict it using document, user and task characteristics. Using an existing dataset composed of health web documents, relevance judgments for information needs, and user and task characteristics, we build a multivariate prediction model for situational relevance. Our model has an accuracy of 77.17%. Our findings provide insights into features that could improve the estimation of relevance by search engines, helping to reconcile the systemic and situational views of relevance. In the near future we will work on the automatic assessment of document, user and task characteristics.
STANDARDIZATION OF MEDICAL DOCUMENT FLOW: PRINCIPLES AND FEATURES

Directory of Open Access Journals (Sweden)

Melentev Vladimir Anatolevich

2013-04-01

Full Text Available This article considers questions concerning the general concepts and principles of document flow within any economic entity (an enterprise, institution or organization). A GOST-based definition of document flow and a classification of types of document streams are given. The basic principles of organizing document flow are considered; following them makes it possible to create an optimal structure of document flow and pattern of document movement, taking into account the interrelation of external and internal influences. Next, the basic elements of medical document flow are considered, and its main problems are identified; these problems are also the major factors distinguishing medical document flow from the document flow of manufacturing enterprises or other economic entities. From consideration of these problems, the conclusion is drawn that the initial stage of their solution is the standardization of medical document flow, which is also the first stage in creating a common information space for the medical sector.
Extracting and connecting chemical structures from text sources using chemicalize.org.

Science.gov (United States)

Southan, Christopher; Stracz, Andras

2013-04-23

Exploring bioactive chemistry requires navigating between structures and data from a variety of text-based sources. While PubChem currently includes approximately 16 million document-extracted structures (15 million from patents), the extent of public inter-document and document-to-database links is still well below any estimated total, especially for journal articles. A major expansion in access to text-entombed chemistry is enabled by chemicalize.org. This on-line resource can process IUPAC names, SMILES, InChI strings, CAS numbers and drug names from pasted text, PDFs or URLs to generate structures, calculate properties and launch searches. Here, we explore its utility for answering questions related to chemical structures in documents and where these overlap with database records. These aspects are illustrated using a common theme of Dipeptidyl Peptidase 4 (DPPIV) inhibitors. Full-text open URL sources facilitated the download of over 1400 structures from a DPPIV patent and the alignment of specific examples with IC50 data. Uploading the SMILES to PubChem revealed extensive linking to patents and papers, including prior submissions from chemicalize.org as the submitting source. A DPPIV medicinal chemistry paper was completely extracted and structures were aligned to the activity results table, as well as linked to other documents via PubChem. In both cases, key structures with data were partitioned from common chemistry by dividing them into individual new PDFs for conversion. Over 500 structures were also extracted from a batch of PubMed abstracts related to DPPIV inhibition. The drug structures could be stepped through each text occurrence and included some converted MeSH-only IUPAC names not linked in PubChem. Performing set intersections proved effective for detecting compounds-in-common between documents and merged extractions. This work demonstrates the utility of chemicalize.org for the exploration of chemical structure connectivity between documents and
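The abstract mentions set intersections for finding compounds in common between documents. A minimal Python sketch of that idea, assuming each document's extraction has already been reduced to a set of standardized structure identifiers (for example InChIKeys; the `ID-n` strings below are hypothetical placeholders, and the function names are illustrative, not part of chemicalize.org or PubChem):

```python
from functools import reduce
from itertools import combinations


def compounds_in_common(extractions: dict[str, set[str]]) -> set[str]:
    """Structures present in every document's extraction."""
    sets = list(extractions.values())
    return set(reduce(set.intersection, sets)) if sets else set()


def pairwise_overlaps(extractions: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Shared structures for every pair of documents."""
    return {
        (a, b): extractions[a] & extractions[b]
        for a, b in combinations(sorted(extractions), 2)
    }


docs = {
    "patent": {"ID-1", "ID-2", "ID-3"},
    "paper": {"ID-2", "ID-3", "ID-4"},
    "abstracts": {"ID-3", "ID-5"},
}
print(compounds_in_common(docs))  # {'ID-3'}
```

A merged extraction, as mentioned in the abstract, would simply be the union of the per-document sets.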
SANSMIC design document.

Energy Technology Data Exchange (ETDEWEB)

Weber, Paula D. [Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States)]; Rudeen, David Keith [GRAM, Inc., Albuquerque, NM (United States)]

2015-07-01

The United States Strategic Petroleum Reserve (SPR) maintains an underground storage system consisting of caverns that were leached or solution mined in four salt domes located near the Gulf of Mexico in Texas and Louisiana. The SPR comprises more than 60 active caverns containing approximately 700 million barrels of crude oil. Sandia National Laboratories (SNL) is the geotechnical advisor to the SPR. As the most pressing need at the inception of the SPR was to create storage volume and fill it with oil, the decision was made to leach the caverns and fill them simultaneously (leach-fill). Therefore, in the early 1980s A.J. Russo developed SANSMIC, which allows for a transient oil-brine interface (OBI), making it possible to model leach-fill and withdrawal operations. As the majority of caverns are currently filled to storage capacity, the primary uses of SANSMIC at this time relate to the effects of small and large withdrawals, expansion of existing caverns, and projecting future pillar-to-diameter ratios. SANSMIC was identified by SNL as a priority candidate for qualification. This report continues the quality assurance (QA) process by documenting the "as built" mathematical and numerical models that comprise this document. The program flow is outlined and the models are discussed in detail. Code features that were added later or were not documented previously are explained. No changes in the code's physics have occurred since the original documentation (Russo, 1981, 1983), although recent experiments may yield improvements to the temperature and plume methods in the future.
Strategy as Texts

DEFF Research Database (Denmark)

Obed Madsen, Søren

This article shows empirically how managers translate a strategy plan at an individual level. By analysing how managers in three organizations translate strategies, it identifies that the translation happens in two steps: first, the managers decipher the strategy by coding the different parts of the strategy into four categories; second, the managers produce new texts based on the original strategy document by using four different translation models. The study’s findings contribute to three areas. Firstly, it shows that translation is more than a sociological process; it is also a craft that requires knowledge and skills, which unfortunately seems to be overlooked in both the literature and in practice. Secondly, it shows that even though a strategy text is singular, the translation makes strategy plural. Thirdly, the article proposes a way to open up the black box of what...
Patent documentation - comparison of two MT strategies

DEFF Research Database (Denmark)

Offersgaard, Lene; Povlsen, Claus

2007-01-01

This paper focuses on two matters: a comparison of how two different MT strategies manage translating the text type of patent documentation, and a survey of what is needed to transform an MT research prototype system into a translation application for patent texts. The two MT strategies are represented... The distinctive text type of patents poses special demands for machine translation, and these aspects are discussed based on linguistic observations with a focus on the users' point of view. Two main demands are automatic preprocessing of the documents and implementation of a module which, in a flexible and user-friendly manner, offers the opportunity to extend the lexical coverage of the system. These demands and the comparison of the two MT strategies are discussed on the basis of proofread patents.
Digital watermarks in electronic document circulation

Directory of Open Access Journals (Sweden)

Vitaliy Grigorievich Ivanenko

2017-07-01

This paper reviews different protection methods for electronic documents, together with their strengths and weaknesses. Common attacks on electronic documents are analyzed. The digital signature and ways of eliminating its flaws are studied. Different digital watermark embedding methods are described and divided into two types. The proposed solution to the protection of electronic documents is based on embedding digital watermarks. A comparative analysis of these methods is given. As a result, the most convenient method, reversible data hiding, is suggested. It is noted that this technique excels at securing the integrity of both the container and its digital watermark. A digital watermark embedding system should prevent illegal access to the digital watermark and its container. Digital watermark requirements for electronic document protection are formulated. The legal aspect of copyright protection is reviewed. The advantages of embedding digital watermarks in electronic documents are set out. Modern reversible data hiding techniques are studied. Distinctive features of digital watermark use in Russia are highlighted. A digital watermark serves as an additional layer of defense that in most cases is unknown to the violator. With an embedded digital watermark, it is impossible to misappropriate the authorship of the document, even if the intruder signs his name on it. Therefore, digital watermarks can act as an effective additional tool to protect electronic documents.
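The abstract singles out reversible data hiding without naming a specific scheme. As a minimal sketch only, the Python code below shows one well-known reversible technique, Tian's difference expansion on a pair of integer pixel values; it is an illustrative choice, not necessarily the method the paper recommends. It assumes 8-bit grayscale pairs and omits the overflow/underflow checks and location map that a practical scheme needs to keep the expanded values within 0-255.

```python
def embed_bit(x, y, bit):
    """Difference expansion: hide one bit in the pixel pair (x, y)."""
    avg = (x + y) // 2        # integer (floor) average, preserved by embedding
    diff = x - y              # pixel difference
    expanded = 2 * diff + bit # expanded difference carries the bit in its LSB
    return avg + (expanded + 1) // 2, avg - expanded // 2


def extract_bit(x2, y2):
    """Recover the hidden bit and restore the original pixel pair exactly."""
    avg = (x2 + y2) // 2
    expanded = x2 - y2
    bit = expanded % 2        # Python's % yields 0 or 1 even for negative values
    diff = expanded // 2      # floor division undoes the expansion
    return bit, (avg + (diff + 1) // 2, avg - diff // 2)
```

For example, `embed_bit(206, 201, 1)` gives `(209, 198)`, and `extract_bit(209, 198)` returns `(1, (206, 201))`, restoring the original pair exactly. This exact restorability of the container is what makes the technique "reversible".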
3D DOCUMENTATION OF 40 KILOMETERS OF HISTORICAL PORTICOES – THE CHALLENGE

Directory of Open Access Journals (Sweden)

F. Remondino

2016-06-01

In recent years, the image-based pipeline for 3D reconstruction has attracted considerable interest, leading to fully automated methodologies able to process large image datasets and deliver 3D products with a level of detail and precision that varies according to the application. Several open issues still exist, in particular when dealing with the 3D surveying and modeling of large and complex scenarios such as historical porticoes. The paper presents an evaluation of various surveying methods for the geometric documentation of ca. 40 km of historical porticoes in Bologna (Italy). Finally, terrestrial photogrammetry was chosen as the most flexible and productive technique for delivering 3D results in the form of colored point clouds or textured 3D meshes accessible on the web. The presented digital products are complementary material for the final candidature of the porticoes as a UNESCO World Heritage Site (WHS).
Text-interpreter language for flexible generation of patient notes and instructions.

Science.gov (United States)

Forker, T S

1992-01-01

An interpreted computer language has been developed along with a windowed user interface and multi-printer-support formatter to allow preparation of documentation of patient visits, including progress notes, prescriptions, excuses for work/school, outpatient laboratory requisitions, and patient instructions. Input is by trackball or mouse with little or no keyboard skill required. For clinical problems with specific protocols, the clinician can be prompted with problem-specific items of history, exam, and lab data to be gathered and documented. The language implements a number of text-related commands as well as branching logic and arithmetic commands. In addition to generating text, it is simple to implement arithmetic calculations such as weight-specific drug dosages; multiple branching decision-support protocols for paramedical personnel (or physicians); and calculation of clinical scores (e.g., coma or trauma scores) while simultaneously documenting the status of each component of the score. ASCII text files produced by the interpreter are available for computerized quality audit. Interpreter instructions are contained in text files users can customize with any text editor.
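The interpreted language itself is not specified in the abstract, so its syntax cannot be reproduced. As a hedged Python illustration of the idea of calculating a clinical score while documenting each component, the sketch below uses the Glasgow Coma Scale (eye 1-4, verbal 1-5, motor 1-6, total 3-15) as the example score; the function name `gcs_note` and the note wording are hypothetical.

```python
# Glasgow Coma Scale component descriptors, keyed by score.
GCS_COMPONENTS = {
    "eye": {4: "spontaneous", 3: "to speech", 2: "to pain", 1: "none"},
    "verbal": {5: "oriented", 4: "confused", 3: "inappropriate words",
               2: "incomprehensible sounds", 1: "none"},
    "motor": {6: "obeys commands", 5: "localizes pain", 4: "withdraws from pain",
              3: "abnormal flexion", 2: "extension", 1: "none"},
}


def gcs_note(eye, verbal, motor):
    """Validate each component, document it on its own line, then add the total."""
    scores = {"eye": eye, "verbal": verbal, "motor": motor}
    lines = []
    for name, value in scores.items():
        table = GCS_COMPONENTS[name]
        if value not in table:
            raise ValueError(f"{name} score {value} is out of range")
        lines.append(f"GCS {name}: {value} ({table[value]})")
    lines.append(f"GCS total: {sum(scores.values())}/15")
    return "\n".join(lines)
```

Calling `gcs_note(3, 4, 6)` produces one documented line per component followed by `GCS total: 13/15`, so the note records how the score was reached, not just the sum.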
Eigenvector space model to capture features of documents

Directory of Open Access Journals (Sweden)

Choi DONGJIN

2011-09-01

Eigenvectors are a special set of vectors associated with a square matrix, that is, with a linear transformation. Because of their special properties, eigenvectors have been widely used in computer vision. When eigenvectors are applied in the information retrieval field, it is possible to capture properties of a document corpus. To capture properties of given documents, this paper conducted simple experiments to show that eigenvectors can also be used in document analysis. For the experiments, short Wikipedia abstracts provided by DBpedia were used as the document corpus. To build the original square matrix, the popular tf-idf weighting was used. After calculating the eigenvectors of the original matrix, each vector was plotted in a 3D graph to find out what the eigenvectors mean in document processing.
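The abstract does not say how a square matrix is obtained from tf-idf weights, since a document-by-term tf-idf matrix is generally rectangular. A minimal NumPy sketch follows. It assumes the square matrix is the document-by-document product S = X Xᵀ of a tf-idf matrix X (raw term counts times log(N/df)), computes the eigenvectors of S, and takes the top three as 3D coordinates for each document. This construction and the function names are illustrative assumptions.

```python
import math
from collections import Counter

import numpy as np


def tfidf_matrix(docs):
    """Return (vocabulary, X) where X[i, j] is the tf-idf weight of term j
    in document i, computed as raw count * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: j for j, term in enumerate(vocab)}
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    X = np.zeros((n_docs, len(vocab)))
    for i, toks in enumerate(tokenized):
        for term, count in Counter(toks).items():
            X[i, index[term]] = count * math.log(n_docs / df[term])
    return vocab, X


def eigen_coordinates(X, k=3):
    """Eigen-decompose the square matrix S = X X^T and return the top-k
    eigenvalues together with each document's k coordinates."""
    S = X @ X.T                            # symmetric, so eigh applies
    values, vectors = np.linalg.eigh(S)    # eigenvalues in ascending order
    order = np.argsort(values)[::-1][:k]   # indices of the k largest
    return values[order], vectors[:, order]
```

Each row of the returned coordinate matrix can be plotted as a point in 3D. Terms that occur in every document receive an idf of zero, so they do not influence the coordinates.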
Biblio-MetReS for user-friendly mining of genes and biological processes in scientific documents

Directory of Open Access Journals (Sweden)

Anabel Usie

2014-02-01

One way to initiate the reconstruction of molecular circuits is by using automated text-mining techniques. Developing more efficient methods for such reconstruction is a topic of active research, and those methods are typically included by bioinformaticians in pipelines used to mine and curate large literature datasets. Nevertheless, experimental biologists have a limited number of available user-friendly tools that use text mining for network reconstruction and require no programming skills to use. One of these tools is Biblio-MetReS. Originally, this tool permitted an on-the-fly analysis of documents contained in a number of web-based literature databases to identify co-occurrence of proteins/genes. This approach ensured results that were always up-to-date with the latest live version of the databases. However, this ‘up-to-dateness’ came at the cost of long execution times. Here we report an evolution of the application Biblio-MetReS that permits constructing co-occurrence networks for genes, GO processes, Pathways, or any combination of the three types of entities, and graphically representing those entities. We show that the performance of Biblio-MetReS in identifying gene co-occurrence is at least as good as that of other comparable applications (STRING and iHOP). In addition, we show that the identification of GO processes is on par with that reported in the latest BioCreAtIvE challenge. Finally, we report the implementation of a new strategy that combines on-the-fly analysis of new documents with preprocessed information from documents that were encountered in previous analyses. This combination simultaneously decreases program run time and maintains the ‘up-to-dateness’ of the results. Availability: http://metres.udl.cat/index.php/downloads, Contact: metres.cmb@gmail.com.
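The combined strategy of reusing preprocessed results and analysing only new documents can be pictured with a small standard-library Python sketch. Entity sets for documents already seen are taken from a cache, only uncached documents are scanned, and entity pairs that co-occur in the same document are counted as network edges. The naive dictionary lookup, the function names and the data layout are all illustrative assumptions, not Biblio-MetReS internals.

```python
from collections import Counter
from itertools import combinations


def entities_in(text, dictionary):
    """Return the dictionary entities mentioned in text (naive token match)."""
    tokens = {tok.strip(".,;:()").lower() for tok in text.split()}
    return {entity for entity in dictionary if entity.lower() in tokens}


def cooccurrence_network(documents, dictionary, cache):
    """documents: dict doc_id -> text; cache: dict doc_id -> frozenset of
    entities found in earlier runs. Only uncached documents are analysed
    on the fly; returns a Counter of (entity_a, entity_b) edge weights."""
    edges = Counter()
    for doc_id, text in documents.items():
        if doc_id not in cache:
            cache[doc_id] = frozenset(entities_in(text, dictionary))
        for a, b in combinations(sorted(cache[doc_id]), 2):
            edges[(a, b)] += 1
    return edges
```

Edge weights count the number of documents in which two entities co-occur. Because cached documents are never rescanned, a later run only pays for the documents it has not seen before.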
Basic freight forwarding and transport documentation in freight forwarder’s work

Directory of Open Access Journals (Sweden)

Adam Salomon

2014-09-01

The purpose of the article is to present the basic documentation in an international freight forwarder’s work, in particular insurance documents and transport documents in various modes of transport. An additional goal is to identify sources of information that can be used to complete the individual documents properly.
Using Text Documents from American Memory.

Science.gov (United States)

Singleton, Laurel R., Ed.

2002-01-01

This publication contains classroom-tested teaching ideas. For grades K-4, "'Blessed Ted-fred': Famous Fathers Write to Their Children" uses American Memory for primary source letters written by Theodore Roosevelt and Alexander Graham Bell to their children. For grades 5-8, "Found Poetry and the American Life Histories…
ASM Based Synthesis of Handwritten Arabic Text Pages.

Science.gov (United States)

Dinges, Laslo; Al-Hamadi, Ayoub; Elzobi, Moftah; El-Etriby, Sherif; Ghoneim, Ahmed

2015-01-01

Document analysis tasks such as text recognition, word spotting, or segmentation are highly dependent on comprehensive and suitable databases for training and validation. However, generating such databases is expensive in terms of labor and time. In fact, there is a lack of such databases, which complicates research and development. This is especially true for Arabic handwriting recognition, which involves different preprocessing, segmentation, and recognition methods, each with individual demands on samples and ground truth. To bypass this problem, we present an efficient system that automatically turns Arabic Unicode text into synthetic images of handwritten documents and detailed ground truth. Active Shape Models (ASMs) based on 28046 online samples were used for character synthesis, and statistical properties were extracted from the IESK-arDB database to simulate baselines and word slant or skew. In the synthesis step, ASM-based representations are composed into words and text pages, smoothed by B-spline interpolation, and rendered considering writing speed and pen characteristics. Finally, we use the synthetic data to validate a segmentation method. An experimental comparison with the IESK-arDB database encourages training and testing document analysis methods on synthetic samples whenever insufficient natural ground-truthed data is available.
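The abstract does not spell out the shape model. The NumPy sketch below is a hedged illustration of the standard Active Shape Model idea. It assumes character contours have already been aligned and resampled to the same number of (x, y) points, flattened into rows. It then fits a point distribution model by principal component analysis and synthesizes a new shape as x = x̄ + Pb, drawing each b_i within the conventional ±3√λ_i limits. The function names and the 98% variance threshold are illustrative choices, not the authors' settings.

```python
import numpy as np


def fit_shape_model(shapes, var_kept=0.98):
    """shapes: array (n_samples, 2 * n_points) of aligned, resampled contours.
    Returns the mean shape, the principal modes (as columns), and their
    eigenvalues, keeping enough modes to explain var_kept of the variance."""
    mean = shapes.mean(axis=0)
    cov = np.cov(shapes - mean, rowvar=False)
    values, vectors = np.linalg.eigh(cov)
    order = np.argsort(values)[::-1]          # largest variance first
    values = np.clip(values[order], 0, None)  # remove tiny negative round-off
    vectors = vectors[:, order]
    cumulative = np.cumsum(values) / values.sum()
    t = int(np.searchsorted(cumulative, var_kept)) + 1
    return mean, vectors[:, :t], values[:t]


def synthesize(mean, modes, values, rng):
    """Draw each b_i uniformly within +/- 3 sqrt(lambda_i); return mean + P b."""
    limits = 3 * np.sqrt(values)
    b = rng.uniform(-limits, limits)
    return mean + modes @ b
```

Bounding each b_i keeps synthesized characters plausible: every variation stays within the range observed in the training samples, while repeated draws still produce distinct writing variants.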
NAMED ENTITY RECOGNITION FROM BIOMEDICAL TEXT: AN INFORMATION EXTRACTION TASK

Directory of Open Access Journals (Sweden)

N. Kanya

2016-07-01

Biomedical text mining (BioTM) targets the extraction of significant information from biomedical archives. BioTM encompasses Information Retrieval (IR) and Information Extraction (IE). Information Retrieval retrieves the relevant biomedical literature documents from various repositories such as PubMed and MEDLINE, based on a search query. The IR process ends with the generation of a corpus of relevant documents retrieved from the publication databases based on the query. The IE task includes preprocessing of the documents, Named Entity Recognition (NER) from the documents, and relationship extraction. This process draws on natural language processing, data mining techniques, and machine learning algorithms. The preprocessing task includes tokenization, stop-word removal, shallow parsing, and part-of-speech tagging. The NER phase involves recognition of well-defined objects such as genes, proteins, or cell lines. This leads to the next phase, the extraction of relationships (IE). The work was based on the Conditional Random Field (CRF) machine learning algorithm.
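The abstract names a Conditional Random Field but gives no model details. As a hedged illustration of how a trained linear-chain CRF labels tokens, the sketch below performs Viterbi decoding over BIO tags. The emission and transition scores in the example are invented numbers standing in for weights a real CRF would learn from features such as word shape or part-of-speech tags.

```python
def viterbi(emissions, transitions, labels):
    """emissions: list (one per token) of dicts label -> score;
    transitions: dict (prev_label, label) -> score, missing pairs score 0.
    Returns the highest-scoring label sequence under a linear-chain model."""
    best = [{y: emissions[0][y] for y in labels}]  # best score ending in y
    back = []                                      # back-pointers per step
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels,
                       key=lambda p: best[-1][p] + transitions.get((p, y), 0.0))
            scores[y] = (best[-1][prev] + transitions.get((prev, y), 0.0)
                         + emissions[t][y])
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]
```

With labels `B-GENE`, `I-GENE`, `O` and a large negative score on the transition `O -> I-GENE`, the decoder rejects ill-formed tag sequences. A gene mention then always starts with `B-GENE`, which is one reason CRFs suit NER better than labelling each token independently.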

The socio-demographics of texting

DEFF Research Database (Denmark)

Ling, Richard; Bertel, Troels Fibæk; Sundsøy, Pål

2012-01-01

Who texts, and with whom do they text? This article examines texting using metered traffic data from a large dataset (nearly 400 million anonymous text messages). We ask 1) How much do different age groups use mobile-phone-based texting (SMS)? 2) How wide is the circle of texting...
Principles of reusability of XML-based enterprise documents

Directory of Open Access Journals (Sweden)

Roman Malo

2010-01-01

XML (Extensible Markup Language) is one of the flexible platforms for processing enterprise documents. Its simple syntax and powerful software infrastructure for processing this type of document guarantee high interoperability of individual documents. XML is today one of the technologies influencing all aspects of the ICT area. The paper describes questions and basic principles of reusing XML-based documents in the field of enterprise documents. If XML databases or XML data types are used for storing these types of documents, partial redundancy can be expected due to possible similarity between documents. This similarity can be found especially in the documents’ structure and also in their content, and its elimination is a necessary part of data optimization. The main idea of the paper is how complex XML documents can be divided into independent fragments that can be used as standalone documents, and how to process them. The conclusions could be applied within software tools working with XML-based structured data and documents, such as document management systems or content management systems.
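Below is a minimal sketch of the fragmentation idea, using only Python's standard `xml.etree.ElementTree` and `hashlib`. Each child of a document's root element is serialized as a standalone fragment and stored once under its SHA-256 hash. Identical fragments shared by several documents (for example a common supplier block) are therefore kept only once, and each document becomes a list of fragment references. Splitting at the root's children, ignoring root attributes, and the function names are simplifying assumptions, not the paper's method.

```python
import copy
import hashlib
import xml.etree.ElementTree as ET


def fragment_document(xml_text, store):
    """Split the root's children into standalone fragments, storing each
    distinct fragment once in `store` (hash -> XML string).
    Returns the root tag and the ordered list of fragment hashes."""
    root = ET.fromstring(xml_text)
    refs = []
    for child in root:
        fragment = copy.deepcopy(child)
        fragment.tail = None  # ignore whitespace between sibling elements
        xml = ET.tostring(fragment, encoding="unicode")
        key = hashlib.sha256(xml.encode("utf-8")).hexdigest()
        store.setdefault(key, xml)  # an identical fragment is stored only once
        refs.append(key)
    return root.tag, refs


def reassemble(tag, refs, store):
    """Rebuild a document from its root tag and fragment references."""
    root = ET.Element(tag)
    for key in refs:
        root.append(ET.fromstring(store[key]))
    return ET.tostring(root, encoding="unicode")
```

For two invoices that share the same supplier element but list different items, the store ends up holding three fragments instead of four. Either invoice can still be rebuilt from its references.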
A New Wavelet-Based Document Image Segmentation Scheme

Institute of Scientific and Technical Information of China (English)

赵健; 李道京; 俞卞章; 耿军平

2002-01-01

Document image segmentation is very useful for printing, faxing and data processing. An algorithm is developed for segmenting and classifying document images. The feature used for classification is based on the histogram distribution pattern of the different image classes. The important attribute of the algorithm is its use of a wavelet correlation image to enhance the raw image's pattern, which improves classification accuracy. In this paper, a document image is divided into four types: background, photo, text and graph. First, the document image background is easily distinguished by a conventional method; second, the three remaining image types are distinguished by their typical histograms. To make the histogram features clearer, each resolution's HH wavelet subimage is added to the raw image at that resolution. Finally, photo, text and graph are separated according to how well the feature fits the Laplacian distribution, measured by χ² and L. Simulations show that classification accuracy is significantly improved. Comparison with related methods shows that our algorithm provides both lower classification error rates and better visual results.
PathText: a text mining integrator for biological pathway visualizations

Science.gov (United States)

Kemper, Brian; Matsuzaki, Takuya; Matsuoka, Yukiko; Tsuruoka, Yoshimasa; Kitano, Hiroaki; Ananiadou, Sophia; Tsujii, Jun'ichi

2010-01-01

Motivation: Metabolic and signaling pathways are an increasingly important part of organizing knowledge in systems biology. They serve to integrate collective interpretations of facts scattered throughout the literature. Biologists construct a pathway by reading a large number of articles and interpreting them as a consistent network, but most of the models constructed currently lack direct links to those articles. Biologists who want to check the original articles have to spend substantial amounts of time collecting relevant articles and identifying the sections relevant to the pathway. Furthermore, with the scientific literature expanding by several thousand papers per week, keeping a model relevant requires a continuous curation effort. In this article, we present a system designed to integrate a pathway visualizer, text mining systems and annotation tools into a seamless environment. This will enable biologists to move freely between parts of a pathway and relevant sections of articles, as well as to identify relevant papers from large text bases. The system, PathText, is developed by the Systems Biology Institute, the Okinawa Institute of Science and Technology, the National Centre for Text Mining (University of Manchester) and the University of Tokyo, and is being used by groups of biologists from these locations. Contact: brian@monrovian.com. PMID:20529930
Use of an advanced document system in post-refuelling updating of nuclear power plant documentation; Utilizacion de un sistema documental avanzado en la actualizacion de documentacion post recarga

Energy Technology Data Exchange (ETDEWEB)

Puech Suanzes, P; Cortes Soler, M [Empresarios Agrupados, A.I.E., Madrid (Spain)]

1993-12-15

This paper discusses the results of the extensive use of an advanced document system to update documentation that was prepared by traditional methods and affected by changes in the period between two plant refuellings. The implementation of a system for the capture, retrieval and storage of drawings using optical discs is part of a plan to modernize production and management tools and thus achieve better control of document configuration. These processes are consequently optimized in that: 1. The deterioration of drawings is halted by providing all users with an identical, updated, legible and reliable medium. 2. The time required to update documentation is reduced. Given the large number of drawings, the implementation method should effectively balance cost and time. The document management tools ensure optical disc storage control, so that from the moment a drawing resides in the system, any modification to it is made through the system utilities, thus ensuring quality and reducing schedules. The system described was used to update the electrical drawings of the Almaraz Nuclear Power Plant. Changes made during the eighth refuelling of Unit I were incorporated, and the time needed to issue the updated drawings was reduced by one month. (author)
Experimental determination of chosen document elements parameters from raster graphics sources

Directory of Open Access Journals (Sweden)

Jiří Rybička

2010-01-01

The visual appearance of documents and their formal quality is considered as important as the quality of their content. The formal and typographical quality of documents can be evaluated by an automated system that processes raster images of documents. A document is described by a formal model that treats a page both as an object and as a set of elements, where page elements include text and graphic objects. All elements are described by parameters that depend on the element type. For the subsequent evaluation, text objects are the most important. This paper describes the experimental determination of selected document element parameters from raster images. Image processing techniques are used, in which an image is represented as a matrix of dots from which parameter values are extracted. Algorithms for parameter extraction from raster images were designed, aimed mainly at typographical parameters such as indentation, alignment, font size and spacing. The algorithms were tested on a set of 100 images of paragraphs or pages and provided very good results. The extracted parameters can be used directly for typographical quality evaluation.
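The abstract does not give the paper's own algorithms. The hedged Python sketch below shows one common way to obtain such parameters. It uses a horizontal projection profile on a binary raster (1 = ink) to find text lines, then reports each line's left indentation and height and the spacing between consecutive line bottoms. The line bottoms serve here as a rough stand-in for baselines, and the function names are hypothetical.

```python
def text_lines(bitmap):
    """bitmap: list of rows, 1 = ink. Returns (top, bottom) row ranges of
    text lines, found as runs of rows that contain any ink."""
    lines, start = [], None
    for y, row in enumerate(bitmap):
        inked = any(row)
        if inked and start is None:
            start = y                      # a new text line begins
        elif not inked and start is not None:
            lines.append((start, y - 1))   # the current line has ended
            start = None
    if start is not None:
        lines.append((start, len(bitmap) - 1))
    return lines


def line_parameters(bitmap):
    """Return per-line indentation (first ink column) and height, plus the
    distances between the bottoms of consecutive lines."""
    lines = text_lines(bitmap)
    params = []
    for top, bottom in lines:
        cols = [x for y in range(top, bottom + 1)
                for x, value in enumerate(bitmap[y]) if value]
        params.append({"indent": min(cols), "height": bottom - top + 1})
    spacing = [b2 - b1 for (_, b1), (_, b2) in zip(lines, lines[1:])]
    return params, spacing
```

Indentation and alignment follow from comparing the `indent` values across lines, and line height gives a rough proxy for font size at a known scanning resolution.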
A Survey: Framework of an Information Retrieval for Malay Translated Hadith Document

Directory of Open Access Journals (Sweden)

Zulkefli Nurul Syeilla Syazhween

2017-01-01

This paper reviews and analyses the limitations of the existing methods used in the IR process for retrieving Malay Translated Hadith documents related to a search request. Traditional Malay Translated Hadith retrieval systems have not focused on semantic extraction from text. The bag-of-words representation ignores the conceptual similarity of information in the query text and documents, which produces unsatisfactory retrieval results. Therefore, a more efficient IR framework is needed. This paper claims that significant information extraction and subject-related information are important, because clues from this information can be used to search for and find documents relevant to a query. Conversely, unimportant information can be discarded when representing the document content. Semantic understanding of the query and documents is therefore necessary to improve the effectiveness and accuracy of retrieval results in this domain. Further research is needed, and this will be tested experimentally in future work. It is hoped that it will help users search for and find information in Malay Translated Hadith documents.
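To make the bag-of-words limitation concrete, the following standard-library Python sketch computes cosine similarity between term-count vectors. The English example words are hypothetical stand-ins for Malay query and document terms. Two phrases with related meanings but no shared words score 0.0, while two phrases sharing one word but differing in meaning score 0.5.

```python
import math
from collections import Counter


def bow_cosine(a, b):
    """Cosine similarity between the bag-of-words term-count vectors of two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[term] * vb[term] for term in va)  # only shared terms contribute
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0
```

Here `bow_cosine("charity poor", "alms needy")` is 0.0 even though the phrases are conceptually close. `bow_cosine("charity poor", "charity rich")` is 0.5 despite the opposite meaning, which is the gap a semantic retrieval framework aims to close.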
Text Induced Spelling Correction

NARCIS (Netherlands)

Reynaert, M.W.C.

2004-01-01

We present TISC, a language-independent and context-sensitive spelling checking and correction system designed to facilitate the automatic removal of non-word spelling errors in large corpora. Its lexicon is derived from a very large corpus of raw text, without supervision, and contains word...
2011 Addendum to the SNL/NM SWEIS Supplemental Information Source Documents

Energy Technology Data Exchange (ETDEWEB)

Dimmick, Ross [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)]

2014-12-01

This document contains updates to the Supplemental Information Sandia National Laboratories/New Mexico Site-Wide Environmental Impact Statement Source Documents that were developed in 2010. In general, this addendum provides calendar year 2010 data, along with changes or additions to text in the original documents.
Areva - 2011 Reference document

International Nuclear Information System (INIS)

2011-01-01

After indicating the person responsible for this document and the statutory auditors, and providing some financial information, this document gives an overview of the different risk factors existing in the company: legal risks, industrial and environmental risks, operational risks, risks related to large projects, and market and liquidity risks. Then, after recalling the history and evolution of the company and the evolution of its investments over the last five years, it proposes an overview of Areva's activities in the nuclear energy and renewable energy markets, its clients and suppliers, its strategy, and the activities of its different departments. Other information is provided: the company's organization chart, real estate properties (plants, equipment), an analysis of its financial situation, its research and development policy, the present context, profit forecasts or estimates, and management organization and operation.
Sharing and Adaptation of Educational Documents in E-Learning

Directory of Open Access Journals (Sweden)

Chekry Abderrahman

2012-03-01

Of the huge number of educational documents on the web, few can be reused. The exponential increase of these documents makes it almost impossible to search for relevant ones. In addition, e-learning is designed for public users who have different levels of knowledge and varied skills, so they should be given content that meets their needs. This work is about adapting learning content to learners' preferences and giving teachers the ability to reuse a given content.
Cuneiform Documents from Various Dutch Collections

NARCIS (Netherlands)

Boer, de R.; Dercksen, J.G.; Krispijn, Th.J.H.; J.G., Dercksen e.a.

2013-01-01

Publication of Sumerian and Akkadian cuneiform texts in private collections from various periods:
* Presargonic: letter - unknown provenance in Northern Babylonia
* Ur III: administrative document - Umma
* Old Assyrian: letters - Kaneš (Anatolia)
* Old Babylonian: lexical series Ugumu - unknown...
Terminology extraction from medical texts in Polish.

Science.gov (United States)

Marciniak, Małgorzata; Mykowiecka, Agnieszka

2014-01-01

Hospital documents contain free text describing the most important facts relating to patients and their illnesses. These documents are written in a specific language containing medical terminology related to hospital treatment. Their automatic processing can help in verifying the consistency of hospital documentation and obtaining statistical data. To perform this task we need information on the phrases we are looking for. At the moment, clinical Polish resources are sparse. The existing terminologies, such as Polish Medical Subject Headings (MeSH), do not provide sufficient coverage for clinical tasks. It would therefore be helpful if it were possible to automatically prepare, on the basis of a data sample, an initial set of terms which, after manual verification, could be used for the purpose of information extraction. Using a combination of linguistic and statistical methods for processing over 1,200 children's hospital discharge records, we obtained a list of single and multiword terms used in hospital discharge documents written in Polish. The phrases are ordered according to their presumed importance in domain texts, measured by the frequency of use of a phrase and the variety of its contexts. The evaluation showed that the automatically identified phrases cover about 84% of terms in domain texts. At the top of the ranked list, only 4% out of 400 terms were incorrect, while out of the final 200, 20% of expressions were either not domain related or syntactically incorrect. We also observed that 70% of the obtained terms are not included in the Polish MeSH. Automatic terminology extraction can give results which are of a quality high enough to be taken as a starting point for building domain-related terminological dictionaries or ontologies. This approach can be useful for preparing terminological resources for very specific subdomains for which no relevant terminologies already exist. The evaluation performed showed that none of the tested ranking procedures were...
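The abstract ranks phrases by frequency and by the variety of their contexts without giving a formula. One established measure in this spirit is the C-value, which discounts a candidate's frequency by the average frequency of the longer candidates that contain it. The Python sketch below is an illustrative assumption, not the authors' ranking procedure. It uses log2(length + 1) instead of the classic log2(length) so that single-word terms do not always score zero, and the English example terms are hypothetical.

```python
import math


def contains_subsequence(longer, shorter):
    """True if the word tuple `shorter` occurs contiguously inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_value(freq):
    """freq: dict mapping candidate term (tuple of words) -> corpus frequency.
    Returns a dict term -> C-value-style score."""
    scores = {}
    for term, f in freq.items():
        # Longer candidates in which this term is nested.
        containers = [other for other in freq
                      if len(other) > len(term) and contains_subsequence(other, term)]
        weight = math.log2(len(term) + 1)  # assumption: +1 keeps single words > 0
        if containers:
            nested = sum(freq[o] for o in containers) / len(containers)
            scores[term] = weight * (f - nested)
        else:
            scores[term] = weight * f
    return scores
```

With frequencies of 10 for "otitis media", 4 for "acute otitis media" and 12 for "otitis", the ranking is "otitis media" first, then "acute otitis media", then "otitis". The single word is penalized because most of its occurrences belong to longer terms.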
The journey from texting to applications on personally owned devices to enhance student engagement in large lectures: A pilot study

Directory of Open Access Journals (Sweden)

Trevor Nesbit

Increasing class sizes to gain economies of scale have resulted in less interaction between lecturers and students during lectures. This paper presents the results of a pilot study that set out to examine the use of applications on personally owned devices (APODs) to enhance student interaction, participation and engagement in large lectures. The pilot study began with the development and trial of a text-messaging-based application and, after a survey of students regarding ownership levels of mobile devices, concluded with the trial of an application developed for mobile devices. The conclusions of the paper highlight that the use of APODs can significantly increase student interaction, participation and engagement in large lectures, and identify implications and opportunities for further research.
PHOTOGRAPHY AS DOCUMENT: OTLET AND BRIET’S CONSIDERATIONS

Directory of Open Access Journals (Sweden)

Izângela Maria Sansoni Tonello

2018-04-01

Introduction: The amount and variety of information conveyed in different media raises concerns, especially in relation to photographic documents, which are currently a focus of interest in the Information Science field. In this context, this paper emphasizes the role of photographs as sources of information capable of generating knowledge, as well as an important aid for research in different areas. Objective: The main goal of this study was to research the concepts and definitions underpinning the photograph as a document in information units. Methodology: Bibliographic and documentary research. Results: Based on the meanings of the term document discussed in the literature by the authors studied, it can be affirmed that the photograph meets the conditions necessary to be considered a document, and thus a photographic document. Conclusions: This study clarifies some issues related to the photograph as a document; however, this proposition raises reflections about the importance of the production context, as well as the photograph's essential relationship with other documents, so that it is indisputably consolidated as a photographic document.
Improving nurse documentation and record keeping in stoma care

OpenAIRE

Law, Lesley; Akroyd, Karen; Burke, Linda

2010-01-01

Evidence suggests that nurse documentation is often inconsistent and lacks a coherent and standardized approach. This article reports on research into the use of nurse documentation on a stoma care ward in a large London hospital, and explores the factors that may affect the process of record keeping by nursing staff. This study uses stoma care as a case study to explore the role of documentation on the ward, focusing on how this can be improved. It is based on quantitative and qualitative me...
Achieving IT-supported standardized nursing documentation through participatory design

DEFF Research Database (Denmark)

Rasmussen, Stine Loft; Lyng, Karen Marie; Jensen, Sanne

2012-01-01

In the Capital Region of Denmark, a full-scale pilot project on IT-supported nursing documentation is, after running for two months at one full university hospital, showing promising results. In this paper we discuss participatory design as a method to design clinical documentation templates that support guideline-based, highly structured standard documentation in a large organization with many stakeholders. Applying a participatory design (PD) approach at many organizational levels has involved the stakeholders actively in the design process. Developing a set of design principles has concurrently...
Layout-aware text extraction from full-text PDF of scientific articles

Directory of Open Access Journals (Sweden)

Ramakrishnan Cartic

2012-05-01

Background: The Portable Document Format (PDF) is the most commonly used file format for online scientific publications. The absence of effective means to extract text from these PDF files in a layout-aware manner presents a significant challenge for developers of biomedical text mining or biocuration informatics systems that use published literature as an information source. In this paper we introduce the ‘Layout-Aware PDF Text Extraction’ (LA-PDFText) system to facilitate accurate extraction of text from PDF files of research articles for use in text mining applications. Results: Our paper describes the construction and performance of an open source system that extracts text blocks from PDF-formatted full-text research articles and classifies them into logical units based on rules that characterize specific sections. The LA-PDFText system focuses only on the textual content of the research articles and is meant as a baseline for further experiments into more advanced extraction methods that handle multi-modal content, such as images and graphs. The system works in a three-stage process: (1) detecting contiguous text blocks using spatial layout processing to locate and identify blocks of contiguous text, (2) classifying text blocks into rhetorical categories using a rule-based method, and (3) stitching classified text blocks together in the correct order, resulting in the extraction of text from section-wise grouped blocks. We show that our system can identify text blocks and classify them into rhetorical categories with Precision = 0.96, Recall = 0.89 and F1 = 0.91. We also present an evaluation of the accuracy of the block detection algorithm used in step 2. Additionally, we have compared the accuracy of the text extracted by LA-PDFText to the text from the Open Access subset of PubMed Central. We then compared this accuracy with that of the text extracted by the PDF2Text system, commonly used to extract text from PDF...
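The three-stage process can be outlined in a hedged Python sketch. It assumes text lines with vertical positions and page numbers have already been extracted from the PDF. Stage 1 groups lines into blocks by vertical gap. Stages 2 and 3 classify blocks with a simple heading rule and stitch them into per-section text. The `max_gap` threshold, the heading table and the function names are hypothetical; they are not LA-PDFText's actual rules.

```python
from dataclasses import dataclass


@dataclass
class Line:
    text: str
    top: float     # vertical position of the line's top edge
    bottom: float  # vertical position of the line's bottom edge
    page: int


def detect_blocks(lines, max_gap=4.0):
    """Stage 1: group lines (sorted by page, then top) into contiguous blocks
    whenever the vertical gap to the previous line is at most max_gap."""
    blocks = []
    for line in sorted(lines, key=lambda ln: (ln.page, ln.top)):
        prev = blocks[-1][-1] if blocks else None
        if prev is not None and prev.page == line.page and line.top - prev.bottom <= max_gap:
            blocks[-1].append(line)
        else:
            blocks.append([line])
    return blocks


# Hypothetical rule table: heading text -> rhetorical category.
SECTION_RULES = {"abstract": "Abstract", "introduction": "Introduction",
                 "methods": "Methods", "results": "Results",
                 "discussion": "Discussion", "references": "References"}


def classify_and_stitch(blocks):
    """Stages 2-3: a block whose first line is a known heading starts a new
    category; other blocks inherit the current one. Returns category -> text."""
    sections, current = {}, "Front matter"
    for block in blocks:
        first = block[0].text.strip().lower().rstrip(":")
        if first in SECTION_RULES:
            current = SECTION_RULES[first]
            block = block[1:]  # drop the heading line itself
        if block:
            sections.setdefault(current, []).append(" ".join(ln.text for ln in block))
    return {name: "\n".join(parts) for name, parts in sections.items()}
```

Keeping layout grouping separate from classification mirrors the staged design described above. Each stage can be evaluated on its own, as the abstract does for block detection.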
The Texts of the Agency's Co-operation Agreements with Regional Intergovernmental Organizations; Texte des Accords de Cooperation Conclus entre L'Agence et des Organisations Intergouvernementales Regionales

Energy Technology Data Exchange (ETDEWEB)

NONE

1961-02-07

The texts of the Agency's agreements for co-operation with the regional inter-governmental organizations listed below, together with the respective protocols authenticating them, are reproduced in this document in the order in which the agreements entered into force, for the information of all Members of the Agency.
Word Spotting for Indic Documents to Facilitate Retrieval

Science.gov (United States)

Bhardwaj, Anurag; Setlur, Srirangaraj; Govindaraju, Venu

With advances in the field of digitization of printed documents and several mass digitization projects underway, information retrieval and document search have emerged as key research areas. However, most of the current work in these areas is limited to English and a few oriental languages. The lack of efficient solutions for Indic scripts has hampered information extraction from a large body of documents of cultural and historical importance. This chapter presents two relevant topics in this area. First, we describe a script-specific keyword spotting technique for Devanagari documents that makes use of domain knowledge of the script. Second, we address the needs of a digital library to provide access to a collection of documents from multiple scripts. This requires intelligent solutions that scale across different scripts. We present a script-independent keyword spotting approach for this purpose. Experimental results illustrate the efficacy of our methods.
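The abstract does not describe its matching algorithm. As a hedged illustration of keyword spotting in general, the sketch below compares binary word images through per-column feature profiles aligned with dynamic time warping (DTW), a classic matching technique that needs no script-specific knowledge. The image representation (lists of 0/1 rows), the two column features, and the `threshold` value are assumptions, not the chapter's method.

```python
import math


def column_features(img):
    """Per-column features of a binary word image (rows of 0/1, 1 = ink):
    normalized ink count and normalized row of the topmost ink pixel."""
    h, w = len(img), len(img[0])
    feats = []
    for x in range(w):
        col = [img[y][x] for y in range(h)]
        ink = sum(col) / h
        top = next((y for y in range(h) if col[y]), h) / h  # h/h = 1.0 if the column is empty
        feats.append((ink, top))
    return feats


def dtw(a, b):
    """DTW distance between two feature sequences, divided by n + m so that
    words of different widths are comparable."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m] / (n + m)


def spot(query_img, word_imgs, threshold=0.1):
    """Indices of candidate word images whose DTW distance to the query is
    below a (hypothetical) threshold, best match first."""
    q = column_features(query_img)
    scores = sorted((dtw(q, column_features(w)), i) for i, w in enumerate(word_imgs))
    return [i for s, i in scores if s < threshold]
```

Because DTW tolerates horizontal stretching, a wider rendering of the same word still matches the query, while a word with a different ink profile does not.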

Discrepancies in Communication Versus Documentation of Weight-Management Benchmarks

Directory of Open Access Journals (Sweden)

Christy B. Turer MD, MHS

2017-02-01

Full Text Available To examine gaps in communication versus documentation of weight-management clinical practices, communication was recorded during primary care visits with 6- to 12-year-old overweight/obese Latino children. Communication/documentation content was coded by 3 reviewers using communication transcripts and health-record documentation. Discrepancies in communication/documentation content codes were resolved through consensus. Bivariate/multivariable analyses examined factors associated with discrepancies in benchmark communication/documentation. Benchmarks were neither communicated nor documented in up to 42% of visits, and were communicated but not documented, or documented but not communicated, in up to 20% of visits. The lowest benchmark performance rates were for laboratory studies (35%) and nutrition/weight-management referrals (42%). In multivariable analysis, overweight (vs obesity) was associated with 1.6 more discrepancies in communication versus documentation (P = .03). Many weight-management benchmarks are not met, not documented, or performed without being communicated. Enhanced communication with families and documentation in health records may promote lifestyle changes and higher-quality primary care for overweight children.
Layout-aware text extraction from full-text PDF of scientific articles.

Science.gov (United States)

Ramakrishnan, Cartic; Patnia, Abhishek; Hovy, Eduard; Burns, Gully Apc

2012-05-28

The Portable Document Format (PDF) is the most commonly used file format for online scientific publications. The absence of effective means to extract text from these PDF files in a layout-aware manner presents a significant challenge for developers of biomedical text mining or biocuration informatics systems that use published literature as an information source. In this paper we introduce the 'Layout-Aware PDF Text Extraction' (LA-PDFText) system to facilitate accurate extraction of text from PDF files of research articles for use in text mining applications. Our paper describes the construction and performance of an open source system that extracts text blocks from PDF-formatted full-text research articles and classifies them into logical units based on rules that characterize specific sections. The LA-PDFText system focuses only on the textual content of the research articles and is meant as a baseline for further experiments into more advanced extraction methods that handle multi-modal content, such as images and graphs. The system works in a three-stage process: (1) detecting contiguous text blocks, using spatial layout processing to locate and identify blocks of contiguous text; (2) classifying text blocks into rhetorical categories using a rule-based method; and (3) stitching classified text blocks together in the correct order, resulting in the extraction of text from section-wise grouped blocks. We show that our system can identify text blocks and classify them into rhetorical categories with precision = 0.96, recall = 0.89 and F1 = 0.91. We also present an evaluation of the accuracy of the block detection algorithm used in step 2. Additionally, we have compared the accuracy of the text extracted by LA-PDFText to the text from the Open Access subset of PubMed Central. We then compared this accuracy with that of the text extracted by the PDF2Text system, commonly used to extract text from PDF. Finally, we discuss preliminary error analysis for
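Stages (2) and (3) of this process can be sketched under stated assumptions. Each detected block is reduced to its text, page, vertical position and a heading flag (for example derived from font size). The heading patterns and category names are hypothetical examples, not LA-PDFText's actual rule set. "Stitching" is interpreted here as concatenating the blocks of each category in reading order, which is simplified to (page, y) and therefore ignores multi-column layouts.

```python
import re
from dataclasses import dataclass


@dataclass
class Block:
    text: str
    page: int
    y: float          # top coordinate; smaller means higher on the page (assumption)
    is_heading: bool  # e.g. derived from a larger or bold font (assumption)


# Hypothetical rules: a heading matching a pattern switches the current category.
HEADING_RULES = [
    (re.compile(r"^\s*(\d+\.?\s*)?abstract\b", re.I), "abstract"),
    (re.compile(r"^\s*(\d+\.?\s*)?introduction\b", re.I), "introduction"),
    (re.compile(r"^\s*(\d+\.?\s*)?(materials and )?methods\b", re.I), "methods"),
    (re.compile(r"^\s*(\d+\.?\s*)?results\b", re.I), "results"),
    (re.compile(r"^\s*(\d+\.?\s*)?discussion\b", re.I), "discussion"),
    (re.compile(r"^\s*(\d+\.?\s*)?references\b", re.I), "references"),
]


def classify_and_stitch(blocks):
    """Label each body block with the category of the most recent matching
    heading (in reading order), then join each category's text."""
    current = "front_matter"
    sections = {}
    for b in sorted(blocks, key=lambda blk: (blk.page, blk.y)):
        if b.is_heading:
            for pattern, label in HEADING_RULES:
                if pattern.match(b.text):
                    current = label
                    break
            continue  # headings are used for labelling, not copied to the output
        sections.setdefault(current, []).append(b.text)
    return {label: " ".join(parts) for label, parts in sections.items()}
```

Headings that match no rule (such as subsection titles) leave the current category unchanged, so their body text stays attached to the enclosing section.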
Aiding the Interpretation of Ancient Documents

DEFF Research Database (Denmark)

Roued-Cunliffe, Henriette

How can Decision Support System (DSS) software aid the interpretation process involved in the reading of ancient documents? This paper discusses the development of a DSS prototype for the reading of ancient texts. In this context the term ‘ancient documents’ is used to describe mainly Greek...... tool it is important first to comprehend the interpretation process involved in reading ancient documents. This is not a linear process but rather a recursive process where the scholar moves between different levels of reading, such as ‘understanding the meaning of a character’ or ‘understanding...
Stamp Detection in Color Document Images

DEFF Research Database (Denmark)

Micenkova, Barbora; van Beusekom, Joost

2011-01-01

...... moreover, it can be imprinted with variable quality and rotation. Previous methods were restricted to detecting stamps of particular shapes or colors. The method presented in the paper includes segmentation of the image by color clustering and subsequent classification of candidate solutions...... by geometrical and color-related features. The approach makes it possible to distinguish stamps from other colored objects in the document, such as logos or text. For evaluation, a data set of 400 document images has been collected, annotated and made public. With the proposed method, recall of 83...
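The first step of the method, segmenting the image by color clustering, can be illustrated with plain k-means over RGB pixel values. This is a minimal sketch. The choice of k-means, the number of clusters `k`, the deterministic farthest-point initialization and the toy pixel data are assumptions, and the subsequent classification of candidate regions by geometric and color features is not shown.

```python
def dist2(a, b):
    """Squared Euclidean distance between two RGB tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_colors(pixels, k, iters=20):
    """Cluster RGB tuples with k-means; returns (centroids, labels).

    Initialization (an assumption): the first pixel, then repeatedly the pixel
    farthest from all centroids chosen so far.
    """
    centroids = [pixels[0]]
    while len(centroids) < k:
        centroids.append(max(pixels, key=lambda p: min(dist2(p, c) for c in centroids)))
    centroids = [tuple(float(v) for v in c) for c in centroids]
    labels = [0] * len(pixels)
    for _ in range(iters):
        # Assignment step: each pixel joins its nearest centroid.
        labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in pixels]
        # Update step: each centroid moves to the mean of its pixels
        # (an empty cluster keeps its previous centroid).
        new = []
        for j in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == j]
            if members:
                new.append(tuple(sum(ch) / len(members) for ch in zip(*members)))
            else:
                new.append(centroids[j])
        if new == centroids:  # converged
            break
        centroids = new
    return centroids, labels
```

On a page with white paper, black text and a blue stamp, each color ends up in its own cluster. The pixels of a non-background, non-text cluster would then be the stamp candidates passed to the classification step.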
Where are the Search Engines for Handwritten Documents?

NARCIS (Netherlands)

van der Zant, Tijn; Schomaker, Lambert; Zinger, Svitlana; van Schie, Henny

Although the problems of optical character recognition for contemporary printed text have been resolved, for historical printed and handwritten connected cursive text (i.e. western style writing), they have not. This does not mean that scanning historical documents is not useful. This article
Where are the search engines for handwritten documents?

NARCIS (Netherlands)

Zant, T.; Schomaker, L.; Zinger, S.; Schie, H.

2009-01-01

Although the problems of optical character recognition for contemporary printed text have been resolved, for historical printed and handwritten connected cursive text (i.e. western style writing), they have not. This does not mean that scanning historical documents is not useful. This article
Software System for Vocal Rendering of Printed Documents

Directory of Open Access Journals (Sweden)

Marian DARDALA

2008-01-01

Full Text Available The objective of this paper is to present a software system architecture developed to render printed documents in vocal form. The paper also describes the existing software components needed for document processing and for controlling the multimedia devices used by the system. The system is intended for people with visual disabilities, who can then access the contents of documents without those documents having to be printed in Braille or made available in audio form.
Limited Documentation and Treatment Quality of Glycemic Inpatient Care in Relation to Structural Deficits of Heterogeneous Insulin Charts at a Large University Hospital.

Science.gov (United States)

Kopanz, Julia; Lichtenegger, Katharina M; Sendlhofer, Gerald; Semlitsch, Barbara; Cuder, Gerald; Pak, Andreas; Pieber, Thomas R; Tax, Christa; Brunner, Gernot; Plank, Johannes

2018-02-09

Insulin charts represent a key component in the inpatient glycemic management process. The aim was to evaluate the quality of structure, documentation, and treatment of diabetic inpatient care in order to design a new standardized insulin chart for a large university hospital setting. Historically grown blank insulin charts in use at 39 general wards were collected and evaluated for structural quality features. Documentation and treatment quality were evaluated in a consecutive snapshot audit of filled-in charts. The primary end point was the percentage of charts with any medication error. Overall, 20 different blank insulin charts with variable designs and significant structural deficits were identified. A medication error occurred in 55% of the 102 audited filled-in insulin charts, consisting of prescription and management errors in 48% and 16%, respectively. Charts of insulin-treated patients had more medication errors than those of patients treated with oral medication (P …). … international standards, a new insulin chart was developed to overcome these quality hurdles.
Verifying the integrity of hardcopy document using OCR

CSIR Research Space (South Africa)

Mthethwa, Sthembile

2018-03-01

Full Text Available Verifying the Integrity...) of the document to be defined. Each text in the meta-template is labelled with a unique identifier, which makes the validation process easier. The meta-template consists of two types of text: normal text and validation text (important text that must...
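The meta-template idea (every text element carries a unique identifier, and some elements are marked as validation text) can be sketched as a comparison between the template and the fields read back by OCR. The field layout, the Unicode/whitespace/case normalization and the storage of validation text as SHA-256 digests are assumptions for illustration. The source does not say how the comparison is performed, and real OCR noise would call for a more tolerant match than exact digests.

```python
import hashlib
import unicodedata


def normalize(text: str) -> str:
    """Normalize text before comparison: Unicode NFKC, collapse whitespace, lowercase."""
    return " ".join(unicodedata.normalize("NFKC", text).split()).lower()


def digest(text: str) -> str:
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def build_meta_template(fields):
    """fields: {identifier: (text, is_validation)}.

    Only validation text is kept, as a digest (an assumed design choice), so
    the stored template does not expose the text itself.
    """
    return {fid: digest(text) for fid, (text, is_validation) in fields.items() if is_validation}


def verify(template, ocr_fields):
    """Identifiers of validation fields that are missing from, or altered in,
    the OCR output {identifier: text}."""
    return sorted(
        fid for fid, expected in template.items()
        if fid not in ocr_fields or digest(ocr_fields[fid]) != expected
    )
```

An empty result means every validation field was read back unchanged; any identifier returned points to the text element that was tampered with or misread.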
OFFICIAL DOCUMENTS RELATING TO PORTUGUESE LANGUAGE TEACHING, INTERCULTURALITY AND LITERACY POLICY

Directory of Open Access Journals (Sweden)

Cloris Porto Torquato

2016-06-01

Full Text Available The present article analyzes two documents, Parâmetros Curriculares Nacionais – Língua Portuguesa (National Curriculum Parameters – Portuguese Language; BRASIL, 1998) and Parâmetros Curriculares Nacionais – Temas Transversais – Pluralidade Cultural (National Curriculum Parameters – Cross-Cutting Themes – Cultural Plurality; BRASIL, 1998b), conceiving of these documents as constituents of language policies (RICENTO, 2006; SHOHAMY, 2006) and literacy policies. It focuses on the intercultural dialogues/conflicts that these documents promote when they establish the text as the main object of language teaching and indicate which genres should be privileged. The text thus deals with language policies, focusing more specifically on literacy policies (drawing on the concept of literacy formulated by the New Literacy Studies: STREET, 1984, 1993, 2003; BARTON; HAMILTON, 1998; SIGNORINI, 2001) and on interculturality (JANZEN, 2005). The analysis of the documents is undertaken in the light of the Bakhtinian conception of language and mobilizes the following concepts of the Bakhtin Circle: dialogism, utterance and speech genres. Methodologically, the text follows the orientations of the authors of this Circle for the study of language (BAKHTIN/VOLOSHINOV, 1986; BAKHTIN, 2003). The analysis indicates that the official documents, in promoting literacy policies, also promote intercultural conflicts, because they privilege the dominant literacies and silence other literacy practices. We understand that this silencing and invalidation of local literacy practices has implications for the constitution of students’ identities and for local language policies.
Sentiment analysis methods for understanding large-scale texts: a case for using continuum-scored words and word shift graphs

Directory of Open Access Journals (Sweden)

Andrew J Reagan

2017-10-01

Full Text Available Abstract The emergence and global adoption of social media has rendered possible the real-time estimation of population-scale sentiment, an extraordinary capacity which has profound implications for our understanding of human behavior. Given the growing assortment of sentiment-measuring instruments, it is imperative to understand which aspects of sentiment dictionaries contribute to both their classification accuracy and their ability to provide richer understanding of texts. Here, we perform detailed, quantitative tests and qualitative assessments of 6 dictionary-based methods applied to 4 different corpora, and briefly examine a further 20 methods. We show that while inappropriate for sentences, dictionary-based methods are generally robust in their classification accuracy for longer texts. Most importantly, they can aid understanding of texts with reliable and meaningful word shift graphs if (1) the dictionary covers a sufficiently large portion of a given text’s lexicon when weighted by word usage frequency; and (2) words are scored on a continuous scale.
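The two conditions above, frequency-weighted coverage and continuous word scores, can be made concrete in a short sketch. It computes a text's frequency-weighted average score over the words found in a dictionary, the token coverage, and each word's contribution to the difference between a comparison text and a reference text, using the word-shift decomposition contribution_i = (h_i - h_ref) * (p_i_comp - p_i_ref). Here h_i is the word's score, h_ref is the reference text's average, and p_i are the word's relative frequencies among dictionary-matched tokens; the contributions sum exactly to h_comp - h_ref. The three-word dictionary and its scores are invented for illustration.

```python
from collections import Counter


def score_stats(tokens, lexicon):
    """Return (average score, coverage, relative frequencies), using only the
    tokens found in the lexicon; coverage = matched tokens / all tokens."""
    counts = Counter(t for t in tokens if t in lexicon)
    matched = sum(counts.values())
    if matched == 0:
        raise ValueError("no lexicon words in text")
    freqs = {w: c / matched for w, c in counts.items()}
    avg = sum(lexicon[w] * p for w, p in freqs.items())
    return avg, matched / len(tokens), freqs


def word_shift(ref_tokens, comp_tokens, lexicon):
    """Per-word contributions to h_comp - h_ref, sorted by magnitude."""
    h_ref, _, p_ref = score_stats(ref_tokens, lexicon)
    _, _, p_comp = score_stats(comp_tokens, lexicon)
    words = set(p_ref) | set(p_comp)
    contrib = {
        w: (lexicon[w] - h_ref) * (p_comp.get(w, 0.0) - p_ref.get(w, 0.0))
        for w in words
    }
    return sorted(contrib.items(), key=lambda kv: -abs(kv[1]))
```

With the hypothetical scores happy = 8, sad = 2 and day = 5, the text "a happy happy day" averages 7.0 against 3.5 for "a sad day". The word shift attributes most of that 3.5-point rise to the added "happy" and some of it to the removed "sad", and it shows a small negative contribution from "day". The unscored word "a" lowers coverage but not the average.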
Toward an enhanced Arabic text classification using cosine similarity and Latent Semantic

Directory of Open Access Journals (Sweden)

Fawaz S. Al-Anzi

2017-04-01

Full Text Available Cosine similarity is one of the most popular similarity measures in text classification problems. In this paper, we use this measure to investigate the performance of Arabic-language text classification. For textual features, the vector space model (VSM) is generally used to represent textual information as numerical vectors. However, Latent Semantic Indexing (LSI) is a better textual representation technique, as it preserves semantic information between words. Hence, we used the singular value decomposition (SVD) method to extract textual features based on LSI. In our experiments, we compared several well-known classification methods: Naïve Bayes, k-Nearest Neighbors, Neural Network, Random Forest, Support Vector Machine, and classification tree. We used a corpus that contains 4,000 documents on ten topics (400 documents per topic). The corpus contains 2,127,197 words, with about 139,168 unique words. The testing set contains 400 documents, 40 documents per topic. As a weighting scheme, we used Term Frequency-Inverse Document Frequency (TF-IDF). This study reveals that the classification methods that use LSI features significantly outperform the TF-IDF-based methods. It also reveals that k-Nearest Neighbors (based on the cosine measure) and support vector machine are the best-performing classifiers.
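The pipeline described above (TF-IDF weighting, LSI features from a truncated SVD, and a cosine-similarity k-nearest-neighbors classifier) can be sketched with NumPy. The toy English corpus, the number of latent dimensions `k_dims`, the plain log(N/df) IDF variant and the single nearest neighbor are assumptions. Arabic-specific preprocessing such as normalization or stemming is omitted.

```python
import numpy as np


def term_counts(docs, vocab):
    """Raw term-frequency matrix: terms as rows, documents as columns."""
    index = {t: i for i, t in enumerate(vocab)}
    tf = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc.split():
            if tok in index:  # out-of-vocabulary tokens are ignored
                tf[index[tok], j] += 1
    return tf


def fit_idf(train_tf):
    """IDF = log(N / df), computed on the training collection only."""
    df = np.count_nonzero(train_tf, axis=1)
    return np.log(train_tf.shape[1] / np.maximum(df, 1))


def lsi_fit(X, k_dims):
    """Truncated SVD of the weighted term-document matrix X. Returns a function
    that folds a (terms x n) matrix into k_dims latent dimensions as
    S_k^-1 U_k^T M, the usual LSI fold-in."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    Uk, sk = U[:, :k_dims], s[:k_dims]
    return lambda M: (Uk.T @ M) / sk[:, None]


def cosine_knn(train_vecs, train_labels, query_vecs, k=1):
    """Columns are documents; each query takes the majority label of its k most
    cosine-similar training documents."""
    def unit(M):
        norms = np.linalg.norm(M, axis=0)
        return M / np.where(norms == 0, 1, norms)

    sims = unit(query_vecs).T @ unit(train_vecs)  # shape: (n_query, n_train)
    preds = []
    for row in sims:
        top = np.argsort(-row)[:k]
        votes = [train_labels[i] for i in top]
        preds.append(max(set(votes), key=votes.count))
    return preds
```

Usage: build `X = term_counts(train, vocab) * idf[:, None]` with `idf = fit_idf(term_counts(train, vocab))`, weight the test documents with the same training IDF, and classify with `cosine_knn(project(X), labels, project(Q))`, where `project = lsi_fit(X, k_dims)`. Dropping `project` gives the plain TF-IDF baseline that the study compares against.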
A SURVEY ON DOCUMENT CLUSTERING APPROACH FOR COMPUTER FORENSIC ANALYSIS

OpenAIRE

Monika Raghuvanshi, Rahul Patel

2016-01-01

In a forensic analysis, large numbers of files are examined. Much of the information is in unstructured form, which makes such analysis a difficult task for computer forensic examiners. Analyzing documents within a limited period of time therefore requires a special approach, such as document clustering. This paper reviews different document clustering algorithms, for example K-means, K-medoids, single link, complete link, and average link, in accordance...
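Three of the reviewed algorithms (single, complete and average link) differ only in how the distance between two clusters is derived from the pairwise distances between their documents. A naive agglomerative sketch makes that difference explicit. It assumes documents are already represented as numeric vectors and uses cosine distance; both are assumptions, and K-means and K-medoids are not shown.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb) if na and nb else 1.0


LINKAGES = {
    "single": min,                            # closest pair of members
    "complete": max,                          # farthest pair of members
    "average": lambda ds: sum(ds) / len(ds),  # mean over all member pairs
}


def agglomerative(vectors, n_clusters, linkage="average"):
    """Repeatedly merge the two closest clusters until n_clusters remain.
    Naive O(n^3) loop, for illustration only."""
    link = LINKAGES[linkage]
    clusters = [[i] for i in range(len(vectors))]
    dist = {(i, j): cosine_distance(vectors[i], vectors[j])
            for i in range(len(vectors)) for j in range(i + 1, len(vectors))}

    def d(i, j):
        return dist[(min(i, j), max(i, j))]

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                score = link([d(i, j) for i in clusters[a] for j in clusters[b]])
                if best is None or score < best[0]:
                    best = (score, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is still valid
    return sorted(sorted(c) for c in clusters)
```

Single link tends to chain loosely related documents together, while complete link favours compact clusters. This is why the choice of linkage matters when grouping heterogeneous evidence files.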
Does pedagogical documentation support maternal reminiscing conversations?

Directory of Open Access Journals (Sweden)

Bethany Fleck

2015-12-01

Full Text Available When parents talk with their children about lessons learned in school, they are participating in reminiscing about an unshared event. This study sought to understand whether pedagogical documentation, from the Reggio Approach to early childhood education, would support and enhance the conversation. Mother–child dyads reminisced two separate times about preschool lessons, one time with documentation available to them and one time without. Transcripts were coded to extract variables indicative of high and low maternal reminiscing styles. Results indicate that mother and child conversation characteristics were more highly elaborative when documentation was present than when it was not. In addition, children added more information to the conversation, supporting the notion that such conversations enhanced memory for lessons. Documentation could be used as a support tool for conversations and children’s memory about lessons learned in school.
Evaluation of Hierarchical Clustering Algorithms for Document Datasets

National Research Council Canada - National Science Library

Zhao, Ying; Karypis, George

2002-01-01

Fast and high-quality document clustering algorithms play an important role in providing intuitive navigation and browsing mechanisms by organizing large amounts of information into a small number of meaningful clusters...
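Evaluating clustering algorithms requires comparing the clusters they produce with known document classes. The abstract does not list the measures used. Purity and class entropy are assumed here as common choices, and the sketch computes both for a flat clustering, such as one level cut from a hierarchical tree. Clusters are assumed to be non-empty, and the entropy is left unnormalized (base 2).

```python
import math
from collections import Counter


def purity(clusters, labels):
    """Fraction of documents belonging to the majority class of their cluster.
    clusters: list of non-empty lists of document indices; labels: class per document."""
    n = sum(len(c) for c in clusters)
    return sum(Counter(labels[i] for i in c).most_common(1)[0][1] for c in clusters) / n


def entropy(clusters, labels):
    """Size-weighted average class entropy (base 2) of the clusters; 0 is perfect."""
    n = sum(len(c) for c in clusters)
    total = 0.0
    for c in clusters:
        counts = Counter(labels[i] for i in c)
        h = -sum((k / len(c)) * math.log2(k / len(c)) for k in counts.values())
        total += len(c) / n * h
    return total
```

Higher purity and lower entropy indicate clusters that agree better with the reference classes. Both measures favour many small clusters, so comparisons should be made at the same number of clusters.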
Documentation of Accounting Records in Light of Legislative Innovations

Directory of Open Access Journals (Sweden)

K. V. BEZVERKHIY

2017-05-01

Full Text Available Legislative reforms in accounting aim to simplify accounting records and the compilation of financial reports by business entities, thus improving Ukraine's position in the global Doing Business ranking. This simplification is reflected in the changes to the Regulation on Documentation of Accounting Records, which entered into force under a resolution of the Ukrainian Ministry of Finance. The objective of the study is to analyze these legislative innovations. Changes in the documentation of accounting records are reviewed, and a comparative analysis of the changes in the Regulation on Documentation of Accounting Records is made by section: (1) General; (2) Primary documents; (3) Accounting records; (4) Correction of errors in primary documents and accounting records; (5) Organization of document circulation; (6) Storage of documents. Methods of analysis and synthesis are used to identify the differences between the editions of the Regulation on Documentation of Accounting Records. The result of the study has theoretical and practical value for the domestic business enterprise sector.
GPU-Accelerated Text Mining

International Nuclear Information System (INIS)

Cui, X.; Mueller, F.; Zhang, Y.; Potok, Thomas E.

2009-01-01

Hardware accelerators promise improved performance in many problem domains, but it is not clear which accelerators suit which domains. Since there is no room in general-purpose processor design to significantly increase processor frequency, developers are instead resorting to multi-core chips that duplicate conventional computing capabilities on a single die. Accelerators, however, offer more radical designs with a much higher level of parallelism and novel programming environments. The present work assesses the viability of text mining on CUDA. Text mining has become prominent as an effective means of indexing the Internet, but its applications extend beyond this scope, for example to providing document similarity metrics, the subject of this work. We have developed and optimized text search algorithms for GPUs to exploit their potential for massive data processing. We discuss the algorithmic challenges of parallelizing text search problems on GPUs and demonstrate the potential of these devices in experiments by reporting significant speedups. Our study may be one of the first to assess more complex text search problems for suitability for GPU devices, and it may also be one of the first to exploit and report on the use of atomic instructions that have recently become available in NVIDIA devices.
Wilmar joint market model, Documentation

International Nuclear Information System (INIS)

Meibom, P.; Larsen, Helge V.; Barth, R.; Brand, H.; Weber, C.; Voll, O.

2006-01-01

The Wilmar Planning Tool was developed in the project Wind Power Integration in Liberalised Electricity Markets (WILMAR), supported by the EU (Contract No. ENK5-CT-2002-00663). A User Shell implemented in an Excel workbook controls the Wilmar Planning Tool. All data are contained in Access databases that communicate with the various sub-models through text files exported from or imported to the databases. The Joint Market Model (JMM) is one of these sub-models. This report documents the JMM. The documentation describes: 1. the file structure of the JMM; 2. the sets, parameters and variables in the JMM; 3. the equations in the JMM; 4. the looping structure in the JMM. (au)
Wilmar joint market model, Documentation

Energy Technology Data Exchange (ETDEWEB)

Meibom, P.; Larsen, Helge V. [Risoe National Lab. (Denmark); Barth, R.; Brand, H. [IER, Univ. of Stuttgart (Germany); Weber, C.; Voll, O. [Univ. of Duisburg-Essen (Germany)

2006-01-15

The Wilmar Planning Tool was developed in the project Wind Power Integration in Liberalised Electricity Markets (WILMAR), supported by the EU (Contract No. ENK5-CT-2002-00663). A User Shell implemented in an Excel workbook controls the Wilmar Planning Tool. All data are contained in Access databases that communicate with the various sub-models through text files exported from or imported to the databases. The Joint Market Model (JMM) is one of these sub-models. This report documents the JMM. The documentation describes: 1. the file structure of the JMM; 2. the sets, parameters and variables in the JMM; 3. the equations in the JMM; 4. the looping structure in the JMM. (au)
Reconciling disparate information in continuity of care documents: Piloting a system to consolidate structured clinical documents.

Science.gov (United States)

Hosseini, Masoud; Jones, Josette; Faiola, Anthony; Vreeman, Daniel J; Wu, Huanmei; Dixon, Brian E

2017-10-01

Due to the nature of information generation in health care, clinical documents contain duplicate and sometimes conflicting information. Recent implementation of Health Information Exchange (HIE) mechanisms, in which clinical summary documents are exchanged among disparate health care organizations, can proliferate duplicate and conflicting information. To reduce information overload, a system to automatically consolidate information across multiple clinical summary documents was developed for an HIE network. The system receives any number of Continuity of Care Documents (CCDs) and outputs a single, consolidated record. To test the system, a randomly sampled corpus of 522 CCDs representing 50 unique patients was extracted from a large HIE network. The automated methods were compared to manual consolidation of information for three key sections of the CCD: problems, allergies, and medications. Manual consolidation of 11,631 entries was completed in approximately 150 h. The same data were automatically consolidated in 3.3 min. The system successfully consolidated 99.1% of problems, 87.0% of allergies, and 91.7% of medications. Almost all of the inaccuracies were caused by issues involving the use of standardized terminologies within the documents to represent individual information entries. This study presents a novel, tested tool for de-duplication and consolidation of CDA documents, which is a major step toward improving information access and the interoperability among information systems. While more work is needed, automated systems like the one evaluated in this study will be necessary to meet the informatics needs of providers and health systems in the future. Copyright © 2017 Elsevier Inc. All rights reserved.
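The core de-duplication step can be sketched under stated assumptions. Each problem, allergy or medication entry is a small record with an optional coded value from a standard terminology plus a display name. Entries count as duplicates when they share the same section, code system and code, and uncoded entries fall back to a normalized display name. The record layout, the keying rule and the example codes are hypothetical. Since the abstract attributes most inaccuracies to how standardized terminologies were used in the documents, a keying rule like this one is exactly where such errors would surface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    section: str                       # "problems", "allergies" or "medications"
    display: str                       # human-readable name as written in the CCD
    code_system: Optional[str] = None  # identifier of the terminology (hypothetical)
    code: Optional[str] = None
    source_doc: str = ""


def dedup_key(e: Entry):
    """Coded identity when available, otherwise a normalized display name."""
    if e.code_system and e.code:
        return (e.section, "code", e.code_system, e.code)
    return (e.section, "name", " ".join(e.display.lower().split()))


def consolidate(entries):
    """Keep the first entry seen for each key and record which source
    documents mentioned it; returns [(entry, sorted source ids)]."""
    merged = {}
    for e in entries:
        k = dedup_key(e)
        if k in merged:
            merged[k][1].add(e.source_doc)
        else:
            merged[k] = (e, {e.source_doc})
    return [(e, sorted(docs)) for e, docs in merged.values()]
```

Two CCDs listing the same coded medication under different display strings collapse into one entry. Conflicting values for the same item (for example, different doses) would need an explicit resolution rule that this sketch does not attempt.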

Design and Development of a Digital Library Based on a Document Management System at the Faculty of Computer Science, UNSIKA

Directory of Open Access Journals (Sweden)

Yayan Gustiana

2018-05-01

Full Text Available Books are reading tools for everyone, and they are very important for an institution that wants to sustain knowledge and generate new knowledge. In the library of the Faculty of Computer Science, Unsika, one obstacle is that the library space for books is inadequate, and final-year students find it difficult to look up previous research titles, which they need to check because titles must not be duplicated. Developing a library based on a Document Management System is one way to solve this problem and is expected to support document management. The system can accommodate all books uploaded by users, with permissions, and distribute them quickly to all users. The system was developed using the … method, in accordance with the customers' wishes. Data are stored in MySQL, and the application is written in an object-oriented programming (OOP) style using the CodeIgniter (CI) framework. In a simple random sample of 35 respondents, 86.80% agreed with the digital library. The document management system concept requires large storage media, a server and good local connectivity. The system can also serve as a stepping stone for research.
Administration, control and updating of documentation, handbooks and plans, digitization, CAD application

International Nuclear Information System (INIS)

Rudolph, K.

1988-01-01

The large number of documents that must be kept up to date during the construction and operation of a modern nuclear power plant calls for computer-supported document management. Possible solutions are discussed and visible trends analyzed. The use of CAD is demonstrated by means of examples. Experience from the introduction of EDP document management is reported. 4 figs
An Experimental Text in Transformational Geometry, Student Text; Cambridge Conference on School Mathematics Feasibility Study No. 43a.

Science.gov (United States)

Cambridge Conference on School Mathematics, Newton, MA.

This is part of a student text written to reflect the thinking of The Cambridge Conference on School Mathematics (CCSM) regarding the goals and objectives for mathematics. The instructional materials were developed for teaching geometry in secondary schools. This document is chapter six, titled Motions and…
Classification of forensic autopsy reports through conceptual graph-based document representation model.

Science.gov (United States)

Mujtaba, Ghulam; Shuib, Liyana; Raj, Ram Gopal; Rajandram, Retnagowri; Shaikh, Khairunisa; Al-Garadi, Mohammed Ali

2018-06-01

Text categorization has been used extensively in recent years to classify plain-text clinical reports. This study employs text categorization techniques to classify open narrative forensic autopsy reports. One of the key steps in text classification is document representation, in which a clinical report is transformed into a format suitable for classification. The traditional document representation technique for text categorization is the bag-of-words (BoW) technique. In this study, the traditional BoW technique proved ineffective in classifying forensic autopsy reports because it merely extracts frequent, but not necessarily discriminative, features from clinical reports. Moreover, this technique fails to capture word inversion, as well as word-level synonymy and polysemy, when classifying autopsy reports. Hence, the BoW technique suffers from low accuracy and low robustness unless it is improved with contextual and application-specific information. To overcome these limitations of the BoW technique, this research aims to develop an effective conceptual graph-based document representation (CGDR) technique to classify 1500 forensic autopsy reports from four (4) manners of death (MoD) and sixteen (16) causes of death (CoD). Term-based and Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT) based conceptual features were extracted and represented through graphs. These features were then used to train a two-level text classifier. The first-level classifier was responsible for predicting MoD, and the second-level classifier was responsible for predicting CoD using the proposed conceptual graph-based document representation technique. To demonstrate the significance of the proposed technique, its results were compared with those of six (6) state-of-the-art document representation techniques. Lastly, this study compared the effects of one-level classification and two-level classification on the experimental results
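The two-level scheme (first predict the manner of death, then predict the cause of death with a classifier trained only on reports of that manner) can be sketched independently of the document representation. Each report is assumed to be already reduced to a set of features, for example concept identifiers. A 1-nearest-neighbor classifier over Jaccard similarity stands in for the study's actual classifiers, and the feature sets and class names in the usage example are invented.

```python
def jaccard(a, b):
    """Jaccard similarity of two feature sets."""
    return len(a & b) / len(a | b) if a | b else 0.0


class NearestNeighbor:
    """1-NN classifier over feature sets (stand-in for the study's classifiers)."""

    def __init__(self, examples):
        self.examples = list(examples)  # [(feature_set, label), ...]

    def predict(self, feats):
        return max(self.examples, key=lambda ex: jaccard(feats, ex[0]))[1]


class TwoLevelClassifier:
    """Level 1 predicts the manner of death (MoD); level 2 uses a classifier
    trained only on reports of that MoD to predict the cause of death (CoD)."""

    def __init__(self, reports):
        # reports: [(feature_set, mod, cod), ...]
        self.level1 = NearestNeighbor((f, mod) for f, mod, _ in reports)
        by_mod = {}
        for f, mod, cod in reports:
            by_mod.setdefault(mod, []).append((f, cod))
        self.level2 = {mod: NearestNeighbor(ex) for mod, ex in by_mod.items()}

    def predict(self, feats):
        mod = self.level1.predict(feats)
        return mod, self.level2[mod].predict(feats)
```

One consequence of this design is that a wrong first-level decision cannot be corrected at the second level, because the CoD classifier only knows causes belonging to the predicted manner. This trade-off is what a comparison of one-level and two-level classification would measure.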
COMPOSITIONAL AND SUBSTANTIAL STRUCTURE OF THE MEDICAL DOCUMENT: FORMATION STAGES

Directory of Open Access Journals (Sweden)

Romashova Olga Vladimirovna

2015-03-01

Full Text Available The article deals with the compositional and substantial structure of the ambulatory medical record, or "case history", which has taken shape over a long period. The author identifies three main periods in the formation of this medical document: the first period (from the beginning of the 19th century to the 1920s) is connected with its origin and formation; the second period (from the 1920s to the 1980s) is marked by the emergence of normative legal acts regulating its registration and maintenance; the third period (from the 1980s to the present) is associated with the cancellation of these regulations and the introduction of a new order of the Ministry of Health of the USSR that changed the document's form and name. It is determined that the case history consists of a title page and a main part. The following processes take place in the course of the ambulatory medical record's formation: increasing formalization, an increase in the number of patterned text fragments, an increase in the text's volume, and the implementation of a larger number of functions. The author reveals the main (informative and cumulative, accounting) and additional (scientific, controlling, legal, financial) functions of the document. The implementation of these functions is reflected in the compositional and substantial structure of the document text and is conditioned by a number of extralinguistic factors.
Development of digital library system on regulatory documents for nuclear power plants

International Nuclear Information System (INIS)

Lee, K. H.; Kim, K. J.; Yoon, Y. H.; Kim, M. W.; Lee, J. I.

2001-01-01

The main objective of this study is to establish an Internet-based retrieval system for nuclear regulatory documents. With the advancement of the Internet and information processing technology, information management is undergoing a paradigm shift. In keeping with this trend, paper documents are generally being converted into electronic documents through document scanning and indexing. This system consists of nuclear regulatory documents, nuclear safety documents, a digital library, and an information system with index and full text.
Using Electronic Systems for Document Management in Economic Entities

Directory of Open Access Journals (Sweden)

2007-01-01

Full Text Available Document workflow and management, whether of scanned documents, computer-generated e-documents or complex file formats, are critical elements for the success of an organization. Delivering the correct information to the right person at the right moment is a fundamental element of daily activity. In the Internet era, documents have a new format and, more importantly, completely new functions. Paper is being replaced by electronic formats such as .html, .xms, .pdf or .doc. The price for this progress is increasing technological complexity, and with this complexity comes the need for more efficient management and organization techniques, such as an electronic document management system. This paper aims to present document management not as a separate software category on the IT market, but as an element integrated with any software solution, maximizing its capacity to make business more efficient.
Privacy Preserving Similarity Based Text Retrieval through Blind Storage

Directory of Open Access Journals (Sweden)

Pinki Kumari

2016-09-01

Cloud computing is advancing rapidly because of its many advantages, and more data owners are interested in outsourcing their data to cloud storage to centralize it. Because huge files are stored in the cloud, a keyword-based search process must be provided to data users. At the same time, to protect data privacy, sensitive data are encrypted before being outsourced to the cloud server. However, searching over encrypted data is difficult. In this system, we propose similarity-based text retrieval from encrypted blind storage blocks. The blind storage system provides additional security because data are stored at random locations in cloud storage. In the existing system, the data owner cannot encrypt the document data, because encryption was performed only at the server end, and anyone could access the data because no private key was applied to maintain its privacy. In our proposed system, by contrast, the data owner can encrypt the data himself using the RSA algorithm. RSA is a public-key cryptosystem widely used to secure sensitive data stored over the Internet. We use a text mining process to identify the index files of user documents. Before encryption, we also use a natural language processing (NLP) technique to identify synonyms of the keywords in the data owner's documents. The text mining process examines the text word by word and collects the literal meaning of the word groups that compose each sentence. These words are then looked up through the WordNet API so that only equivalent words are identified for use in the index file. Our proposed system provides a more secure and authorized way to retrieve text from cloud storage with access control. Finally, our experimental results show that our system outperforms the existing one.
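The abstract says keyword synonyms are identified before encryption so that equivalent words share an index entry. As a minimal sketch of that normalization step only (no RSA encryption and no blind storage), assuming a small hand-written synonym table as a stand-in for the WordNet lookup the authors describe, keywords can be collapsed to canonical terms before indexing and before matching queries. Documents are then ranked by a simple keyword-overlap score, which is an illustrative choice, not necessarily the paper's similarity measure.

```python
import re

# Hypothetical synonym table standing in for a WordNet lookup:
# each word maps to a canonical keyword.
SYNONYMS = {
    "car": "automobile",
    "auto": "automobile",
    "automobile": "automobile",
    "doctor": "physician",
    "physician": "physician",
}


def canonical_keywords(text):
    """Lowercase, tokenize, and replace each word with its canonical synonym."""
    words = re.findall(r"[a-z]+", text.lower())
    return {SYNONYMS.get(w, w) for w in words}


def build_index(docs):
    """Map each canonical keyword to the sorted IDs of documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for kw in canonical_keywords(text):
            index.setdefault(kw, set()).add(doc_id)
    return {kw: sorted(ids) for kw, ids in index.items()}


def query(index, text):
    """Rank documents by how many canonical query keywords they contain."""
    scores = {}
    for kw in canonical_keywords(text):
        for doc_id in index.get(kw, []):
            scores[doc_id] = scores.get(doc_id, 0) + 1
    # Highest score first; ties broken by document ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = {
    "d1": "The doctor drove a car to the clinic",
    "d2": "Automobile insurance rates",
    "d3": "Clinic opening hours",
}
index = build_index(docs)
print(query(index, "physician auto"))  # ['d1', 'd2']
```

Because both documents and queries pass through the same normalization, a query for "auto" matches a document that only says "car". In the scheme the abstract describes, the canonical keywords would then be protected before leaving the data owner.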
A New Binarization Algorithm for Historical Documents

Directory of Open Access Journals (Sweden)

Marcos Almeida

2018-01-01

Monochromatic documents require much less network bandwidth and storage space than their color or even grayscale equivalents. Binarizing historical documents is far more complex than binarizing recent ones, because many factors affect binarization: paper aging, color, texture, translucency, stains, back-to-front interference, the kind and color of ink used in handwriting, the printing process, the digitization process, etc. This article presents a new binarization algorithm for historical documents. The proposed global filter works in four steps: filtering the image with a bilateral filter; splitting the image into its RGB components; deciding for each RGB channel with an adaptive binarization method inspired by Otsu's method, with a choice of threshold level; and classifying the binarized images to decide which RGB component best preserved the document information in the foreground. A quantitative and qualitative assessment against 23 binarization algorithms on three sets of “real world” documents showed very good results.
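The per-channel decision step is described as an adaptive method inspired by Otsu's method, but the authors' exact variant is not given. The following is only a sketch of classic Otsu thresholding on a single 8-bit channel: it picks the threshold that maximizes the between-class variance of the histogram. The pixel values are made up for illustration.

```python
def otsu_threshold(pixels):
    """Return Otsu's threshold t for 8-bit values; pixels <= t form one class.

    Picks the t that maximizes the between-class variance. Raw counts are used
    instead of probabilities, which scales the variance by a constant and so
    leaves the argmax unchanged.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    count_bg, sum_bg = 0, 0
    for t in range(256):
        count_bg += hist[t]           # pixels with value <= t
        sum_bg += t * hist[t]
        count_fg = total - count_bg   # pixels with value > t
        if count_bg == 0:
            continue
        if count_fg == 0:
            break
        mean_bg = sum_bg / count_bg
        mean_fg = (sum_all - sum_bg) / count_fg
        between_var = count_bg * count_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:    # strict: keep the lowest t on ties
            best_t, best_var = t, between_var
    return best_t


def binarize(pixels):
    """Map pixels at or below the Otsu threshold to 0 (ink), the rest to 255."""
    t = otsu_threshold(pixels)
    return [0 if p <= t else 255 for p in pixels]


# Made-up channel: dark "ink" values near 10 and a light "paper" background.
pixels = [10] * 50 + [12] * 50 + [200] * 40 + [210] * 60
print(otsu_threshold(pixels))  # 12
```

In the four-step pipeline above, a function like this would be applied to each RGB channel after bilateral filtering; choosing which binarized channel best preserves the foreground is a separate classification step not sketched here.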
Literary Hermeneutic - A Large Vision upon the Text

Directory of Open Access Journals (Sweden)

Elena Vorotneac

2011-12-01

This article presents the book “Literary Hermeneutic” by Victoria Fonari, Ph.D., State University of Moldova. Hermeneutics, as an object of research, includes literary, critical, theological, juridical, linguistic, psychological, verbal, and sociological knowledge. Literary hermeneutics is one of the most favored disciplines. It has been venerated from Homeric exegesis in antiquity to the refinement of the methodology for interpreting canonical works, in which a key moment is the deciphering of texts (the commentary on monuments and authors from time immemorial), thereby re-establishing a part of human values. Re-establishing the connections between the values of the past and their understanding from a present-day perspective is the task of literary interpretation. The paradigm of literary and artistic interpretation constitutes a basic element, important both for writing academic research and for understanding literary values. It directs students to scientific works and facilitates the professional activity of teachers, journalists, jurists, and translators.
Writing Treatment for Aphasia: A Texting Approach

Science.gov (United States)

Beeson, Pelagie M.; Higginson, Kristina; Rising, Kindle

2013-01-01

Purpose: Treatment studies have documented the therapeutic and functional value of lexical writing treatment for individuals with severe aphasia. The purpose of this study was to determine whether such retraining could be accomplished using the typing feature of a cellular telephone, with the ultimate goal of using text messaging for…
Aspects of Text Mining From Computational Semiotics to Systemic Functional Hypertexts

Directory of Open Access Journals (Sweden)

Alexander Mehler

2001-05-01

The significance of natural language texts as the prime information structure for managing and disseminating knowledge in organisations is still increasing. Making relevant documents available for varying tasks in different contexts is of primary importance for efficient task completion. Meeting this demand requires content-based processing of texts, which makes it possible to reconstruct or, if necessary, explore the relationship among task, context, and document. Text mining is a technology suited to solving problems of this kind. In the following, semiotic aspects of text mining are investigated. Starting from the primary object of text mining, natural language lexis, the specific complexity of this class of signs is outlined and requirements for implementing text mining procedures are derived. This is done with reference to text linkage, introduced as a special task in text mining. Text linkage refers to the exploration of implicit, content-based relations of texts (and their annotation as typed links in corpora possibly organised as hypertexts). In this context, the term systemic functional hypertext is introduced, which distinguishes genre and register layers for the management of links in a poly-level hypertext system.
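The article treats text linkage conceptually; it does not prescribe an algorithm. Purely as an illustration of what "implicit, content-based relations of texts" can mean computationally, here is a generic bag-of-words cosine-similarity sketch that proposes untyped links between texts whose similarity meets a threshold. This is a swapped-in baseline technique, not the author's method, and the 0.3 threshold and sample texts are arbitrary assumptions.

```python
import math
import re
from collections import Counter
from itertools import combinations


def term_vector(text):
    """Bag-of-words term frequencies for a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def propose_links(texts, threshold=0.3):
    """Return (id1, id2, score) for text pairs whose similarity meets the threshold."""
    vectors = {doc_id: term_vector(t) for doc_id, t in texts.items()}
    links = []
    for a, b in combinations(sorted(vectors), 2):
        score = cosine(vectors[a], vectors[b])
        if score >= threshold:
            links.append((a, b, round(score, 3)))
    return links


# Hypothetical mini-corpus.
texts = {
    "t1": "text mining of natural language texts",
    "t2": "mining natural language corpora",
    "t3": "hypertext link management",
}
print(propose_links(texts))  # [('t1', 't2', 0.612)]
```

Assigning link *types* (the "typed links" of the abstract) would need more than a single similarity score, for example the genre and register distinctions the article introduces.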
Dealing with extreme data diversity: extraction and fusion from the growing types of document formats

Science.gov (United States)

David, Peter; Hansen, Nichole; Nolan, James J.; Alcocer, Pedro

2015-05-01

The growth in text data available online is accompanied by a growth in the diversity of available documents. Corpora with extreme heterogeneity in terms of file formats, document organization, page layout, text style, and content are common. The absence of meaningful metadata describing the structure of online and open-source data leads to text extraction results that contain no information about document structure and are cluttered with page headers and footers, web navigation controls, advertisements, and other items that are typically considered noise. We describe an approach to document structure and metadata recovery that uses visual analysis of documents to infer the communicative intent of the author. Our algorithm identifies the components of documents, such as titles, headings, and body content, based on their appearance. Because it operates on an image of a document, our technique can be applied to any type of document, including scanned images. Our approach to document structure recovery considers a finer-grained set of component types than prior approaches. In this initial work, we show that a machine learning approach to document structure recovery using a feature set based on the geometry and appearance of images of documents achieves a 60% greater F1 score than a baseline random classifier.
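The result above is stated as an F1 score relative to a random baseline. For readers unfamiliar with the metric, here is a small sketch of per-class F1 (the harmonic mean of precision and recall), its unweighted macro average over component types, and a relative-gain helper. Reading "60% greater" as a relative rather than an absolute-point improvement is an assumption, as is macro averaging; the component labels are hypothetical.

```python
def f1_per_class(y_true, y_pred, label):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the labels present in y_true."""
    labels = sorted(set(y_true))
    return sum(f1_per_class(y_true, y_pred, lab) for lab in labels) / len(labels)


def relative_gain(score, baseline):
    """Relative improvement; 0.6 means 60% greater than the baseline."""
    return (score - baseline) / baseline


# Hypothetical component labels for four page regions.
y_true = ["title", "heading", "body", "body"]
y_pred = ["title", "body", "body", "body"]
print(round(macro_f1(y_true, y_pred), 3))   # 0.6
print(round(relative_gain(0.48, 0.30), 3))  # 0.6
```

Here "heading" is never predicted correctly, so its F1 is 0 and it pulls the macro average down even though most regions are labeled correctly; that sensitivity to rare classes is one reason F1 is preferred over plain accuracy for this kind of task.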
MeSHmap: a text mining tool for MEDLINE.

OpenAIRE

Srinivasan, P.

2001-01-01

Our research goal is to explore text mining from the metadata included in MEDLINE documents. We present MeSHmap, our prototype text mining system, which exploits the MeSH indexing accompanying MEDLINE records. MeSHmap supports searches via PubMed followed by user-driven exploration of the MeSH terms and subheadings in the retrieved set. The potential of the system goes beyond text retrieval. It may also be used to compare entities of the same type such as pairs of drugs or pairs of procedures et...
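MeSHmap's internals are not described in this abstract. As an illustrative sketch only, assuming each retrieved MEDLINE record is reduced to its list of MeSH headings, two entities of the same type (say, two drugs) could be compared by building a heading profile for each retrieved set and measuring overlap with Jaccard similarity. The function names, the `min_count` filter, and the records below are hypothetical, not real MEDLINE data or MeSHmap's API.

```python
from collections import Counter


def mesh_profile(records):
    """Count how many retrieved records carry each MeSH heading."""
    counts = Counter()
    for headings in records:
        counts.update(set(headings))  # count each heading once per record
    return counts


def compare_entities(records_a, records_b, min_count=1):
    """Jaccard overlap of headings seen at least min_count times, plus the diffs."""
    a = {h for h, c in mesh_profile(records_a).items() if c >= min_count}
    b = {h for h, c in mesh_profile(records_b).items() if c >= min_count}
    union = a | b
    score = len(a & b) / len(union) if union else 0.0
    return score, sorted(a & b), sorted(a - b), sorted(b - a)


# Hypothetical retrieved sets for two drugs (MeSH headings per record).
drug_x = [["Hypertension", "Kidney"], ["Hypertension", "Aged"]]
drug_y = [["Hypertension", "Heart Failure"], ["Aged"]]
score, shared, only_x, only_y = compare_entities(drug_x, drug_y)
print(score, shared)  # 0.5 ['Aged', 'Hypertension']
```

Returning the one-sided differences alongside the score mirrors the kind of user-driven exploration the abstract describes: the headings unique to each entity are often more informative than the overlap itself.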
Measurement of the $t\bar{t}Z$ and $t\bar{t}W$ production cross sections in multilepton final states using 3.2 fb$^{-1}$ of $pp$ collisions at $\sqrt{s}$ = 13 TeV with the ATLAS detector.

Science.gov (United States)

Aaboud, M; Aad, G; Abbott, B; Abdallah, J; Abdinov, O; Abeloos, B; Aben, R; AbouZeid, O S; Abraham, N L; Abramowicz, H; Abreu, H; Abreu, R; Abulaiti, Y; Acharya, B S; Adamczyk, L; Adams, D L; Adelman, J; Adomeit, S; Adye, T; Affolder, A A; Agatonovic-Jovin, T; Agricola, J; Aguilar-Saavedra, J A; Ahlen, S P; Ahmadov, F; Aielli, G; Akerstedt, H; Åkesson, T P A; Akimov, A V; Alberghi, G L; Albert, J; Albrand, S; Alconada Verzini, M J; Aleksa, M; Aleksandrov, I N; Alexa, C; Alexander, G; Alexopoulos, T; Alhroob, M; Ali, B; Aliev, M; Alimonti, G; Alison, J; Alkire, S P; Allbrooke, B M M; Allen, B W; Allport, P P; Aloisio, A; Alonso, A; Alonso, F; Alpigiani, C; Alstaty, M; Alvarez Gonzalez, B; Álvarez Piqueras, D; Alviggi, M G; Amadio, B T; Amako, K; Amaral Coutinho, Y; Amelung, C; Amidei, D; Amor Dos Santos, S P; Amorim, A; Amoroso, S; Amundsen, G; Anastopoulos, C; Ancu, L S; Andari, N; Andeen, T; Anders, C F; Anders, G; Anders, J K; Anderson, K J; Andreazza, A; Andrei, V; Angelidakis, S; Angelozzi, I; Anger, P; Angerami, A; Anghinolfi, F; Anisenkov, A V; Anjos, N; Annovi, A; Antel, C; Antonelli, M; Antonov, A; Anulli, F; Aoki, M; Aperio Bella, L; Arabidze, G; Arai, Y; Araque, J P; Arce, A T H; Arduh, F A; Arguin, J-F; Argyropoulos, S; Arik, M; Armbruster, A J; Armitage, L J; Arnaez, O; Arnold, H; Arratia, M; Arslan, O; Artamonov, A; Artoni, G; Artz, S; Asai, S; Asbah, N; Ashkenazi, A; Åsman, B; Asquith, L; Assamagan, K; Astalos, R; Atkinson, M; Atlay, N B; Augsten, K; Avolio, G; Axen, B; Ayoub, M K; Azuelos, G; Baak, M A; Baas, A E; Baca, M J; Bachacou, H; Bachas, K; Backes, M; Backhaus, M; Bagiacchi, P; Bagnaia, P; Bai, Y; Baines, J T; Baker, O K; Baldin, E M; Balek, P; Balestri, T; Balli, F; Balunas, W K; Banas, E; Banerjee, Sw; Bannoura, A A E; Barak, L; Barberio, E L; Barberis, D; Barbero, M; Barillari, T; Barklow, T; Barlow, N; Barnes, S L; Barnett, B M; Barnett, R M; Barnovska-Blenessy, Z; Baroncelli, A; Barone, G; Barr, A J; Barranco Navarro, L; Barreiro, F; 
Barreiro Guimarães da Costa, J; Bartoldus, R; Barton, A E; Bartos, P; Basalaev, A; Bassalat, A; Bates, R L; Batista, S J; Batley, J R; Battaglia, M; Bauce, M; Bauer, F; Bawa, H S; Beacham, J B; Beattie, M D; Beau, T; Beauchemin, P H; Bechtle, P; Beck, H P; Becker, K; Becker, M; Beckingham, M; Becot, C; Beddall, A J; Beddall, A; Bednyakov, V A; Bedognetti, M; Bee, C P; Beemster, L J; Beermann, T A; Begel, M; Behr, J K; Belanger-Champagne, C; Bell, A S; Bella, G; Bellagamba, L; Bellerive, A; Bellomo, M; Belotskiy, K; Beltramello, O; Belyaev, N L; Benary, O; Benchekroun, D; Bender, M; Bendtz, K; Benekos, N; Benhammou, Y; Benhar Noccioli, E; Benitez, J; Benjamin, D P; Bensinger, J R; Bentvelsen, S; Beresford, L; Beretta, M; Berge, D; Bergeaas Kuutmann, E; Berger, N; Beringer, J; Berlendis, S; Bernard, N R; Bernius, C; Bernlochner, F U; Berry, T; Berta, P; Bertella, C; Bertoli, G; Bertolucci, F; Bertram, I A; Bertsche, C; Bertsche, D; Besjes, G J; Bessidskaia Bylund, O; Bessner, M; Besson, N; Betancourt, C; Bethke, S; Bevan, A J; Bhimji, W; Bianchi, R M; Bianchini, L; Bianco, M; Biebel, O; Biedermann, D; Bielski, R; Biesuz, N V; Biglietti, M; De Mendizabal, J Bilbao; Bilokon, H; Bindi, M; Binet, S; Bingul, A; Bini, C; Biondi, S; Bjergaard, D M; Black, C W; Black, J E; Black, K M; Blackburn, D; Blair, R E; Blanchard, J-B; Blanco, J E; Blazek, T; Bloch, I; Blocker, C; Blum, W; Blumenschein, U; Blunier, S; Bobbink, G J; Bobrovnikov, V S; Bocchetta, S S; Bocci, A; Bock, C; Boehler, M; Boerner, D; Bogaerts, J A; Bogavac, D; Bogdanchikov, A G; Bohm, C; Boisvert, V; Bokan, P; Bold, T; Boldyrev, A S; Bomben, M; Bona, M; Boonekamp, M; Borisov, A; Borissov, G; Bortfeldt, J; Bortoletto, D; Bortolotto, V; Bos, K; Boscherini, D; Bosman, M; Bossio Sola, J D; Boudreau, J; Bouffard, J; Bouhova-Thacker, E V; Boumediene, D; Bourdarios, C; Boutle, S K; Boveia, A; Boyd, J; Boyko, I R; Bracinik, J; Brandt, A; Brandt, G; Brandt, O; Bratzler, U; Brau, B; Brau, J E; Braun, H M; Breaden Madden, 
W D; Brendlinger, K; Brennan, A J; Brenner, L; Brenner, R; Bressler, S; Bristow, T M; Britton, D; Britzger, D; Brochu, F M; Brock, I; Brock, R; Brooijmans, G; Brooks, T; Brooks, W K; Brosamer, J; Brost, E; Broughton, J H; de Renstrom, P A Bruckman; Bruncko, D; Bruneliere, R; Bruni, A; Bruni, G; Bruni, L S; Brunt, B H; Bruschi, M; Bruscino, N; Bryant, P; Bryngemark, L; Buanes, T; Buat, Q; Buchholz, P; Buckley, A G; Budagov, I A; Buehrer, F; Bugge, M K; Bulekov, O; Bullock, D; Burckhart, H; Burdin, S; Burgard, C D; Burghgrave, B; Burka, K; Burke, S; Burmeister, I; Burr, J T P; Busato, E; Büscher, D; Büscher, V; Bussey, P; Butler, J M; Buttar, C M; Butterworth, J M; Butti, P; Buttinger, W; Buzatu, A; Buzykaev, A R; Cabrera Urbán, S; Caforio, D; Cairo, V M; Cakir, O; Calace, N; Calafiura, P; Calandri, A; Calderini, G; Calfayan, P; Caloba, L P; Lopez, S Calvente; Calvet, D; Calvet, S; Calvet, T P; Toro, R Camacho; Camarda, S; Camarri, P; Cameron, D; Caminal Armadans, R; Camincher, C; Campana, S; Campanelli, M; Camplani, A; Campoverde, A; Canale, V; Canepa, A; Cano Bret, M; Cantero, J; Cantrill, R; Cao, T; Capeans Garrido, M D M; Caprini, I; Caprini, M; Capua, M; Caputo, R; Carbone, R M; Cardarelli, R; Cardillo, F; Carli, I; Carli, T; Carlino, G; Carminati, L; Caron, S; Carquin, E; Carrillo-Montoya, G D; Carter, J R; Carvalho, J; Casadei, D; Casado, M P; Casolino, M; Casper, D W; Castaneda-Miranda, E; Castelijn, R; Castelli, A; Gimenez, V Castillo; Castro, N F; Catinaccio, A; Catmore, J R; Cattai, A; Caudron, J; Cavaliere, V; Cavallaro, E; Cavalli, D; Cavalli-Sforza, M; Cavasinni, V; Ceradini, F; Cerda Alberich, L; Cerio, B C; Cerqueira, A S; Cerri, A; Cerrito, L; Cerutti, F; Cerv, M; Cervelli, A; Cetin, S A; Chafaq, A; Chakraborty, D; Chan, S K; Chan, Y L; Chang, P; Chapman, J D; Charlton, D G; Chatterjee, A; Chau, C C; Chavez Barajas, C A; Che, S; Cheatham, S; Chegwidden, A; Chekanov, S; Chekulaev, S V; Chelkov, G A; Chelstowska, M A; Chen, C; Chen, H; Chen, K; Chen, 
S; Chen, S; Chen, X; Chen, Y; Cheng, H C; Cheng, H J; Cheng, Y; Cheplakov, A; Cheremushkina, E; Moursli, R Cherkaoui El; Chernyatin, V; Cheu, E; Chevalier, L; Chiarella, V; Chiarelli, G; Chiodini, G; Chisholm, A S; Chitan, A; Chizhov, M V; Choi, K; Chomont, A R; Chouridou, S; Chow, B K B; Christodoulou, V; Chromek-Burckhart, D; Chudoba, J; Chuinard, A J; Chwastowski, J J; Chytka, L; Ciapetti, G; Ciftci, A K; Cinca, D; Cindro, V; Cioara, I A; Ciocca, C; Ciocio, A; Cirotto, F; Citron, Z H; Citterio, M; Ciubancan, M; Clark, A; Clark, B L; Clark, M R; Clark, P J; Clarke, R N; Clement, C; Coadou, Y; Cobal, M; Coccaro, A; Cochran, J; Coffey, L; Colasurdo, L; Cole, B; Colijn, A P; Collot, J; Colombo, T; Compostella, G; Conde Muiño, P; Coniavitis, E; Connell, S H; Connelly, I A; Consorti, V; Constantinescu, S; Conti, G; Conventi, F; Cooke, M; Cooper, B D; Cooper-Sarkar, A M; Cormier, K J R; Cornelissen, T; Corradi, M; Corriveau, F; Corso-Radu, A; Cortes-Gonzalez, A; Cortiana, G; Costa, G; Costa, M J; Costanzo, D; Cottin, G; Cowan, G; Cox, B E; Cranmer, K; Crawley, S J; Cree, G; Crépé-Renaudin, S; Crescioli, F; Cribbs, W A; Crispin Ortuzar, M; Cristinziani, M; Croft, V; Crosetti, G; Cuhadar Donszelmann, T; Cummings, J; Curatolo, M; Cúth, J; Cuthbert, C; Czirr, H; Czodrowski, P; D'amen, G; D'Auria, S; D'Onofrio, M; De Sousa, M J Da Cunha Sargedas; Da Via, C; Dabrowski, W; Dado, T; Dai, T; Dale, O; Dallaire, F; Dallapiccola, C; Dam, M; Dandoy, J R; Dang, N P; Daniells, A C; Dann, N S; Danninger, M; Dano Hoffmann, M; Dao, V; Darbo, G; Darmora, S; Dassoulas, J; Dattagupta, A; Davey, W; David, C; Davidek, T; Davies, M; Davison, P; Dawe, E; Dawson, I; Daya-Ishmukhametova, R K; De, K; de Asmundis, R; De Benedetti, A; De Castro, S; De Cecco, S; De Groot, N; de Jong, P; De la Torre, H; De Lorenzi, F; De Maria, A; De Pedis, D; De Salvo, A; De Sanctis, U; De Santo, A; De Regie, J B De Vivie; Dearnaley, W J; Debbe, R; Debenedetti, C; Dedovich, D V; Dehghanian, N; Deigaard, I; Del 
Gaudio, M; Del Peso, J; Del Prete, T; Delgove, D; Deliot, F; Delitzsch, C M; Deliyergiyev, M; Dell'Acqua, A; Dell'Asta, L; Dell'Orso, M; Della Pietra, M; Della Volpe, D; Delmastro, M; Delsart, P A; DeMarco, D A; Demers, S; Demichev, M; Demilly, A; Denisov, S P; Denysiuk, D; Derendarz, D; Derkaoui, J E; Derue, F; Dervan, P; Desch, K; Deterre, C; Dette, K; Deviveiros, P O; Dewhurst, A; Dhaliwal, S; Di Ciaccio, A; Di Ciaccio, L; Di Clemente, W K; Di Donato, C; Di Girolamo, A; Di Girolamo, B; Di Micco, B; Di Nardo, R; Di Simone, A; Di Sipio, R; Di Valentino, D; Diaconu, C; Diamond, M; Dias, F A; Diaz, M A; Diehl, E B; Dietrich, J; Diglio, S; Dimitrievska, A; Dingfelder, J; Dita, P; Dita, S; Dittus, F; Djama, F; Djobava, T; Djuvsland, J I; do Vale, M A B; Dobos, D; Dobre, M; Doglioni, C; Dohmae, T; Dolejsi, J; Dolezal, Z; Dolgoshein, B A; Donadelli, M; Donati, S; Dondero, P; Donini, J; Dopke, J; Doria, A; Dova, M T; Doyle, A T; Drechsler, E; Dris, M; Du, Y; Duarte-Campderros, J; Duchovni, E; Duckeck, G; Ducu, O A; Duda, D; Dudarev, A; Duffield, E M; Duflot, L; Duguid, L; Dührssen, M; Dumancic, M; Dunford, M; Duran Yildiz, H; Düren, M; Durglishvili, A; Duschinger, D; Dutta, B; Dyndal, M; Eckardt, C; Ecker, K M; Edgar, R C; Edwards, N C; Eifert, T; Eigen, G; Einsweiler, K; Ekelof, T; El Kacimi, M; Ellajosyula, V; Ellert, M; Elles, S; Ellinghaus, F; Elliot, A A; Ellis, N; Elmsheuser, J; Elsing, M; Emeliyanov, D; Enari, Y; Endner, O C; Endo, M; Ennis, J S; Erdmann, J; Ereditato, A; Ernis, G; Ernst, J; Ernst, M; Errede, S; Ertel, E; Escalier, M; Esch, H; Escobar, C; Esposito, B; Etienvre, A I; Etzion, E; Evans, H; Ezhilov, A; Fabbri, F; Fabbri, L; Facini, G; Fakhrutdinov, R M; Falciano, S; Falla, R J; Faltova, J; Fang, Y; Fanti, M; Farbin, A; Farilla, A; Farina, C; Farina, E M; Farooque, T; Farrell, S; Farrington, S M; Farthouat, P; Fassi, F; Fassnacht, P; Fassouliotis, D; Faucci Giannelli, M; Favareto, A; Fawcett, W J; Fayard, L; Fedin, O L; Fedorko, W; Feigl, S; Feligioni, 
L; Feng, C; Feng, E J; Feng, H; Fenyuk, A B; Feremenga, L; Fernandez Martinez, P; Fernandez Perez, S; Ferrando, J; Ferrari, A; Ferrari, P; Ferrari, R; de Lima, D E Ferreira; Ferrer, A; Ferrere, D; Ferretti, C; Ferretto Parodi, A; Fiedler, F; Filipčič, A; Filipuzzi, M; Filthaut, F; Fincke-Keeler, M; Finelli, K D; Fiolhais, M C N; Fiorini, L; Firan, A; Fischer, A; Fischer, C; Fischer, J; Fisher, W C; Flaschel, N; Fleck, I; Fleischmann, P; Fletcher, G T; Fletcher, R R M; Flick, T; Floderus, A; Flores Castillo, L R; Flowerdew, M J; Forcolin, G T; Formica, A; Forti, A; Foster, A G; Fournier, D; Fox, H; Fracchia, S; Francavilla, P; Franchini, M; Francis, D; Franconi, L; Franklin, M; Frate, M; Fraternali, M; Freeborn, D; Fressard-Batraneanu, S M; Friedrich, F; Froidevaux, D; Frost, J A; Fukunaga, C; Fullana Torregrosa, E; Fusayasu, T; Fuster, J; Gabaldon, C; Gabizon, O; Gabrielli, A; Gabrielli, A; Gach, G P; Gadatsch, S; Gadomski, S; Gagliardi, G; Gagnon, L G; Gagnon, P; Galea, C; Galhardo, B; Gallas, E J; Gallop, B J; Gallus, P; Galster, G; Gan, K K; Gao, J; Gao, Y; Gao, Y S; Garay Walls, F M; García, C; García Navarro, J E; Garcia-Sciveres, M; Gardner, R W; Garelli, N; Garonne, V; Gascon Bravo, A; Gatti, C; Gaudiello, A; Gaudio, G; Gaur, B; Gauthier, L; Gavrilenko, I L; Gay, C; Gaycken, G; Gazis, E N; Gecse, Z; Gee, C N P; Geich-Gimbel, Ch; Geisen, M; Geisler, M P; Gemme, C; Genest, M H; Geng, C; Gentile, S; George, S; Gerbaudo, D; Gershon, A; Ghasemi, S; Ghazlane, H; Ghneimat, M; Giacobbe, B; Giagu, S; Giannetti, P; Gibbard, B; Gibson, S M; Gignac, M; Gilchriese, M; Gillam, T P S; Gillberg, D; Gilles, G; Gingrich, D M; Giokaris, N; Giordani, M P; Giorgi, F M; Giorgi, F M; Giraud, P F; Giromini, P; Giugni, D; Giuli, F; Giuliani, C; Giulini, M; Gjelsten, B K; Gkaitatzis, S; Gkialas, I; Gkougkousis, E L; Gladilin, L K; Glasman, C; Glatzer, J; Glaysher, P C F; Glazov, A; Goblirsch-Kolb, M; Godlewski, J; Goldfarb, S; Golling, T; Golubkov, D; Gomes, A; Gonçalo, R; Costa, J 
Goncalves Pinto Firmino Da; Gonella, G; Gonella, L; Gongadze, A; de la Hoz, S González; Gonzalez Parra, G; Gonzalez-Sevilla, S; Goossens, L; Gorbounov, P A; Gordon, H A; Gorelov, I; Gorini, B; Gorini, E; Gorišek, A; Gornicki, E; Goshaw, A T; Gössling, C; Gostkin, M I; Goudet, C R; Goujdami, D; Goussiou, A G; Govender, N; Gozani, E; Graber, L; Grabowska-Bold, I; Gradin, P O J; Grafström, P; Gramling, J; Gramstad, E; Grancagnolo, S; Gratchev, V; Gravila, P M; Gray, H M; Graziani, E; Greenwood, Z D; Grefe, C; Gregersen, K; Gregor, I M; Grenier, P; Grevtsov, K; Griffiths, J; Grillo, A A; Grimm, K; Grinstein, S; Gris, Ph; Grivaz, J-F; Groh, S; Grohs, J P; Gross, E; Grosse-Knetter, J; Grossi, G C; Grout, Z J; Guan, L; Guan, W; Guenther, J; Guescini, F; Guest, D; Gueta, O; Guido, E; Guillemin, T; Guindon, S; Gul, U; Gumpert, C; Guo, J; Guo, Y; Gupta, S; Gustavino, G; Gutierrez, P; Gutierrez Ortiz, N G; Gutschow, C; Guyot, C; Gwenlan, C; Gwilliam, C B; Haas, A; Haber, C; Hadavand, H K; Haddad, N; Hadef, A; Haefner, P; Hageböck, S; Hajduk, Z; Hakobyan, H; Haleem, M; Haley, J; Halladjian, G; Hallewell, G D; Hamacher, K; Hamal, P; Hamano, K; Hamilton, A; Hamity, G N; Hamnett, P G; Han, L; Hanagaki, K; Hanawa, K; Hance, M; Haney, B; Hanke, P; Hanna, R; Hansen, J B; Hansen, J D; Hansen, M C; Hansen, P H; Hara, K; Hard, A S; Harenberg, T; Hariri, F; Harkusha, S; Harrington, R D; Harrison, P F; Hartjes, F; Hartmann, N M; Hasegawa, M; Hasegawa, Y; Hasib, A; Hassani, S; Haug, S; Hauser, R; Hauswald, L; Havranek, M; Hawkes, C M; Hawkings, R J; Hayden, D; Hays, C P; Hays, J M; Hayward, H S; Haywood, S J; Head, S J; Heck, T; Hedberg, V; Heelan, L; Heim, S; Heim, T; Heinemann, B; Heinrich, J J; Heinrich, L; Heinz, C; Hejbal, J; Helary, L; Hellman, S; Helsens, C; Henderson, J; Henderson, R C W; Heng, Y; Henkelmann, S; Henriques Correia, A M; Henrot-Versille, S; Herbert, G H; Hernández Jiménez, Y; Herten, G; Hertenberger, R; Hervas, L; Hesketh, G G; Hessey, N P; Hetherly, J W; Hickling, 
R; Higón-Rodriguez, E; Hill, E; Hill, J C; Hiller, K H; Hillier, S J; Hinchliffe, I; Hines, E; Hinman, R R; Hirose, M; Hirschbuehl, D; Hobbs, J; Hod, N; Hodgkinson, M C; Hodgson, P; Hoecker, A; Hoeferkamp, M R; Hoenig, F; Hohn, D; Holmes, T R; Homann, M; Hong, T M; Hooberman, B H; Hopkins, W H; Horii, Y; Horton, A J; Hostachy, J-Y; Hou, S; Hoummada, A; Howarth, J; Hrabovsky, M; Hristova, I; Hrivnac, J; Hryn'ova, T; Hrynevich, A; Hsu, C; Hsu, P J; Hsu, S-C; Hu, D; Hu, Q; Huang, Y; Hubacek, Z; Hubaut, F; Huegging, F; Huffman, T B; Hughes, E W; Hughes, G; Huhtinen, M; Huo, P; Huseynov, N; Huston, J; Huth, J; Iacobucci, G; Iakovidis, G; Ibragimov, I; Iconomidou-Fayard, L; Ideal, E; Idrissi, Z; Iengo, P; Igonkina, O; Iizawa, T; Ikegami, Y; Ikeno, M; Ilchenko, Y; Iliadis, D; Ilic, N; Ince, T; Introzzi, G; Ioannou, P; Iodice, M; Iordanidou, K; Ippolito, V; Ishijima, N; Ishino, M; Ishitsuka, M; Ishmukhametov, R; Issever, C; Istin, S; Ito, F; Iturbe Ponce, J M; Iuppa, R; Iwanski, W; Iwasaki, H; Izen, J M; Izzo, V; Jabbar, S; Jackson, B; Jackson, M; Jackson, P; Jain, V; Jakobi, K B; Jakobs, K; Jakobsen, S; Jakoubek, T; Jamin, D O; Jana, D K; Jansen, E; Jansky, R; Janssen, J; Janus, M; Jarlskog, G; Javadov, N; Javůrek, T; Jeanneau, F; Jeanty, L; Jeng, G-Y; Jennens, D; Jenni, P; Jentzsch, J; Jeske, C; Jézéquel, S; Ji, H; Jia, J; Jiang, H; Jiang, Y; Jiggins, S; Jimenez Pena, J; Jin, S; Jinaru, A; Jinnouchi, O; Johansson, P; Johns, K A; Johnson, W J; Jon-And, K; Jones, G; Jones, R W L; Jones, S; Jones, T J; Jongmanns, J; Jorge, P M; Jovicevic, J; Ju, X; Juste Rozas, A; Köhler, M K; Kaczmarska, A; Kado, M; Kagan, H; Kagan, M; Kahn, S J; Kajomovitz, E; Kalderon, C W; Kaluza, A; Kama, S; Kamenshchikov, A; Kanaya, N; Kaneti, S; Kanjir, L; Kantserov, V A; Kanzaki, J; Kaplan, B; Kaplan, L S; Kapliy, A; Kar, D; Karakostas, K; Karamaoun, A; Karastathis, N; Kareem, M J; Karentzos, E; Karnevskiy, M; Karpov, S N; Karpova, Z M; Karthik, K; Kartvelishvili, V; Karyukhin, A N; Kasahara, K; 
Kashif, L; Kass, R D; Kastanas, A; Kataoka, Y; Kato, C; Katre, A; Katzy, J; Kawade, K; Kawagoe, K; Kawamoto, T; Kawamura, G; Kazama, S; Kazanin, V F; Keeler, R; Kehoe, R; Keller, J S; Kempster, J J; Keoshkerian, H; Kepka, O; Kerševan, B P; Kersten, S; Keyes, R A; Khader, M; Khalil-Zada, F; Khanov, A; Kharlamov, A G; Khoo, T J; Khovanskiy, V; Khramov, E; Khubua, J; Kido, S; Kim, H Y; Kim, S H; Kim, Y K; Kimura, N; Kind, O M; King, B T; King, M; King, S B; Kirk, J; Kiryunin, A E; Kishimoto, T; Kisielewska, D; Kiss, F; Kiuchi, K; Kivernyk, O; Kladiva, E; Klein, M H; Klein, M; Klein, U; Kleinknecht, K; Klimek, P; Klimentov, A; Klingenberg, R; Klinger, J A; Klioutchnikova, T; Kluge, E-E; Kluit, P; Kluth, S; Knapik, J; Kneringer, E; Knoops, E B F G; Knue, A; Kobayashi, A; Kobayashi, D; Kobayashi, T; Kobel, M; Kocian, M; Kodys, P; Koffas, T; Koffeman, E; Koi, T; Kolanoski, H; Kolb, M; Koletsou, I; Komar, A A; Komori, Y; Kondo, T; Kondrashova, N; Köneke, K; König, A C; Kono, T; Konoplich, R; Konstantinidis, N; Kopeliansky, R; Koperny, S; Köpke, L; Kopp, A K; Korcyl, K; Kordas, K; Korn, A; Korol, A A; Korolkov, I; Korolkova, E V; Kortner, O; Kortner, S; Kosek, T; Kostyukhin, V V; Kotwal, A; Kourkoumeli-Charalampidi, A; Kourkoumelis, C; Kouskoura, V; Kowalewska, A B; Kowalewski, R; Kowalski, T Z; Kozakai, C; Kozanecki, W; Kozhin, A S; Kramarenko, V A; Kramberger, G; Krasnopevtsev, D; Krasny, M W; Krasznahorkay, A; Kraus, J K; Kravchenko, A; Kretz, M; Kretzschmar, J; Kreutzfeldt, K; Krieger, P; Krizka, K; Kroeninger, K; Kroha, H; Kroll, J; Kroseberg, J; Krstic, J; Kruchonak, U; Krüger, H; Krumnack, N; Kruse, A; Kruse, M C; Kruskal, M; Kubota, T; Kucuk, H; Kuday, S; Kuechler, J T; Kuehn, S; Kugel, A; Kuger, F; Kuhl, A; Kuhl, T; Kukhtin, V; Kukla, R; Kulchitsky, Y; Kuleshov, S; Kuna, M; Kunigo, T; Kupco, A; Kurashige, H; Kurochkin, Y A; Kus, V; Kuwertz, E S; Kuze, M; Kvita, J; Kwan, T; Kyriazopoulos, D; La Rosa, A; La Rosa Navarro, J L; La Rotonda, L; Lacasta, C; Lacava, F; 
Lacey, J; Lacker, H; Lacour, D; Lacuesta, V R; Ladygin, E; Lafaye, R; Laforge, B; Lagouri, T; Lai, S; Lammers, S; Lampl, W; Lançon, E; Landgraf, U; Landon, M P J; Lang, V S; Lange, J C; Lankford, A J; Lanni, F; Lantzsch, K; Lanza, A; Laplace, S; Lapoire, C; Laporte, J F; Lari, T; Lasagni Manghi, F; Lassnig, M; Laurelli, P; Lavrijsen, W; Law, A T; Laycock, P; Lazovich, T; Lazzaroni, M; Le, B; Le Dortz, O; Le Guirriec, E; Quilleuc, E P Le; LeBlanc, M; LeCompte, T; Ledroit-Guillon, F; Lee, C A; Lee, S C; Lee, L; Lefebvre, G; Lefebvre, M; Legger, F; Leggett, C; Lehan, A; Lehmann Miotto, G; Lei, X; Leight, W A; Leisos, A; Leister, A G; Leite, M A L; Leitner, R; Lellouch, D; Lemmer, B; Leney, K J C; Lenz, T; Lenzi, B; Leone, R; Leone, S; Leonidopoulos, C; Leontsinis, S; Lerner, G; Leroy, C; Lesage, A A J; Lester, C G; Levchenko, M; Levêque, J; Levin, D; Levinson, L J; Levy, M; Lewis, D; Leyko, A M; Leyton, M; Li, B; Li, H; Li, H L; Li, L; Li, L; Li, Q; Li, S; Li, X; Li, Y; Liang, Z; Liberti, B; Liblong, A; Lichard, P; Lie, K; Liebal, J; Liebig, W; Limosani, A; Lin, S C; Lin, T H; Lindquist, B E; Lionti, A E; Lipeles, E; Lipniacka, A; Lisovyi, M; Liss, T M; Lister, A; Litke, A M; Liu, B; Liu, D; Liu, H; Liu, H; Liu, J; Liu, J B; Liu, K; Liu, L; Liu, M; Liu, M; Liu, Y L; Liu, Y; Livan, M; Lleres, A; Llorente Merino, J; Lloyd, S L; Lo Sterzo, F; Lobodzinska, E M; Loch, P; Lockman, W S; Loebinger, F K; Loevschall-Jensen, A E; Loew, K M; Loginov, A; Lohse, T; Lohwasser, K; Lokajicek, M; Long, B A; Long, J D; Long, R E; Longo, L; Looper, K A; Lopes, L; Lopez Mateos, D; Lopez Paredes, B; Lopez Paz, I; Lopez Solis, A; Lorenz, J; Lorenzo Martinez, N; Losada, M; Lösel, P J; Lou, X; Lounis, A; Love, J; Love, P A; Lu, H; Lu, N; Lubatti, H J; Luci, C; Lucotte, A; Luedtke, C; Luehring, F; Lukas, W; Luminari, L; Lundberg, O; Lund-Jensen, B; Luzi, P M; Lynn, D; Lysak, R; Lytken, E; Lyubushkin, V; Ma, H; Ma, L L; Ma, Y; Maccarrone, G; Macchiolo, A; Macdonald, C M; Maček, B; Machado 
Miguens, J; Madaffari, D; Madar, R; Maddocks, H J; Mader, W F; Madsen, A; Maeda, J; Maeland, S; Maeno, T; Maevskiy, A; Magradze, E; Mahlstedt, J; Maiani, C; Maidantchik, C; Maier, A A; Maier, T; Maio, A; Majewski, S; Makida, Y; Makovec, N; Malaescu, B; Malecki, Pa; Maleev, V P; Malek, F; Mallik, U; Malon, D; Malone, C; Maltezos, S; Malyukov, S; Mamuzic, J; Mancini, G; Mandelli, B; Mandelli, L; Mandić, I; Maneira, J; Filho, L Manhaes de Andrade; Manjarres Ramos, J; Mann, A; Manousos, A; Mansoulie, B; Mansour, J D; Mantifel, R; Mantoani, M; Manzoni, S; Mapelli, L; Marceca, G; March, L; Marchiori, G; Marcisovsky, M; Marjanovic, M; Marley, D E; Marroquim, F; Marsden, S P; Marshall, Z; Marti-Garcia, S; Martin, B; Martin, T A; Martin, V J; Latour, B Martin Dit; Martinez, M; Martinez Outschoorn, V I; Martin-Haugh, S; Martoiu, V S; Martyniuk, A C; Marx, M; Marzin, A; Masetti, L; Mashimo, T; Mashinistov, R; Masik, J; Maslennikov, A L; Massa, I; Massa, L; Mastrandrea, P; Mastroberardino, A; Masubuchi, T; Mättig, P; Mattmann, J; Maurer, J; Maxfield, S J; Maximov, D A; Mazini, R; Mazza, S M; Mc Fadden, N C; Goldrick, G Mc; Mc Kee, S P; McCarn, A; McCarthy, R L; McCarthy, T G; McClymont, L I; McDonald, E F; McFarlane, K W; Mcfayden, J A; Mchedlidze, G; McMahon, S J; McPherson, R A; Medinnis, M; Meehan, S; Mehlhase, S; Mehta, A; Meier, K; Meineck, C; Meirose, B; Melini, D; Mellado Garcia, B R; Melo, M; Meloni, F; Mengarelli, A; Menke, S; Meoni, E; Mergelmeyer, S; Mermod, P; Merola, L; Meroni, C; Merritt, F S; Messina, A; Metcalfe, J; Mete, A S; Meyer, C; Meyer, C; Meyer, J-P; Meyer, J; Meyer Zu Theenhausen, H; Miano, F; Middleton, R P; Miglioranzi, S; Mijović, L; Mikenberg, G; Mikestikova, M; Mikuž, M; Milesi, M; Milic, A; Miller, D W; Mills, C; Milov, A; Milstead, D A; Minaenko, A A; Minami, Y; Minashvili, I A; Mincer, A I; Mindur, B; Mineev, M; Ming, Y; Mir, L M; Mistry, K P; Mitani, T; Mitrevski, J; Mitsou, V A; Miucci, A; Miyagawa, P S; Mjörnmark, J U; Moa, T; Mochizuki, K; 
Mohapatra, S; Molander, S; Moles-Valls, R; Monden, R; Mondragon, M C; Mönig, K; Monk, J; Monnier, E; Montalbano, A; Montejo Berlingen, J; Monticelli, F; Monzani, S; Moore, R W; Morange, N; Moreno, D; Moreno Llácer, M; Morettini, P; Morgenstern, S; Mori, D; Mori, T; Morii, M; Morinaga, M; Morisbak, V; Moritz, S; Morley, A K; Mornacchi, G; Morris, J D; Mortensen, S S; Morvaj, L; Mosidze, M; Moss, J; Motohashi, K; Mount, R; Mountricha, E; Mouraviev, S V; Moyse, E J W; Muanza, S; Mudd, R D; Mueller, F; Mueller, J; Mueller, R S P; Mueller, T; Muenstermann, D; Mullen, P; Mullier, G A; Munoz Sanchez, F J; Murillo Quijada, J A; Murray, W J; Musheghyan, H; Muškinja, M; Myagkov, A G; Myska, M; Nachman, B P; Nackenhorst, O; Nagai, K; Nagai, R; Nagano, K; Nagasaka, Y; Nagata, K; Nagel, M; Nagy, E; Nairz, A M; Nakahama, Y; Nakamura, K; Nakamura, T; Nakano, I; Namasivayam, H; Naranjo Garcia, R F; Narayan, R; Narrias Villar, D I; Naryshkin, I; Naumann, T; Navarro, G; Nayyar, R; Neal, H A; Nechaeva, P Yu; Neep, T J; Nef, P D; Negri, A; Negrini, M; Nektarijevic, S; Nellist, C; Nelson, A; Nemecek, S; Nemethy, P; Nepomuceno, A A; Nessi, M; Neubauer, M S; Neumann, M; Neves, R M; Nevski, P; Newman, P R; Nguyen, D H; Manh, T Nguyen; Nickerson, R B; Nicolaidou, R; Nielsen, J; Nikiforov, A; Nikolaenko, V; Nikolic-Audit, I; Nikolopoulos, K; Nilsen, J K; Nilsson, P; Ninomiya, Y; Nisati, A; Nisius, R; Nobe, T; Nodulman, L; Nomachi, M; Nomidis, I; Nooney, T; Norberg, S; Nordberg, M; Norjoharuddeen, N; Novgorodova, O; Nowak, S; Nozaki, M; Nozka, L; Ntekas, K; Nurse, E; Nuti, F; O'grady, F; O'Neil, D C; O'Rourke, A A; O'Shea, V; Oakham, F G; Oberlack, H; Obermann, T; Ocariz, J; Ochi, A; Ochoa, I; Ochoa-Ricoux, J P; Oda, S; Odaka, S; Ogren, H; Oh, A; Oh, S H; Ohm, C C; Ohman, H; Oide, H; Okawa, H; Okumura, Y; Okuyama, T; Olariu, A; Oleiro Seabra, L F; Olivares Pino, S A; Oliveira Damazio, D; Olszewski, A; Olszowska, J; Onofre, A; Onogi, K; Onyisi, P U E; Oreglia, M J; Oren, Y; Orestano, D; 
Orlando, N; Orr, R S; Osculati, B; Ospanov, R; Garzon, G Otero Y; Otono, H; Ouchrif, M; Ould-Saada, F; Ouraou, A; Oussoren, K P; Ouyang, Q; Owen, M; Owen, R E; Ozcan, V E; Ozturk, N; Pachal, K; Pacheco Pages, A; Pacheco Rodriguez, L; Padilla Aranda, C; Pagáčová, M; Pagan Griso, S; Paige, F; Pais, P; Pajchel, K; Palacino, G; Palazzo, S; Palestini, S; Palka, M; Pallin, D; Palma, A; St Panagiotopoulou, E; Pandini, C E; Panduro Vazquez, J G; Pani, P; Panitkin, S; Pantea, D; Paolozzi, L; Papadopoulou, Th D; Papageorgiou, K; Paramonov, A; Paredes Hernandez, D; Parker, A J; Parker, M A; Parker, K A; Parodi, F; Parsons, J A; Parzefall, U; Pascuzzi, V R; Pasqualucci, E; Passaggio, S; Pastore, Fr; Pásztor, G; Pataraia, S; Pater, J R; Pauly, T; Pearce, J; Pearson, B; Pedersen, L E; Pedersen, M; Lopez, S Pedraza; Pedro, R; Peleganchuk, S V; Pelikan, D; Penc, O; Peng, C; Peng, H; Penwell, J; Peralva, B S; Perego, M M; Perepelitsa, D V; Perez Codina, E; Perini, L; Pernegger, H; Perrella, S; Peschke, R; Peshekhonov, V D; Peters, K; Peters, R F Y; Petersen, B A; Petersen, T C; Petit, E; Petridis, A; Petridou, C; Petroff, P; Petrolo, E; Petrov, M; Petrucci, F; Pettersson, N E; Peyaud, A; Pezoa, R; Phillips, P W; Piacquadio, G; Pianori, E; Picazio, A; Piccaro, E; Piccinini, M; Pickering, M A; Piegaia, R; Pilcher, J E; Pilkington, A D; Pin, A W J; Pinamonti, M; Pinfold, J L; Pingel, A; Pires, S; Pirumov, H; Pitt, M; Plazak, L; Pleier, M-A; Pleskot, V; Plotnikova, E; Plucinski, P; Pluth, D; Poettgen, R; Poggioli, L; Pohl, D; Polesello, G; Poley, A; Policicchio, A; Polifka, R; Polini, A; Pollard, C S; Polychronakos, V; Pommès, K; Pontecorvo, L; Pope, B G; Popeneciu, G A; Popovic, D S; Poppleton, A; Pospisil, S; Potamianos, K; Potrap, I N; Potter, C J; Potter, C T; Poulard, G; Poveda, J; Pozdnyakov, V; Pozo Astigarraga, M E; Pralavorio, P; Pranko, A; Prell, S; Price, D; Price, L E; Primavera, M; Prince, S; Proissl, M; Prokofiev, K; Prokoshin, F; Protopopescu, S; Proudfoot, J; 
Przybycien, M; Puddu, D; Purohit, M; Puzo, P; Qian, J; Qin, G; Qin, Y; Quadt, A; Quayle, W B; Queitsch-Maitland, M; Quilty, D; Raddum, S; Radeka, V; Radescu, V; Radhakrishnan, S K; Radloff, P; Rados, P; Ragusa, F; Rahal, G; Raine, J A; Rajagopalan, S; Rammensee, M; Rangel-Smith, C; Ratti, M G; Rauscher, F; Rave, S; Ravenscroft, T; Ravinovich, I; Raymond, M; Read, A L; Readioff, N P; Reale, M; Rebuzzi, D M; Redelbach, A; Redlinger, G; Reece, R; Reeves, K; Rehnisch, L; Reichert, J; Reisin, H; Rembser, C; Ren, H; Rescigno, M; Resconi, S; Rezanova, O L; Reznicek, P; Rezvani, R; Richter, R; Richter, S; Richter-Was, E; Ricken, O; Ridel, M; Rieck, P; Riegel, C J; Rieger, J; Rifki, O; Rijssenbeek, M; Rimoldi, A; Rimoldi, M; Rinaldi, L; Ristić, B; Ritsch, E; Riu, I; Rizatdinova, F; Rizvi, E; Rizzi, C; Robertson, S H; Robichaud-Veronneau, A; Robinson, D; Robinson, J E M; Robson, A; Roda, C; Rodina, Y; Rodriguez Perez, A; Rodriguez Rodriguez, D; Roe, S; Rogan, C S; Røhne, O; Romaniouk, A; Romano, M; Romano Saez, S M; Romero Adam, E; Rompotis, N; Ronzani, M; Roos, L; Ros, E; Rosati, S; Rosbach, K; Rose, P; Rosenthal, O; Rosien, N-A; Rossetti, V; Rossi, E; Rossi, L P; Rosten, J H N; Rosten, R; Rotaru, M; Roth, I; Rothberg, J; Rousseau, D; Royon, C R; Rozanov, A; Rozen, Y; Ruan, X; Rubbo, F; Rudolph, M S; Rühr, F; Ruiz-Martinez, A; Rurikova, Z; Rusakovich, N A; Ruschke, A; Russell, H L; Rutherfoord, J P; Ruthmann, N; Ryabov, Y F; Rybar, M; Rybkin, G; Ryu, S; Ryzhov, A; Rzehorz, G F; Saavedra, A F; Sabato, G; Sacerdoti, S; Sadrozinski, H F-W; Sadykov, R; Safai Tehrani, F; Saha, P; Sahinsoy, M; Saimpert, M; Saito, T; Sakamoto, H; Sakurai, Y; Salamanna, G; Salamon, A; Loyola, J E Salazar; Salek, D; De Bruin, P H Sales; Salihagic, D; Salnikov, A; Salt, J; Salvatore, D; Salvatore, F; Salvucci, A; Salzburger, A; Sammel, D; Sampsonidis, D; Sanchez, A; Sánchez, J; Sanchez Martinez, V; Sandaker, H; Sandbach, R L; Sander, H G; Sandhoff, M; Sandoval, C; Sandstroem, R; Sankey, D P C; 
Sannino, M; Sansoni, A; Santoni, C; Santonico, R; Santos, H; Santoyo Castillo, I; Sapp, K; Sapronov, A; Saraiva, J G; Sarrazin, B; Sasaki, O; Sasaki, Y; Sato, K; Sauvage, G; Sauvan, E; Savage, G; Savard, P; Sawyer, C; Sawyer, L; Saxon, J; Sbarra, C; Sbrizzi, A; Scanlon, T; Scannicchio, D A; Scarcella, M; Scarfone, V; Schaarschmidt, J; Schacht, P; Schachtner, B M; Schaefer, D; Schaefer, R; Schaeffer, J; Schaepe, S; Schaetzel, S; Schäfer, U; Schaffer, A C; Schaile, D; Schamberger, R D; Scharf, V; Schegelsky, V A; Scheirich, D; Schernau, M; Schiavi, C; Schier, S; Schillo, C; Schioppa, M; Schlenker, S; Schmidt-Sommerfeld, K R; Schmieden, K; Schmitt, C; Schmitt, S; Schmitz, S; Schneider, B; Schnoor, U; Schoeffel, L; Schoening, A; Schoenrock, B D; Schopf, E; Schott, M; Schovancova, J; Schramm, S; Schreyer, M; Schuh, N; Schulte, A; Schultens, M J; Schultz-Coulon, H-C; Schulz, H; Schumacher, M; Schumm, B A; Schune, Ph; Schwartzman, A; Schwarz, T A; Schwegler, Ph; Schweiger, H; Schwemling, Ph; Schwienhorst, R; Schwindling, J; Schwindt, T; Sciolla, G; Scuri, F; Scutti, F; Searcy, J; Seema, P; Seidel, S C; Seiden, A; Seifert, F; Seixas, J M; Sekhniaidze, G; Sekhon, K; Sekula, S J; Seliverstov, D M; Semprini-Cesari, N; Serfon, C; Serin, L; Serkin, L; Sessa, M; Seuster, R; Severini, H; Sfiligoj, T; Sforza, F; Sfyrla, A; Shabalina, E; Shaikh, N W; Shan, L Y; Shang, R; Shank, J T; Shapiro, M; Shatalov, P B; Shaw, K; Shaw, S M; Shcherbakova, A; Shehu, C Y; Sherwood, P; Shi, L; Shimizu, S; Shimmin, C O; Shimojima, M; Shiyakova, M; Shmeleva, A; Shoaleh Saadi, D; Shochet, M J; Shojaii, S; Shrestha, S; Shulga, E; Shupe, M A; Sicho, P; Sickles, A M; Sidebo, P E; Sidiropoulou, O; Sidorov, D; Sidoti, A; Siegert, F; Sijacki, Dj; Silva, J; Silverstein, S B; Simak, V; Simard, O; Simic, Lj; Simion, S; Simioni, E; Simmons, B; Simon, D; Simon, M; Sinervo, P; Sinev, N B; Sioli, M; Siragusa, G; Sivoklokov, S Yu; Sjölin, J; Skinner, M B; Skottowe, H P; Skubic, P; Slater, M; Slavicek, T; 
Slawinska, M; Sliwa, K; Slovak, R; Smakhtin, V; Smart, B H; Smestad, L; Smiesko, J; Smirnov, S Yu; Smirnov, Y; Smirnova, L N; Smirnova, O; Smith, M N K; Smith, R W; Smizanska, M; Smolek, K; Snesarev, A A; Snyder, S; Sobie, R; Socher, F; Soffer, A; Soh, D A; Sokhrannyi, G; Sanchez, C A Solans; Solar, M; Soldatov, E Yu; Soldevila, U; Solodkov, A A; Soloshenko, A; Solovyanov, O V; Solovyev, V; Sommer, P; Son, H; Song, H Y; Sood, A; Sopczak, A; Sopko, V; Sorin, V; Sosa, D; Sotiropoulou, C L; Soualah, R; Soukharev, A M; South, D; Sowden, B C; Spagnolo, S; Spalla, M; Spangenberg, M; Spanò, F; Sperlich, D; Spettel, F; Spighi, R; Spigo, G; Spiller, L A; Spousta, M; Denis, R D St; Stabile, A; Stamen, R; Stamm, S; Stanecka, E; Stanek, R W; Stanescu, C; Stanescu-Bellu, M; Stanitzki, M M; Stapnes, S; Starchenko, E A; Stark, G H; Stark, J; Staroba, P; Starovoitov, P; Stärz, S; Staszewski, R; Steinberg, P; Stelzer, B; Stelzer, H J; Stelzer-Chilton, O; Stenzel, H; Stewart, G A; Stillings, J A; Stockton, M C; Stoebe, M; Stoicea, G; Stolte, P; Stonjek, S; Stradling, A R; Straessner, A; Stramaglia, M E; Strandberg, J; Strandberg, S; Strandlie, A; Strauss, M; Strizenec, P; Ströhmer, R; Strom, D M; Stroynowski, R; Strubig, A; Stucci, S A; Stugu, B; Styles, N A; Su, D; Su, J; Subramaniam, R; Suchek, S; Sugaya, Y; Suk, M; Sulin, V V; Sultansoy, S; Sumida, T; Sun, S; Sun, X; Sundermann, J E; Suruliz, K; Susinno, G; Sutton, M R; Suzuki, S; Svatos, M; Swiatlowski, M; Sykora, I; Sykora, T; Ta, D; Taccini, C; Tackmann, K; Taenzer, J; Taffard, A; Tafirout, R; Taiblum, N; Takai, H; Takashima, R; Takeshita, T; Takubo, Y; Talby, M; Talyshev, A A; Tan, K G; Tanaka, J; Tanaka, R; Tanaka, S; Tannenwald, B B; Araya, S Tapia; Tapprogge, S; Tarem, S; Tartarelli, G F; Tas, P; Tasevsky, M; Tashiro, T; Tassi, E; Tavares Delgado, A; Tayalati, Y; Taylor, A C; Taylor, G N; Taylor, P T E; Taylor, W; Teischinger, F A; Teixeira-Dias, P; Temming, K K; Temple, D; Ten Kate, H; Teng, P K; Teoh, J J; Tepel, F; 
Terada, S; Terashi, K; Terron, J; Terzo, S; Testa, M; Teuscher, R J; Theveneaux-Pelzer, T; Thomas, J P; Thomas-Wilsker, J; Thompson, E N; Thompson, P D; Thompson, A S; Thomsen, L A; Thomson, E; Thomson, M; Tibbetts, M J; Ticse Torres, R E; Tikhomirov, V O; Tikhonov, Yu A; Timoshenko, S; Tipton, P; Tisserant, S; Todome, K; Todorov, T; Todorova-Nova, S; Tojo, J; Tokár, S; Tokushuku, K; Tolley, E; Tomlinson, L; Tomoto, M; Tompkins, L; Toms, K; Tong, B; Torrence, E; Torres, H; Torró Pastor, E; Toth, J; Touchard, F; Tovey, D R; Trefzger, T; Tricoli, A; Trigger, I M; Trincaz-Duvoid, S; Tripiana, M F; Trischuk, W; Trocmé, B; Trofymov, A; Troncon, C; Trottier-McDonald, M; Trovatelli, M; Truong, L; Trzebinski, M; Trzupek, A; Tseng, J C-L; Tsiareshka, P V; Tsipolitis, G; Tsirintanis, N; Tsiskaridze, S; Tsiskaridze, V; Tskhadadze, E G; Tsui, K M; Tsukerman, I I; Tsulaia, V; Tsuno, S; Tsybychev, D; Tudorache, A; Tudorache, V; Tuna, A N; Tupputi, S A; Turchikhin, S; Turecek, D; Turgeman, D; Turra, R; Turvey, A J; Tuts, P M; Tyndel, M; Ucchielli, G; Ueda, I; Ughetto, M; Ukegawa, F; Unal, G; Undrus, A; Unel, G; Ungaro, F C; Unno, Y; Unverdorben, C; Urban, J; Urquijo, P; Urrejola, P; Usai, G; Usanova, A; Vacavant, L; Vacek, V; Vachon, B; Valderanis, C; Valdes Santurio, E; Valencic, N; Valentinetti, S; Valero, A; Valery, L; Valkar, S; Vallecorsa, S; Valls Ferrer, J A; Van Den Wollenberg, W; Van Der Deijl, P C; van der Geer, R; van der Graaf, H; van Eldik, N; van Gemmeren, P; Van Nieuwkoop, J; van Vulpen, I; van Woerden, M C; Vanadia, M; Vandelli, W; Vanguri, R; Vaniachine, A; Vankov, P; Vardanyan, G; Vari, R; Varnes, E W; Varol, T; Varouchas, D; Vartapetian, A; Varvell, K E; Vasquez, J G; Vazeille, F; Vazquez Schroeder, T; Veatch, J; Veloce, L M; Veloso, F; Veneziano, S; Ventura, A; Venturi, M; Venturi, N; Venturini, A; Vercesi, V; Verducci, M; Verkerke, W; Vermeulen, J C; Vest, A; Vetterli, M C; Viazlo, O; Vichou, I; Vickey, T; Vickey Boeriu, O E; Viehhauser, G H A; Viel, S; 
Vigani, L; Vigne, R; Villa, M; Villaplana Perez, M; Vilucchi, E; Vincter, M G; Vinogradov, V B; Vittori, C; Vivarelli, I; Vlachos, S; Vlasak, M; Vogel, M; Vokac, P; Volpi, G; Volpi, M; von der Schmitt, H; von Toerne, E; Vorobel, V; Vorobev, K; Vos, M; Voss, R; Vossebeld, J H; Vranjes, N; Vranjes Milosavljevic, M; Vrba, V; Vreeswijk, M; Vuillermet, R; Vukotic, I; Vykydal, Z; Wagner, P; Wagner, W; Wahlberg, H; Wahrmund, S; Wakabayashi, J; Walder, J; Walker, R; Walkowiak, W; Wallangen, V; Wang, C; Wang, C; Wang, F; Wang, H; Wang, H; Wang, J; Wang, J; Wang, K; Wang, R; Wang, S M; Wang, T; Wang, T; Wang, W; Wang, X; Wanotayaroj, C; Warburton, A; Ward, C P; Wardrope, D R; Washbrook, A; Watkins, P M; Watson, A T; Watson, M F; Watts, G; Watts, S; Waugh, B M; Webb, S; Weber, M S; Weber, S W; Webster, J S; Weidberg, A R; Weinert, B; Weingarten, J; Weiser, C; Weits, H; Wells, P S; Wenaus, T; Wengler, T; Wenig, S; Wermes, N; Werner, M; Werner, M D; Werner, P; Wessels, M; Wetter, J; Whalen, K; Whallon, N L; Wharton, A M; White, A; White, M J; White, R; Whiteson, D; Wickens, F J; Wiedenmann, W; Wielers, M; Wienemann, P; Wiglesworth, C; Wiik-Fuchs, L A M; Wildauer, A; Wilk, F; Wilkens, H G; Williams, H H; Williams, S; Willis, C; Willocq, S; Wilson, J A; Wingerter-Seez, I; Winklmeier, F; Winston, O J; Winter, B T; Wittgen, M; Wittkowski, J; Wolter, M W; Wolters, H; Worm, S D; Wosiek, B K; Wotschack, J; Woudstra, M J; Wozniak, K W; Wu, M; Wu, M; Wu, S L; Wu, X; Wu, Y; Wyatt, T R; Wynne, B M; Xella, S; Xu, D; Xu, L; Yabsley, B; Yacoob, S; Yakabe, R; Yamaguchi, D; Yamaguchi, Y; Yamamoto, A; Yamamoto, S; Yamanaka, T; Yamauchi, K; Yamazaki, Y; Yan, Z; Yang, H; Yang, H; Yang, Y; Yang, Z; Yao, W-M; Yap, Y C; Yasu, Y; Yatsenko, E; Wong, K H Yau; Ye, J; Ye, S; Yeletskikh, I; Yen, A L; Yildirim, E; Yorita, K; Yoshida, R; Yoshihara, K; Young, C; Young, C J S; Youssef, S; Yu, D R; Yu, J; Yu, J M; Yu, J; Yuan, L; Yuen, S P Y; Yusuff, I; Zabinski, B; Zaidan, R; Zaitsev, A M; Zakharchuk, N; 
Zalieckas, J; Zaman, A; Zambito, S; Zanello, L; Zanzi, D; Zeitnitz, C; Zeman, M; Zemla, A; Zeng, J C; Zeng, Q; Zengel, K; Zenin, O; Ženiš, T; Zerwas, D; Zhang, D; Zhang, F; Zhang, G; Zhang, H; Zhang, J; Zhang, L; Zhang, R; Zhang, R; Zhang, X; Zhang, Z; Zhao, X; Zhao, Y; Zhao, Z; Zhemchugov, A; Zhong, J; Zhou, B; Zhou, C; Zhou, L; Zhou, L; Zhou, M; Zhou, N; Zhu, C G; Zhu, H; Zhu, J; Zhu, Y; Zhuang, X; Zhukov, K; Zibell, A; Zieminska, D; Zimine, N I; Zimmermann, C; Zimmermann, S; Zinonos, Z; Zinser, M; Ziolkowski, M; Živković, L; Zobernig, G; Zoccoli, A; Zur Nedden, M; Zwalinski, L

2017-01-01

A measurement of the tt̄Z and tt̄W production cross sections in final states with either two same-charge muons, or three or four leptons (electrons or muons), is presented. The analysis uses a data sample of proton-proton collisions at √s = 13 TeV recorded with the ATLAS detector at the Large Hadron Collider in 2015, corresponding to a total integrated luminosity of 3.2 fb⁻¹. The inclusive cross sections are extracted using likelihood fits to signal and control regions, resulting in [Formula: see text] pb and [Formula: see text] pb, in agreement with the Standard Model predictions.
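The abstract says the cross sections are extracted with likelihood fits to signal and control regions. The sketch below is only a rough illustration of that idea. It fits one signal region and one control region with Poisson likelihoods, which is the textbook "on/off" setup and not the ATLAS fit model. It then converts the fitted signal yield into a cross section using σ = N / (ε · L). All yields, the transfer factor `tau`, the efficiency, and the function names are hypothetical. Only the 3.2 fb⁻¹ (3200 pb⁻¹) luminosity comes from the abstract.

```python
import math

def nll(mu, b, n_sr, n_cr, s, tau):
    """Negative log-likelihood (constant log(n!) terms dropped) for
    n_sr ~ Poisson(mu*s + b) in the signal region and
    n_cr ~ Poisson(tau*b) in the control region."""
    lam_sr = mu * s + b
    lam_cr = tau * b
    return (lam_sr - n_sr * math.log(lam_sr)) + (lam_cr - n_cr * math.log(lam_cr))

def fit_on_off(n_sr, n_cr, s, tau):
    """Closed-form maximum-likelihood estimates (valid when n_sr > n_cr / tau)."""
    b_hat = n_cr / tau              # background fixed by the control region
    mu_hat = (n_sr - b_hat) / s     # signal strength from the excess
    return mu_hat, b_hat

def cross_section_pb(signal_yield, efficiency, lumi_inv_pb):
    """sigma = N_signal / (efficiency * integrated luminosity)."""
    return signal_yield / (efficiency * lumi_inv_pb)

# Hypothetical toy numbers, not taken from the analysis.
n_sr, n_cr, s, tau = 30, 40, 10.0, 2.0
mu_hat, b_hat = fit_on_off(n_sr, n_cr, s, tau)

# Sanity check: nudging either parameter increases the NLL.
best = nll(mu_hat, b_hat, n_sr, n_cr, s, tau)
for dmu, db in [(0.05, 0.0), (-0.05, 0.0), (0.0, 0.5), (0.0, -0.5)]:
    assert nll(mu_hat + dmu, b_hat + db, n_sr, n_cr, s, tau) > best

signal_yield = mu_hat * s
sigma = cross_section_pb(signal_yield, efficiency=0.002, lumi_inv_pb=3200.0)
print(f"mu_hat={mu_hat:.2f}, b_hat={b_hat:.1f}, sigma={sigma:.3f} pb")
```

A real analysis fits several regions at once and includes nuisance parameters for systematic uncertainties. This toy keeps only the structure of "signal region plus background-constraining control region".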
Death as Insight into Life: Adolescents' Gothic Text Encounters

Science.gov (United States)

Del Nero, Jennifer

2017-01-01

This qualitative case study explores adolescents' responses to texts containing death and destruction, a seminal trope of the Gothic literary genre. In their seventh-grade reading classroom, participants read both classic and popular-culture texts featuring characters grappling with death. Observations, interviews, and documents were collected and…
Rational kernels for Arabic Root Extraction and Text Classification

Directory of Open Access Journals (Sweden)

Attia Nehar

2016-04-01

In this paper, we address the problems of Arabic text classification and root extraction using transducers and rational kernels. We introduce a new root extraction approach based on Arabic patterns (Pattern Based Stemmer). Transducers are used to model these patterns, and root extraction is done without relying on any dictionary. Using transducers for root extraction, documents are transformed into finite-state transducers. This document representation allows us to use and explore rational kernels as a framework for Arabic text classification. Root extraction experiments conducted on three word collections yield 75.6% accuracy. Classification experiments are done on the Saudi Press Agency dataset, and N-gram kernels are tested with different values of N; accuracy and F1 reach 90.79% and 62.93%, respectively. These results show that our approach is promising compared with other approaches, especially in terms of accuracy and F1.
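The abstract evaluates N-gram kernels for several values of N. In the paper these kernels are computed as rational kernels over transducers. The sketch below computes the same kind of similarity in a simpler, swapped-in way: the inner product of character n-gram count vectors, K_n(x, y) = Σ_z count_x(z) · count_y(z). It is an illustration only, not the paper's transducer implementation, and the function names are hypothetical.

```python
from collections import Counter

def ngram_counts(text, n):
    """Count every contiguous character n-gram in `text`."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def ngram_kernel(x, y, n):
    """K_n(x, y) = sum over n-grams z of count_x(z) * count_y(z)."""
    cx, cy = ngram_counts(x, n), ngram_counts(y, n)
    if len(cx) > len(cy):           # iterate over the smaller counter
        cx, cy = cy, cx
    # Counter returns 0 for missing keys without inserting them.
    return sum(c * cy[z] for z, c in cx.items())

def normalized_kernel(x, y, n):
    """Cosine-normalized kernel so that long documents do not dominate."""
    kxy = ngram_kernel(x, y, n)
    kxx, kyy = ngram_kernel(x, x, n), ngram_kernel(y, y, n)
    return kxy / (kxx * kyy) ** 0.5 if kxx and kyy else 0.0

# Two words sharing the root k-t-b share the bigram "تب".
print(ngram_kernel("كتب", "كاتب", 2))  # 1
```

A kernel classifier such as an SVM would use `normalized_kernel` values between documents as its Gram matrix.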
DOCUMENT REPRESENTATION FOR CLUSTERING OF SCIENTIFIC ABSTRACTS

Directory of Open Access Journals (Sweden)

S. V. Popova

2014-01-01

The key issue of the present paper is the clustering of narrow-domain short texts, such as scientific abstracts. The work is based on observations made while improving the performance of a key phrase extraction algorithm. An extended stop-words list, built automatically for the purposes of key phrase extraction, considerably enhanced the quality of the phrases extracted from scientific publications. A description of the stop-words list creation procedure is given. The main objective is to investigate whether this stop-words list, as well as information about the parts of speech of lexemes, can increase the performance and/or speed of clustering. In the latter case, the document representation uses a vocabulary that contains not all the words that occurred in the collection, but only the nouns and adjectives, or their sequences, encountered in the documents. Two base clustering algorithms are applied: k-means and hierarchical clustering (average agglomerative method). The results show that the extended stop-words list and the adjective-noun document representation make it possible to improve the performance and speed of k-means clustering. In the same setting, a decline in performance quality may be observed for the average agglomerative method. It is shown that using adjective-noun sequences for document representation lowers the clustering quality for both algorithms and can be justified only when a considerable reduction of feature space dimensionality is necessary.
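The role of a stop-words list in clustering can be pictured with a small pipeline:

1. Remove stop words.
2. Build term-frequency vectors.
3. Run k-means (Lloyd's algorithm).

The sketch below is a from-scratch illustration with a hypothetical stop list and toy "abstracts". It is not the paper's procedure. For reproducibility it seeds the centroids with the first k documents; practical k-means usually uses random or k-means++ seeding.

```python
from collections import Counter

# Hypothetical "extended" stop list; note it also drops generic words like "method".
STOP_WORDS = {"the", "of", "a", "in", "we", "is", "and", "paper", "method"}

def vectorize(docs, stop_words):
    """Term-frequency vectors over a shared, sorted vocabulary."""
    tokenized = [[w for w in d.lower().split() if w not in stop_words] for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([float(counts[w]) for w in vocab])
    return vectors, vocab

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(vectors, k, iterations=20):
    """Lloyd's algorithm; centroids start at the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

docs = [
    "the clustering of short texts",
    "the protein folding in cells",
    "a method of clustering short abstracts",
    "protein folding and cells",
]
vectors, vocab = vectorize(docs, STOP_WORDS)
print(kmeans(vectors, k=2))  # [0, 1, 0, 1]
```

Without the stop list, shared function words such as "the" and "of" add spurious overlap between unrelated abstracts. That effect is stronger for very short texts.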
Linguistic Dating of Biblical Texts

DEFF Research Database (Denmark)

Ehrensvärd, Martin Gustaf

2003-01-01

For two centuries, scholars have pointed to consistent differences in the Hebrew of certain biblical texts and interpreted these differences as reflecting the date of composition of the texts. Until the 1980s, this was quite uncontroversial, as the linguistic findings largely confirmed the chronology of the texts established by other means: the Hebrew of Genesis-2 Kings was judged to be early and that of Esther, Daniel, Ezra, Nehemiah, and Chronicles to be late. In the current debate, where revisionists have questioned the traditional dating, linguistic arguments in the dating of texts have come more into focus. The study critically examines some linguistic arguments adduced to support the traditional position and, in reviewing them, points to weaknesses in the linguistic dating of Early Biblical Hebrew (EBH) texts to pre-exilic times. When viewing the linguistic evidence in isolation it will be clear...
DOES PRESENTING PATIENT'S BMI INCREASE DOCUMENTATION OF OBESITY?

Directory of Open Access Journals (Sweden)

Norm Clothier, MD, M. Kim Marvel, PhD, Courtney S. Cruickshank, MS

2002-09-01

Purpose: Despite the associated health consequences, obesity is infrequently documented as a problem in medical charts. The purpose of this study is to determine whether a simple intervention (routine listing of the BMI on the medical chart) will increase physician documentation of obesity in the medical record. Methods: Participants were resident physicians in a family medicine residency program. Participants were randomly assigned to either an experimental group or a control group. For experimental-group physicians, the Body Mass Index was listed alongside the other vital signs of patients seen in an ambulatory setting. Physician documentation of patient obesity was assessed by chart review after patient visits. Documentation was defined as inclusion of obesity on the problem list or in the progress note. Results: The intervention did not significantly increase the rate of documentation of obesity in the medical chart. Several reasons for the lack of change are explored, including the difficulty of treating obesity successfully.
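The intervention listed the BMI beside the vital signs. BMI is weight in kilograms divided by the square of height in metres, and 30 or more is the conventional adult cutoff for obesity. The helper below is a hypothetical illustration of such a chart line; its name and output format are not from the study.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index = weight (kg) / height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2

def chart_vitals_line(weight_kg: float, height_m: float) -> str:
    """Hypothetical vitals-line text: BMI to one decimal, flagged when >= 30."""
    value = bmi(weight_kg, height_m)
    flag = " (obese)" if value >= 30 else ""
    return f"BMI {value:.1f}{flag}"

print(chart_vitals_line(95.0, 1.75))  # BMI 31.0 (obese)
```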

Information Retrieval and Text Mining Technologies for Chemistry.

Science.gov (United States)

Krallinger, Martin; Rabal, Obdulia; Lourenço, Anália; Oyarzabal, Julen; Valencia, Alfonso

2017-06-28

Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
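The Review describes chemical entity recognition as extracting the list of chemicals mentioned in a document. A dictionary (gazetteer) lookup is the simplest baseline for this, much simpler than the systems evaluated in CHEMDNER. The sketch below is such a baseline: a greedy longest-match lookup over lowercased tokens with a tiny hypothetical lexicon. It is illustrative only and is not taken from the Review.

```python
import re

# Hypothetical lexicon; multi-word names are stored as token tuples.
CHEMICAL_LEXICON = {
    ("acetylsalicylic", "acid"),
    ("aspirin",),
    ("sodium", "chloride"),
    ("ethanol",),
}
MAX_LEN = max(len(entry) for entry in CHEMICAL_LEXICON)

def find_chemicals(text):
    """Return chemical mentions using greedy longest-match dictionary lookup."""
    tokens = re.findall(r"[A-Za-z0-9-]+", text)
    lowered = [t.lower() for t in tokens]
    mentions, i = [], 0
    while i < len(tokens):
        match_len = 0
        # Try the longest candidate span starting at token i first.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in CHEMICAL_LEXICON:
                match_len = n
                break
        if match_len:
            mentions.append(" ".join(tokens[i:i + match_len]))
            i += match_len
        else:
            i += 1
    return mentions

print(find_chemicals("Aspirin (acetylsalicylic acid) was dissolved in ethanol."))
# ['Aspirin', 'acetylsalicylic acid', 'ethanol']
```

Dictionary lookup misses unseen names, systematic IUPAC names, and formulas, so it typically trails the machine-learning systems in recall.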
A Proposed Arabic Handwritten Text Normalization Method

Directory of Open Access Journals (Sweden)

Tarik Abu-Ain

2014-11-01

Text normalization is an important technique in document image analysis and recognition. It consists of many preprocessing stages, including slope correction, text padding, skew correction, and straightening of the writing line. In this respect, text normalization plays an important role in many procedures, such as text segmentation, feature extraction, and character recognition. In the present article, a new method for text baseline detection, straightening, and slant correction for Arabic handwritten texts is proposed. The method comprises a set of sequential steps: first, the components are segmented and the text components are thinned; then, the direction features of the skeletons are extracted, and the candidate baseline regions are determined. After that, the correct baseline region is selected, and finally, the baselines of all components are aligned with the writing line. The experiments are conducted on the IFN/ENIT benchmark Arabic dataset. The results show that the proposed method has promising and encouraging performance.
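The paper locates baselines from skeleton direction features. A much simpler and widely used alternative is the horizontal projection profile, shown here only to make the notion of a baseline concrete. In Arabic script, ink tends to concentrate along the writing line, so the row with the most ink pixels is a rough baseline estimate. This is a swapped-in technique, not the authors' method. The binary image is a hypothetical list of rows, with 1 meaning ink.

```python
from typing import List

def horizontal_projection(image: List[List[int]]) -> List[int]:
    """Number of ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]

def estimate_baseline(image: List[List[int]]) -> int:
    """Index of the row with the maximum ink count (first such row on ties)."""
    profile = horizontal_projection(image)
    return max(range(len(profile)), key=profile.__getitem__)

# Toy 6x10 binary image: a dense horizontal stroke on row 3,
# plus a few ascender and descender pixels elsewhere.
image = [
    [0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
]
print(estimate_baseline(image))  # 3
```

The projection method degrades on skewed or strongly slanted handwriting. This is one motivation for skeleton-based approaches like the one proposed.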
Electronic Braille Document Reader

OpenAIRE

Arif, Shahab; Holmes, Violeta

2013-01-01

This paper presents an investigation into developing a portable Braille device that would allow visually impaired individuals to read electronic documents by actuating Braille text on a finger. Braille books tend to be bulky due to the minimum size requirements for each Braille cell. E-books can be read in Braille using refreshable Braille displays connected to a computer. However, refreshable Braille displays are expensive, bulky, and not portable. These factors restrict blin...
Electronic Braille Document Reader

OpenAIRE

Arif, S.

2012-01-01

An investigation was conducted into developing a portable Braille device that would allow visually impaired individuals to read electronic documents by actuating Braille text on a finger. Braille books tend to be bulky due to the minimum size requirements for each Braille cell. E-books can be read in Braille using refreshable Braille displays connected to a computer. However, refreshable Braille displays are expensive, bulky, and not portable. These factors restrict blind and ...
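A device that actuates Braille text on a finger has to turn each character into a six-dot cell. The sketch below shows that mapping for the 26 lowercase letters of uncontracted (Grade 1) English Braille. It also encodes cells in the Unicode Braille Patterns block (U+2800 plus one bit per raised dot), which is convenient for checking output on screen. Contractions, capitals, numbers, and punctuation are left out. The one-pin-per-dot interpretation in `pin_states` is an assumption about how such a device might be driven.

```python
# Raised dots for a-j; the rest of the alphabet is derived from these.
A_TO_J = ["1", "12", "14", "145", "15", "124", "1245", "125", "24", "245"]

def build_letter_dots():
    """Map each lowercase letter to its set of raised dots (numbered 1-6)."""
    table = {}
    for i, ch in enumerate("abcdefghij"):
        table[ch] = set(map(int, A_TO_J[i]))
    for i, ch in enumerate("klmnopqrst"):      # k-t: a-j plus dot 3
        table[ch] = table["abcdefghij"[i]] | {3}
    for i, ch in enumerate("uvxyz"):           # u, v, x, y, z: a-e plus dots 3 and 6
        table[ch] = table["abcde"[i]] | {3, 6}
    table["w"] = {2, 4, 5, 6}                  # w is irregular: j plus dot 6
    return table

LETTER_DOTS = build_letter_dots()

def to_unicode_braille(text):
    """Encode letters and spaces; any other character raises KeyError."""
    out = []
    for ch in text.lower():
        if ch == " ":
            out.append("\u2800")               # blank cell
        else:
            dots = LETTER_DOTS[ch]
            out.append(chr(0x2800 + sum(1 << (d - 1) for d in dots)))
    return "".join(out)

def pin_states(ch):
    """Six booleans for dots 1..6, e.g. one actuator pin per dot."""
    dots = LETTER_DOTS[ch.lower()]
    return [d in dots for d in range(1, 7)]

print(to_unicode_braille("braille"))
```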
Interconnectedness and Digital Texts

Directory of Open Access Journals (Sweden)

Detlev Doherr

2013-04-01

Abstract: The multimedia information services on the Internet are becoming ever more extensive and comprehensive, and documents that previously existed only in printed form are also being digitized by libraries and put online. These documents can be found via online document management systems or search engines and then provided in common formats such as PDF. This article examines how the Humboldt Digital Library works, which for more than ten years has made documents by Alexander von Humboldt available free of charge on the web, in English translation, as the HDL (Humboldt Digital Library). Unlike a digital library, however, it does not merely provide digitized documents as scans or PDFs; rather, the text itself is made available, and in interlinked form. The system therefore resembles an information system more than a digital library, which is also reflected in the functions available for finding texts in different versions and translations, comparing paragraphs of different documents, and displaying images in their context. Developing dynamic hyperlinks based on the individual text paragraphs of Humboldt's works, in the form of media assets, makes it possible to use the Google Maps programming interface for both geographic and content-based navigation. Going beyond the service of a digital library, the HDL offers the prototype of a multidimensional information system that works with dynamic structures and enables extensive thematic analyses and comparisons.
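The abstract describes treating each paragraph of Humboldt's works as a media asset with dynamic hyperlinks that can drive map-based navigation. One plausible way to realize this is an inverted index from place names to paragraphs. Every paragraph that mentions a place then links to every other paragraph mentioning it, and the place also supplies the coordinates a map marker would need. All data structures here are hypothetical: the paragraph IDs, the place-name gazetteer, and its approximate coordinates. The sketch does not reproduce how the HDL is built, and it makes no Google Maps API call; it only prepares the data.

```python
from collections import defaultdict

# Hypothetical gazetteer: place name -> approximate (latitude, longitude).
GAZETTEER = {
    "Cumaná": (10.45, -64.17),
    "Caracas": (10.50, -66.92),
}

def build_place_index(paragraphs):
    """Map each known place name to the IDs of paragraphs that mention it
    (naive substring matching)."""
    index = defaultdict(set)
    for pid, text in paragraphs.items():
        for place in GAZETTEER:
            if place in text:
                index[place].add(pid)
    return index

def related_paragraphs(pid, index):
    """Paragraphs sharing at least one place name with paragraph `pid`."""
    related = set()
    for pids in index.values():
        if pid in pids:
            related |= pids
    related.discard(pid)
    return sorted(related)

def map_markers(pid, index):
    """(place, lat, lon) tuples for the places a paragraph mentions."""
    return sorted((place, *GAZETTEER[place]) for place, pids in index.items() if pid in pids)

paragraphs = {
    "p1": "We landed at Cumaná in July.",
    "p2": "From Cumaná we travelled to Caracas.",
    "p3": "Caracas lies in a high valley.",
}
index = build_place_index(paragraphs)
print(related_paragraphs("p2", index))  # ['p1', 'p3']
```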
Technical document characterization by data analysis

International Nuclear Information System (INIS)

Mauget, A.

1993-05-01

Nuclear power plants possess documents analyzing all the plant systems, which amounts to a vast quantity of paper. Analysis of textual data can enable a document to be classified by grouping the texts that contain the same words. These methods are applied to system manuals in feasibility studies. The system manual is then analyzed by LEXTER, and the terms it has selected are examined. We first classify according to style (sentences containing general words, technical sentences, etc.), and then according to terms. However, it will not be possible to continue in this fashion for all 100 existing system manuals because of a lack of sufficient storage capacity. Another solution is being developed. (author)
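Classifying texts "by grouping the texts containing the same words" can be illustrated with two pieces. The first is Jaccard similarity on word sets. The second is single-link grouping at a threshold, implemented with union-find. This is a generic sketch with hypothetical sentences and a hypothetical threshold. It is neither the study's classification method nor the LEXTER tool.

```python
def word_set(sentence):
    """Lowercased words with simple trailing punctuation removed."""
    return {w.strip(".,;:").lower() for w in sentence.split()} - {""}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def group_by_shared_words(sentences, threshold=0.3):
    """Single-link grouping: two sentences share a group if a chain of pairwise
    Jaccard similarities >= threshold connects them (union-find)."""
    parent = list(range(len(sentences)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    sets = [word_set(s) for s in sentences]
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            if jaccard(sets[i], sets[j]) >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(sentences)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

sentences = [
    "The pump feeds the primary circuit.",
    "The primary circuit pump starts automatically.",
    "Operators must read the manual.",
    "Operators read the safety manual daily.",
]
print(group_by_shared_words(sentences))  # [[0, 1], [2, 3]]
```

The all-pairs loop is quadratic in the number of sentences. Storage and compute limits of this kind are also what the abstract reports when scaling to 100 manuals.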
Biblio-MetReS for user-friendly mining of genes and biological processes in scientific documents.

Science.gov (United States)

Usie, Anabel; Karathia, Hiren; Teixidó, Ivan; Alves, Rui; Solsona, Francesc

2014-01-01

One way to initiate the reconstruction of molecular circuits is by using automated text-mining techniques. Developing more efficient methods for such reconstruction is a topic of active research, and bioinformaticians typically include these methods in pipelines used to mine and curate large literature datasets. Nevertheless, experimental biologists have a limited number of user-friendly tools that use text mining for network reconstruction and require no programming skills. One of these tools is Biblio-MetReS. Originally, this tool permitted an on-the-fly analysis of documents contained in a number of web-based literature databases to identify co-occurrence of proteins/genes. This approach ensured results that were always up to date with the latest live version of the databases. However, this 'up-to-dateness' came at the cost of long execution times. Here we report an evolution of Biblio-MetReS that permits constructing co-occurrence networks for genes, GO processes, pathways, or any combination of the three types of entities, and graphically representing those entities. We show that the performance of Biblio-MetReS in identifying gene co-occurrence is at least as good as that of other comparable applications (STRING and iHOP). In addition, we show that the identification of GO processes is on par with that reported in the latest BioCreAtIvE challenge. Finally, we report the implementation of a new strategy that combines on-the-fly analysis of new documents with preprocessed information from documents encountered in previous analyses. This combination simultaneously decreases program run time and maintains the 'up-to-dateness' of the results. Availability: http://metres.udl.cat/index.php/downloads; contact: metres.cmb@gmail.com.
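The abstract combines two ideas: building entity co-occurrence networks from documents, and reusing results from earlier analyses so that only new documents need on-the-fly processing. The sketch below shows both together. The entity matcher is a naive hypothetical placeholder (set lookup over tokens). The cache is an in-memory dict keyed by a hypothetical document ID. This is an illustration of the idea only, not Biblio-MetReS's implementation.

```python
from collections import Counter
from itertools import combinations

KNOWN_ENTITIES = {"TP53", "MDM2", "BRCA1", "apoptosis"}  # hypothetical genes / GO terms

class CooccurrenceNetwork:
    def __init__(self):
        self._cache = {}           # doc_id -> frozenset of entities found earlier
        self.documents_parsed = 0  # documents that needed fresh analysis

    def _entities(self, doc_id, text):
        """Entities in a document, analysing it only if its ID is unseen
        (the cache assumes a document's text never changes)."""
        if doc_id not in self._cache:
            tokens = {t.strip(".,;()") for t in text.split()}
            self._cache[doc_id] = frozenset(tokens & KNOWN_ENTITIES)
            self.documents_parsed += 1
        return self._cache[doc_id]

    def build(self, documents):
        """Edge weight = number of documents in which both entities occur."""
        edges = Counter()
        for doc_id, text in documents.items():
            for a, b in combinations(sorted(self._entities(doc_id, text)), 2):
                edges[(a, b)] += 1
        return edges

docs = {
    "d1": "TP53 binds MDM2.",
    "d2": "TP53 regulates apoptosis.",
    "d3": "MDM2 inhibits TP53 and apoptosis.",
}
net = CooccurrenceNetwork()
edges = net.build(docs)
print(edges.most_common())
```

Calling `build` again with one extra document only analyses that document. The other documents come from the cache, which mirrors the run-time saving the abstract reports.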
Guidelines for the documentation of digital computer programs - approved 1974

International Nuclear Information System (INIS)

Anon.

1975-01-01

This standard presents guidelines for the documentation of engineering and scientific computer programs. Good documentation promotes understanding, reduces duplication of effort, eases conversion to different computer environments, and aids modification for extended applications. Good documentation is essential for implementation and effective use of programs obtained from other installations. Since the intention of this standard is to encourage better communication between the developer and user, it should be regarded as a guide rather than a set of rigid specifications. As a guide, it is sufficiently comprehensive to apply to large-scale programs intended for extensive external use. Not all features of this document are appropriate in all circumstances. In general, as project complexity increases, so does the need for more complete documentation. An organization may have special documentation requirements that supersede or extend these guidelines. This standard is a revision of ANS-STD.2-1967 and supersedes it.
From paper to digital documents : Challenging and improving the SGML approach

OpenAIRE

Sandahl, Tone Irene

1999-01-01

This research was initiated on the basis of practical experience in developing a relatively large SGML system at the University of Oslo. This thesis contributes to the field of information systems, with a particular focus on document systems. The aim of this work is to inform the design of document systems by considering the transformation from paper to digital documents in organizations. The Standard Generalized Markup Language (SGML, ISO 8879) approach is emphasized. The SGML approach...
Friction and Lubrication of Large Tilting-Pad Thrust Bearings

Directory of Open Access Journals (Sweden)

Michał Wasilczuk

2015-04-01

Fluid film bearings have been used extensively in industry because of their unbeatable durability and extremely low friction coefficient. Despite this very low coefficient of friction, however, energy dissipation is noticeable, especially in large bearings. This paper presents the lubricating systems of large tilting-pad thrust bearings used in large vertical-shaft hydrogenerators. A large amount of heat is generated by viscous shearing of the lubricant in large tilting-pad thrust bearings, and this requires systems for forced cooling of the lubricant. In the dominant bath lubrication systems, cooling is realized by internal coolers or external cooling systems. External systems show some important advantages, at the cost of complexity and, potentially, lower reliability. Substantial losses in the bearings, reaching 1 MW in extreme cases, are a good motivation for research and development aimed at reducing them. Some possible methods and their potential efficiency, along with some effects already documented, are also described in the paper.
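The link between viscous shearing and heat generation can be made concrete with the simplest possible model: Couette flow in a film of uniform thickness h. The shear stress is τ = μU/h, and the dissipated power is P = τ·A·U = μU²A/h. Real tilting-pad films are wedge-shaped, and their viscosity varies strongly with temperature. This is therefore only an order-of-magnitude sketch, and the input values below are hypothetical.

```python
def couette_power_loss(viscosity_pa_s, sliding_speed_m_s, pad_area_m2, film_thickness_m):
    """Viscous dissipation P = mu * U**2 * A / h (watts) for a uniform Couette film."""
    if film_thickness_m <= 0:
        raise ValueError("film thickness must be positive")
    shear_stress = viscosity_pa_s * sliding_speed_m_s / film_thickness_m  # Pa
    return shear_stress * pad_area_m2 * sliding_speed_m_s

# Hypothetical values: mu = 0.03 Pa s, U = 20 m/s, total pad area 1 m^2, h = 50 micrometres.
p = couette_power_loss(0.03, 20.0, 1.0, 50e-6)
print(f"{p / 1e3:.0f} kW")  # 240 kW
```

Because P scales with U² and 1/h, large, fast-running bearings with thin films quickly reach the hundreds-of-kilowatt range. This is consistent with the abstract's point that losses can reach 1 MW in extreme cases.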
Understanding Clinician Information Demands and Synthesis of Clinical Documents in Electronic Health Record Systems

Science.gov (United States)

Farri, Oladimeji Feyisetan

2012-01-01

Large quantities of redundant clinical data are usually transferred from one clinical document to another, making the review of such documents cognitively burdensome and potentially error-prone. Inadequate designs of electronic health record (EHR) clinical document user interfaces probably contribute to the difficulties clinicians experience while…
New public dataset for spotting patterns in medieval document images

Science.gov (United States)

En, Sovann; Nicolas, Stéphane; Petitjean, Caroline; Jurie, Frédéric; Heutte, Laurent

2017-01-01

With advances in technology, a large part of our cultural heritage is becoming digitally available. In particular, in the field of historical document image analysis, there is now a growing need for indexing and data mining tools that allow us to spot and retrieve the occurrences of an object of interest, called a pattern, in a large database of document images. Patterns may vary in color, shape, or context, which makes pattern spotting a challenging task. Pattern spotting is a relatively new field of research, still hampered by the lack of available annotated resources. We present a new publicly available dataset named DocExplore dedicated to spotting patterns in historical document images. The dataset contains 1500 images and 1464 queries, and allows the evaluation of two tasks: image retrieval and pattern localization. A standardized benchmark protocol along with ad hoc metrics is provided for a fair comparison of the submitted approaches. We also provide some first results obtained with our baseline system on this new dataset. These results show that there is room for improvement and should encourage researchers in the document image analysis community to design new systems and submit improved results.
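Localization benchmarks commonly count a predicted box as correct when its intersection-over-union (IoU) with the ground-truth box reaches a threshold. The abstract does not specify DocExplore's "ad hoc metrics". The IoU function and the 0.5 threshold below are therefore an assumed, generic convention, not the dataset's protocol.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_correct_localization(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    return iou(pred, truth) >= threshold

print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.3333333333333333
```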
Learning High-Order Filters for Efficient Blind Deconvolution of Document Photographs

KAUST Repository

Xiao, Lei

2016-09-16

Photographs of text documents taken by hand-held cameras can be easily degraded by camera motion during exposure. In this paper, we propose a new method for blind deconvolution of document images. Observing that document images are usually dominated by small-scale high-order structures, we propose to learn a multi-scale, interleaved cascade of shrinkage fields model, which contains a series of high-order filters to facilitate joint recovery of blur kernel and latent image. With extensive experiments, we show that our method produces high quality results and is highly efficient at the same time, making it a practical choice for deblurring high resolution text images captured by modern mobile devices. © Springer International Publishing AG 2016.
Conservation Documentation and the Implications of Digitisation

Directory of Open Access Journals (Sweden)

Michelle Moore

2001-11-01

Conservation documentation can be defined as the textual and visual records collected during the care and treatment of an object. It can include records of the object's condition, any treatment done to the object, and any observations or conclusions made by the conservator, as well as details on the object's past and present environment. The form of documentation is not universally agreed upon, nor has it always been considered an important aspect of the conservation profession. Good documentation tells the complete story of an object thus far and should provide as much information as possible for the future researcher, curator, or conservator. The conservation profession will benefit from digitising its documentation using software such as databases and hardware such as digital cameras and scanners. Digital technology will make conservation documentation more accessible and more cost- and time-efficient, will increase the consistency and accuracy of the recorded data, and will reduce physical storage space requirements. The major drawback to digitising conservation records is maintaining access to the information in the future; the notorious pace of technological change has serious implications for retrieving data from any machine-readable medium.
Binarization and Segmentation Framework for Sundanese Ancient Documents

Directory of Open Access Journals (Sweden)

Erick Paulus

2017-11-01

Binarization and segmentation are the first two important steps of an optical character recognition (OCR) system. For handwritten ancient document images, binarization remains a major challenge, mainly because the images are badly degraded and contain various kinds of noise in the non-text areas. After binarization, line-based segmentation is performed to separate each text line from the others. We propose a novel framework for binarization and segmentation that enhances the performance of the Niblack binarization method and minimizes an energy function to find the path of the separator line between two text lines. For the experiments, we use 22 images from the Sundanese ancient documents Kropak 18 and Kropak 22. The evaluation metrics show that our proposed binarization improves the F-measure over the original Niblack method by 20% for Kropak 22 and by 50% for Kropak 18. We then present the influence of various input images, both true-color and binary, on text-line segmentation. In the line segmentation process, binarized images from our proposed framework yield the same number of text lines as the number of target lines. Overall, our proposed framework produces promising results, so its output can be used as input images for the subsequent OCR process.
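For reference, the baseline Niblack rule that this framework builds on sets a per-pixel threshold T(x, y) = m(x, y) + k·s(x, y), where m and s are the mean and standard deviation of intensities in a local window. The Python/NumPy sketch below implements only that baseline, not the authors' enhanced version; the window size of 15, k = -0.2 and the assumption that text is darker than the background are illustrative choices, not values taken from the paper.

```python
import numpy as np


def niblack_threshold(img, window=15, k=-0.2):
    """Baseline Niblack binarization (illustrative defaults, not the paper's).

    Returns a boolean mask that is True where a pixel is darker than its
    local threshold T = mean + k * std, i.e. where it is treated as text.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    img = img.astype(np.float64)
    pad = window // 2
    padded = np.pad(img, pad, mode="reflect")
    # Integral images of intensity and squared intensity, with a leading row
    # and column of zeros, so any window sum costs four lookups.
    s1 = np.pad(padded.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    s2 = np.pad((padded ** 2).cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    h, w = img.shape

    def window_sum(s):
        # Sum over the window x window block centred on every original pixel.
        return (s[window:window + h, window:window + w]
                - s[:h, window:window + w]
                - s[window:window + h, :w]
                + s[:h, :w])

    n = window * window
    mean = window_sum(s1) / n
    var = np.maximum(window_sum(s2) / n - mean ** 2, 0.0)  # clamp rounding noise
    threshold = mean + k * np.sqrt(var)
    return img < threshold
```

Integral images make the cost per pixel independent of the window size, which matters for large scanned pages.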
Intelligent Bar Chart Plagiarism Detection in Documents

Directory of Open Access Journals (Sweden)

Mohammed Mumtaz Al-Dabbagh

2014-01-01

This paper presents a novel approach for mining features from documents that cannot be mined via optical character recognition (OCR). By identifying the close relationship between the text and graphical components, the proposed technique extracts the Start, End, and Exact values of each bar. Furthermore, word 2-gram and Euclidean distance methods are used to detect and determine plagiarism in bar charts accurately.
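The abstract does not say how the word 2-gram and Euclidean distance measures are combined. As a purely hypothetical Python sketch, one could compare the text extracted around two charts by the overlap of their word 2-gram sets (Jaccard similarity, an assumed choice) and compare the extracted bar values by Euclidean distance; the function names, the decision rule and both thresholds are illustrative and do not come from the paper.

```python
import math


def word_bigrams(text):
    """Return the set of consecutive word pairs (word 2-grams) in lower-cased text."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


def bigram_similarity(text_a, text_b):
    """Jaccard overlap of two word 2-gram sets (assumed similarity measure)."""
    a, b = word_bigrams(text_a), word_bigrams(text_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def bar_distance(values_a, values_b):
    """Euclidean distance between two charts' extracted bar values.

    Charts with a different number of bars are treated as infinitely far apart.
    """
    if len(values_a) != len(values_b):
        return math.inf
    return math.dist(values_a, values_b)


def looks_plagiarised(chart_a, chart_b, max_dist=1.0, min_text_sim=0.5):
    """Hypothetical rule: near-identical bar values AND similar surrounding text.

    Each chart is a (text, bar_values) pair; both thresholds are illustrative.
    """
    text_a, bars_a = chart_a
    text_b, bars_b = chart_b
    return (bar_distance(bars_a, bars_b) <= max_dist
            and bigram_similarity(text_a, text_b) >= min_text_sim)


original = ("annual sales by region", [10, 20, 30])
suspect = ("annual sales by region 2014", [10, 20.5, 30])
unrelated = ("rainfall per month", [5, 40, 12])
print(looks_plagiarised(original, suspect))    # True
print(looks_plagiarised(original, unrelated))  # False
```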
Automatic classification of journalistic documents on the Internet

Directory of Open Access Journals (Sweden)

Elias OLIVEIRA

Online journalism is growing every day. Many news agencies, newspapers, and magazines publish digitally on the global network. Documents published online are available to users, who use search engines to find them. To deliver documents that are relevant to a search, the documents must be indexed and classified. Because of the vast number of documents published online every day, much research has been carried out on ways to facilitate automatic document classification. The objective of the present study is to describe an experimental approach to the automatic classification of journalistic documents published on the Internet, using the Vector Space Model for document representation. The model was tested on a real journalism database, using algorithms that have been widely reported in the literature. This article also describes the metrics used to assess the performance of these algorithms and their required configurations. The results obtained show the efficiency of the method used and justify further research into ways to facilitate the automatic classification of documents.
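To make the Vector Space Model concrete, the Python sketch below represents each document as a sparse TF-IDF vector and assigns a new document to the class whose centroid is most similar by cosine similarity. The abstract does not name the classifiers that were evaluated, so the nearest-centroid rule, the whitespace tokenizer and the smoothed IDF formula log(N/df) + 1 are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


def to_vector(tokens, idf):
    """Sparse TF-IDF vector; terms not seen during training are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    return {t: count * idf[t] for t, count in tf.items()}


def train(docs, labels):
    """Build TF-IDF vectors for the training documents and one centroid per class.

    Returns (centroids, idf); each centroid is a sparse dict term -> weight.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed inverse document frequency (assumed variant): log(N / df) + 1.
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    sums, sizes = {}, Counter()
    for toks, label in zip(tokenized, labels):
        sums.setdefault(label, Counter()).update(to_vector(toks, idf))
        sizes[label] += 1
    centroids = {label: {t: w / sizes[label] for t, w in acc.items()}
                 for label, acc in sums.items()}
    return centroids, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(text, centroids, idf):
    """Assign text to the class whose centroid has the highest cosine similarity."""
    vec = to_vector(tokenize(text), idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


docs = ["the team won the match", "the striker scored a goal in the match",
        "shares fell on the stock market", "the central bank raised interest rates"]
labels = ["sport", "sport", "economy", "economy"]
centroids, idf = train(docs, labels)
print(classify("a late goal won the match", centroids, idf))  # sport
print(classify("the bank raised rates", centroids, idf))      # economy
```

Any classifier that accepts fixed-length vectors could replace the centroid rule once documents are in this representation.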
Health physics source document for codes of practice

International Nuclear Information System (INIS)

Pearson, G.W.; Meggitt, G.C.

1989-05-01

Personnel preparing codes of practice often require basic Health Physics information or advice relating to radiological protection problems, and this document is written primarily to supply such information. Certain technical terms used in the text are explained in the extensive glossary. Because of the pace of change in the field of radiological protection, it is difficult to produce an up-to-date document. This document was, however, compiled during 1988 and therefore contains the principal changes brought about by the introduction of the Ionising Radiations Regulations (1985). The paper covers the nature of ionising radiation, its biological effects and the principles of control. It is hoped that the document will provide a useful source of information for both codes of practice and wider areas, and will stimulate readers to study radiological protection issues in greater depth. (author)
ARABIC TEXT CLASSIFICATION USING NEW STEMMER FOR FEATURE SELECTION AND DECISION TREES

Directory of Open Access Journals (Sweden)

SAID BAHASSINE

2017-06-01

Text classification is the process of assigning unclassified text to appropriate classes based on its content. The most prevalent representation for text classification is the bag-of-words vector. In this representation, the words that appear in documents often have multiple morphological structures and grammatical forms, and in most cases these morphological variants of a word belong to the same category. In the first part of this paper, a new stemming algorithm was developed in which each term of a given document is represented by its root. In the second part, a comparative study is conducted of the impact of two stemming algorithms, namely Khoja's stemmer and our new stemmer (referred to hereafter as origin-stemmer), on Arabic text classification. This investigation was carried out using chi-square for feature selection, to reduce the dimensionality of the feature space, and a decision tree classifier. To evaluate the performance of the classifier, this study used, in the WEKA toolkit, a corpus of 5070 documents independently classified into six categories: sport, entertainment, business, Middle East, switch and world. Recall, precision and F-measure are used to compare the performance of the obtained models. The experimental results show that text classification using the origin-stemmer outperforms classification using Khoja's stemmer. The F-measure was 92.9% in the sport category and 89.1% in the business category.
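The chi-square feature-selection step can be sketched as follows: for each term t and class c, a 2×2 contingency table of document counts gives χ²(t, c) = N(AD − CB)² / ((A + C)(B + D)(A + B)(C + D)), and terms are ranked by their maximum score over classes. This Python sketch assumes documents have already been reduced to sets of stems (the origin-stemmer itself is not reproduced); the max-over-classes ranking and the cut-off k are illustrative choices, since the abstract does not give the exact aggregation.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class without the term
    d: documents outside the class without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_features(docs, labels, k):
    """Return the k terms with the highest chi-square score over all classes.

    docs: list of sets of (already stemmed) terms; labels: class of each doc.
    Ties are broken alphabetically so the output is deterministic.
    """
    n = len(docs)
    vocab = set().union(*docs)
    class_sizes = {cls: labels.count(cls) for cls in set(labels)}
    scores = {}
    for term in vocab:
        best = 0.0
        for cls, size in class_sizes.items():
            a = sum(1 for terms, y in zip(docs, labels) if term in terms and y == cls)
            b = sum(1 for terms, y in zip(docs, labels) if term in terms and y != cls)
            c = size - a
            d = n - a - b - c
            best = max(best, chi_square(a, b, c, d))
        scores[term] = best
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


docs = [{"goal", "match"}, {"goal", "team"}, {"bank", "rates"}, {"bank", "market"}]
labels = ["sport", "sport", "economy", "economy"]
print(select_features(docs, labels, 2))  # ['bank', 'goal']
```

The selected terms would then form the reduced feature space passed to the decision tree classifier.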
Text mining with R: a tidy approach

CERN Document Server

Silge, Julia

2017-01-01

Much of the data available today is unstructured and text-heavy, making it challenging for analysts to apply their usual data wrangling and visualization tools. With this practical book, you'll explore text-mining techniques with tidytext, a package that authors Julia Silge and David Robinson developed using the tidy principles behind R packages like ggraph and dplyr. You'll learn how tidytext and other tidy tools in R can make text analysis easier and more effective. The authors demonstrate how treating text as data frames enables you to manipulate, summarize, and visualize characteristics of text. You'll also learn how to integrate natural language processing (NLP) into effective workflows. Practical code examples and data explorations will help you generate real insights from literature, news, and social media. Learn how to apply the tidy text format to NLP; use sentiment analysis to mine the emotional content of text; identify a document's most important terms with frequency measurements; E...

A Digital Humanities Approach to the History of Science Eugenics Revisited in Hidden Debates by Means of Semantic Text Mining

OpenAIRE

Huijnen, Pim; Laan, Fons; de Rijke, Maarten; Pieters, Toine

2014-01-01

Comparative historical research on the intensity, diversity and fluidity of public discourses has been severely hampered by the extraordinary task of manually gathering and processing large sets of opinionated data in news media in different countries. At most 50,000 documents have been systematically studied in a single comparative historical project in the subject area of heredity and eugenics. Digital techniques, like the text mining tools WAHSP and BILAND we have developed in two succ...
It’s about This and That: A Description of Anaphoric Expressions in Clinical Text

Science.gov (United States)

Wang, Yan; Melton, Genevieve B.; Pakhomov, Serguei

2011-01-01

Although anaphoric expressions are very common in biomedical and clinical documents, little work has been done to systematically characterize their use in clinical text. Samples of ‘it’, ‘this’, and ‘that’ expressions occurring in inpatient clinical notes from four metropolitan hospitals were analyzed using a combination of semi-automated and manual annotation techniques. We developed a rule-based approach to filter potential non-referential expressions. A physician then manually annotated 1000 potential referential instances to determine referent status and the antecedent of each referent expression. A distributional analysis of the three referring expressions in the entire corpus of notes demonstrates a high prevalence of anaphora and large variance in the distributions of referential expressions across different notes. Our results confirm that anaphoric expressions are common in clinical texts. Effective co-reference resolution with anaphoric expressions remains an important challenge in medical natural language processing research. PMID:22195211
Light Duty Utility Arm interface control document plan

Energy Technology Data Exchange (ETDEWEB)

Engstrom, J.W.

1994-12-27

This document describes the interface control documents that will be used to identify and control interface features throughout all phases of the Light Duty Utility Arm (LDUA) development and design. After the system is built, delivered and installed in the Cold Test Facility and later at the tank farm, the Interface Control Documents can be used to maintain the configuration control process. The Interface Control Document will consist of Interface Control Drawings and a database directly tied to the Interface Control Drawings. The database can be used as an index to conveniently find interface information. Design drawings and other text documents that contain interface information will appear in the database. The Interface Control Drawings will be used to document and control the data and information that define the interface boundaries between systems, subsystems and equipment. The interface boundaries will also define the areas of responsibility for systems and subsystems. The drawings will delineate and identify all the physical and functional interfaces that require coordination to establish and maintain compatibility between the co-functioning equipment, computer software, and the tank farm facilities. An appendix contains the Engineering interface control database system riser manual.
Light Duty Utility Arm interface control document plan

International Nuclear Information System (INIS)

Engstrom, J.W.

1994-01-01

This document describes the interface control documents that will be used to identify and control interface features throughout all phases of the Light Duty Utility Arm (LDUA) development and design. After the system is built, delivered and installed in the Cold Test Facility and later at the tank farm, the Interface Control Documents can be used to maintain the configuration control process. The Interface Control Document will consist of Interface Control Drawings and a database directly tied to the Interface Control Drawings. The database can be used as an index to conveniently find interface information. Design drawings and other text documents that contain interface information will appear in the database. The Interface Control Drawings will be used to document and control the data and information that define the interface boundaries between systems, subsystems and equipment. The interface boundaries will also define the areas of responsibility for systems and subsystems. The drawings will delineate and identify all the physical and functional interfaces that require coordination to establish and maintain compatibility between the co-functioning equipment, computer software, and the tank farm facilities. An appendix contains the Engineering interface control database system riser manual.
Case study on traceable, transparent documentation to support decision-making on nuclear waste disposal

International Nuclear Information System (INIS)

McNeish, J.A.; Andrews, R.W.; Sevougian, S.D.; Dockery, H.A.; Wilson, M.L.; Gauthier, J.H.; Barnard, R.W.; Gaither, K.N.

1999-01-01

The recent assessment of the Yucca Mountain potential repository attempted to develop transparent and traceable documentation. The assessment was largely successful in this effort, providing meaningful graphics and easy-to-understand descriptions of the analyses for multiple audiences. While there are obviously many areas in the modeling and data collection that could be improved with sufficient resources, the document has been well received as an accurate, understandable assessment of the analyses. A few difficulties were encountered in the effort to produce transparent and traceable documentation of the performance assessment analyses. Streamlining the text from a typical technical document into more of a layman's document was not always easy. The proceduralized data transfer steps were not always smooth, as we worked out some of the bugs in the data transfer system. For some of the graphics, there was a mismatch between the analyst's hardware/software and the production hardware/software, causing difficulties in printing the graphics. One thing that became clear as the many organizations worked to pull the document together is that relationships between people are still necessary in spite of all the technology brought to bear on the problem. A high level of cooperation and integration is necessary for the process to work smoothly. Significant effort is being made to continue improving the processes that lead to traceability. Multiple teams are taking the sequencing of models and data apart and identifying all of the transfers required as the project moves toward Site Recommendation and potential licensing. Likewise, the effort to achieve transparency is evolving and will improve with the next iteration of the analyses.
Optimization of the Document Placement in the RFID Cabinet

Directory of Open Access Journals (Sweden)

Kiedrowicz Maciej

2016-01-01

The study addresses the optimization of document placement in a single RFID cabinet. The optimization problem is assumed to consist of reducing the archiving time for information about all documents with RFID tags. Because the explicit form of the criterion function is unknown, it is approximated by regression analysis, using data from a computer simulation of the process of archiving data about documents. The optimization problem is then solved with a modified gradient projection method.
Documentation of TRU biological transport model (BIOTRAN)

Energy Technology Data Exchange (ETDEWEB)

Gallegos, A.F.; Garcia, B.J.; Sutton, C.M.

1980-01-01

Inclusive of Appendices, this document describes the purpose, rationale, construction, and operation of a biological transport model (BIOTRAN). This model is used to predict the flow of transuranic elements (TRU) through specified plant and animal environments using biomass as a vector. The appendices are: (A) Flows of moisture, biomass, and TRU; (B) Intermediate variables affecting flows; (C) Mnemonic equivalents (code) for variables; (D) Variable library (code); (E) BIOTRAN code (Fortran); (F) Plants simulated; (G) BIOTRAN code documentation; (H) Operating instructions for BIOTRAN code. The main text is presented with a specific format which uses a minimum of space, yet is adequate for tracking most relationships from their first appearance to their formulation in the code. Because relationships are treated individually in this manner, and rely heavily on Appendix material for understanding, it is advised that the reader familiarize himself with these materials before proceeding with the main text.
Documentation of TRU biological transport model (BIOTRAN)

International Nuclear Information System (INIS)

Gallegos, A.F.; Garcia, B.J.; Sutton, C.M.

1980-01-01

Inclusive of Appendices, this document describes the purpose, rationale, construction, and operation of a biological transport model (BIOTRAN). This model is used to predict the flow of transuranic elements (TRU) through specified plant and animal environments using biomass as a vector. The appendices are: (A) Flows of moisture, biomass, and TRU; (B) Intermediate variables affecting flows; (C) Mnemonic equivalents (code) for variables; (D) Variable library (code); (E) BIOTRAN code (Fortran); (F) Plants simulated; (G) BIOTRAN code documentation; (H) Operating instructions for BIOTRAN code. The main text is presented with a specific format which uses a minimum of space, yet is adequate for tracking most relationships from their first appearance to their formulation in the code. Because relationships are treated individually in this manner, and rely heavily on Appendix material for understanding, it is advised that the reader familiarize himself with these materials before proceeding with the main text.
INTERFERENCE IN THE SHORT TEXT OF BESAKIH TEMPLE

Directory of Open Access Journals (Sweden)

Ni Made Kajeng Martha Puspita

2016-05-01

The aim of this study is to analyze four types of interference (syntax, semantics, copula, and redundancy) found in the short text "Besakih Temple". The data were collected through library research, with the necessary note-taking and documentation. The study uses a qualitative method of analysis. The results show that the interference found in the text covers several linguistic aspects. This interference is also called negative transfer, and it results from contact with another language. The most common source of errors is the speaker's lack of knowledge of the language being used.
ARCHITECTURE SOFTWARE SOLUTION TO SUPPORT AND DOCUMENT MANAGEMENT QUALITY SYSTEM

Directory of Open Access Journals (Sweden)

Milan Eric

2010-12-01

One of the foundations of the JUS ISO 9000 series of standards is quality system documentation. The architecture of the quality system documentation depends on the complexity of the business system. Establishing efficient management of quality system documentation is of great importance for the business system, both in the phase of introducing the quality system and in later stages of its improvement. The study describes the architecture and capabilities of software solutions to support and manage quality system documentation in accordance with the requirements of standards such as ISO 9001:2001, ISO 14001:2005, HACCP, etc.
USING REMOTELY SENSED DATA FOR DOCUMENTATION OF ARCHAEOLOGICAL SITES IN NORTHEASTERN MESOPOTAMIA

Directory of Open Access Journals (Sweden)

E. Matoušková

2016-06-01

This paper introduces two archaeological sites documented during the MULINEM (The Medieval Urban Landscape in Northeastern Mesopotamia) project. This project investigates the Late Sasanian and Islamic urban network in the land of Erbil, a historic province of Hidyab (Adiabene) located in northern Iraq. The investigated sites are the two deserted cities of Makhmúr al-Quadíma and Al-Hadítha. It is assumed that these two sites were once large cities of considerable commercial and cultural importance in the medieval period. The archaeological sites are endangered by various threats. The Al-Hadítha site appears to be under the control of the "Islamic State" at the moment, and Makhmúr al-Quadíma lies just next to the town of new Makhmúr, which is expanding rapidly without comprehensive urban planning. The archaeological sites have been documented using remote sensing methods together with in-situ measurements (where available). FORMOSAT-2 data, obtained through the research announcement "Free FORMOSAT-2 satellite imagery", provides a powerful documentation tool when combined with other sources (recent and historical data). In-situ RPAS measurements and the creation of a DTM furnish a new source of highly valuable information. The influence of the political and security situation in Al-Hadítha will be analysed.
Using Literary Texts to Teach Grammar in Foreign Language Classroom

Science.gov (United States)

Atmaca, Hasan; Günday, Rifat

2016-01-01

It is argued today that using literary texts as course material in the foreign language classroom is not obligatory but necessary, because of the close relationship between language and literature. Although literary texts are accepted as authentic documents and were not written for the purpose of language teaching, they are indispensable sources to be…
Search for dark matter at [Formula: see text] in final states containing an energetic photon and large missing transverse momentum with the ATLAS detector.

Science.gov (United States)

Aaboud, M; Aad, G; Abbott, B; Abdallah, J; Abdinov, O; Abeloos, B; Abidi, S H; AbouZeid, O S; Abraham, N L; Abramowicz, H; Abreu, H; Abreu, R; Abulaiti, Y; Acharya, B S; Adachi, S; Adamczyk, L; Adelman, J; Adersberger, M; Adye, T; Affolder, A A; Agatonovic-Jovin, T; Agheorghiesei, C; Aguilar-Saavedra, J A; Ahlen, S P; Ahmadov, F; Aielli, G; Akatsuka, S; Akerstedt, H; Åkesson, T P A; Akimov, A V; Alberghi, G L; Albert, J; Albicocco, P; Alconada Verzini, M J; Aleksa, M; Aleksandrov, I N; Alexa, C; Alexander, G; Alexopoulos, T; Alhroob, M; Ali, B; Aliev, M; Alimonti, G; Alison, J; Alkire, S P; Allbrooke, B M M; Allen, B W; Allport, P P; Aloisio, A; Alonso, A; Alonso, F; Alpigiani, C; Alshehri, A A; Alstaty, M; Alvarez Gonzalez, B; Álvarez Piqueras, D; Alviggi, M G; Amadio, B T; Amaral Coutinho, Y; Amelung, C; Amidei, D; Amor Dos Santos, S P; Amorim, A; Amoroso, S; Amundsen, G; Anastopoulos, C; Ancu, L S; Andari, N; Andeen, T; Anders, C F; Anders, J K; Anderson, K J; Andreazza, A; Andrei, V; Angelidakis, S; Angelozzi, I; Angerami, A; Anisenkov, A V; Anjos, N; Annovi, A; Antel, C; Antonelli, M; Antonov, A; Antrim, D J; Anulli, F; Aoki, M; Aperio Bella, L; Arabidze, G; Arai, Y; Araque, J P; Araujo Ferraz, V; Arce, A T H; Ardell, R E; Arduh, F A; Arguin, J-F; Argyropoulos, S; Arik, M; Armbruster, A J; Armitage, L J; Arnaez, O; Arnold, H; Arratia, M; Arslan, O; Artamonov, A; Artoni, G; Artz, S; Asai, S; Asbah, N; Ashkenazi, A; Asquith, L; Assamagan, K; Astalos, R; Atkinson, M; Atlay, N B; Aubry, L; Augsten, K; Avolio, G; Axen, B; Ayoub, M K; Azuelos, G; Baas, A E; Baca, M J; Bachacou, H; Bachas, K; Backes, M; Backhaus, M; Bagnaia, P; Bahrasemani, H; Baines, J T; Bajic, M; Baker, O K; Baldin, E M; Balek, P; Balli, F; Balunas, W K; Banas, E; Banerjee, Sw; Bannoura, A A E; Barak, L; Barberio, E L; Barberis, D; Barbero, M; Barillari, T; Barisits, M-S; Barklow, T; Barlow, N; Barnes, S L; Barnett, B M; Barnett, R M; Barnovska-Blenessy, Z; Baroncelli, A; Barone, G; Barr, A J; 
Barranco Navarro, L; Barreiro, F; Barreiro Guimarães da Costa, J; Bartoldus, R; Barton, A E; Bartos, P; Basalaev, A; Bassalat, A; Bates, R L; Batista, S J; Batley, J R; Battaglia, M; Bauce, M; Bauer, F; Bawa, H S; Beacham, J B; Beattie, M D; Beau, T; Beauchemin, P H; Bechtle, P; Beck, H P; Becker, K; Becker, M; Beckingham, M; Becot, C; Beddall, A J; Beddall, A; Bednyakov, V A; Bedognetti, M; Bee, C P; Beermann, T A; Begalli, M; Begel, M; Behr, J K; Bell, A S; Bella, G; Bellagamba, L; Bellerive, A; Bellomo, M; Belotskiy, K; Beltramello, O; Belyaev, N L; Benary, O; Benchekroun, D; Bender, M; Bendtz, K; Benekos, N; Benhammou, Y; Benhar Noccioli, E; Benitez, J; Benjamin, D P; Benoit, M; Bensinger, J R; Bentvelsen, S; Beresford, L; Beretta, M; Berge, D; Bergeaas Kuutmann, E; Berger, N; Beringer, J; Berlendis, S; Bernard, N R; Bernardi, G; Bernius, C; Bernlochner, F U; Berry, T; Berta, P; Bertella, C; Bertoli, G; Bertolucci, F; Bertram, I A; Bertsche, C; Bertsche, D; Besjes, G J; Bessidskaia Bylund, O; Bessner, M; Besson, N; Betancourt, C; Bethani, A; Bethke, S; Bevan, A J; Beyer, J; Bianchi, R M; Biebel, O; Biedermann, D; Bielski, R; Biesuz, N V; Biglietti, M; Bilbao De Mendizabal, J; Billoud, T R V; Bilokon, H; Bindi, M; Bingul, A; Bini, C; Biondi, S; Bisanz, T; Bittrich, C; Bjergaard, D M; Black, C W; Black, J E; Black, K M; Blair, R E; Blazek, T; Bloch, I; Blocker, C; Blue, A; Blum, W; Blumenschein, U; Blunier, S; Bobbink, G J; Bobrovnikov, V S; Bocchetta, S S; Bocci, A; Bock, C; Boehler, M; Boerner, D; Bogavac, D; Bogdanchikov, A G; Bohm, C; Boisvert, V; Bokan, P; Bold, T; Boldyrev, A S; Bolz, A E; Bomben, M; Bona, M; Boonekamp, M; Borisov, A; Borissov, G; Bortfeldt, J; Bortoletto, D; Bortolotto, V; Boscherini, D; Bosman, M; Bossio Sola, J D; Boudreau, J; Bouffard, J; Bouhova-Thacker, E V; Boumediene, D; Bourdarios, C; Boutle, S K; Boveia, A; Boyd, J; Boyko, I R; Bracinik, J; Brandt, A; Brandt, G; Brandt, O; Bratzler, U; Brau, B; Brau, J E; Breaden Madden, W D; 
Brendlinger, K; Brennan, A J; Brenner, L; Brenner, R; Bressler, S; Briglin, D L; Bristow, T M; Britton, D; Britzger, D; Brochu, F M; Brock, I; Brock, R; Brooijmans, G; Brooks, T; Brooks, W K; Brosamer, J; Brost, E; Broughton, J H; Bruckman de Renstrom, P A; Bruncko, D; Bruni, A; Bruni, G; Bruni, L S; Brunt, B H; Bruschi, M; Bruscino, N; Bryant, P; Bryngemark, L; Buanes, T; Buat, Q; Buchholz, P; Buckley, A G; Budagov, I A; Buehrer, F; Bugge, M K; Bulekov, O; Bullock, D; Burch, T J; Burckhart, H; Burdin, S; Burgard, C D; Burger, A M; Burghgrave, B; Burka, K; Burke, S; Burmeister, I; Burr, J T P; Busato, E; Büscher, D; Büscher, V; Bussey, P; Butler, J M; Buttar, C M; Butterworth, J M; Butti, P; Buttinger, W; Buzatu, A; Buzykaev, A R; Cabrera Urbán, S; Caforio, D; Cairo, V M; Cakir, O; Calace, N; Calafiura, P; Calandri, A; Calderini, G; Calfayan, P; Callea, G; Caloba, L P; Calvente Lopez, S; Calvet, D; Calvet, S; Calvet, T P; Camacho Toro, R; Camarda, S; Camarri, P; Cameron, D; Caminal Armadans, R; Camincher, C; Campana, S; Campanelli, M; Camplani, A; Campoverde, A; Canale, V; Cano Bret, M; Cantero, J; Cao, T; Capeans Garrido, M D M; Caprini, I; Caprini, M; Capua, M; Carbone, R M; Cardarelli, R; Cardillo, F; Carli, I; Carli, T; Carlino, G; Carlson, B T; Carminati, L; Carney, R M D; Caron, S; Carquin, E; Carrá, S; Carrillo-Montoya, G D; Carvalho, J; Casadei, D; Casado, M P; Casolino, M; Casper, D W; Castelijn, R; Castillo Gimenez, V; Castro, N F; Catinaccio, A; Catmore, J R; Cattai, A; Caudron, J; Cavaliere, V; Cavallaro, E; Cavalli, D; Cavalli-Sforza, M; Cavasinni, V; Celebi, E; Ceradini, F; Cerda Alberich, L; Cerqueira, A S; Cerri, A; Cerrito, L; Cerutti, F; Cervelli, A; Cetin, S A; Chafaq, A; Chakraborty, D; Chan, S K; Chan, W S; Chan, Y L; Chang, P; Chapman, J D; Charlton, D G; Chau, C C; Chavez Barajas, C A; Che, S; Cheatham, S; Chegwidden, A; Chekanov, S; Chekulaev, S V; Chelkov, G A; Chelstowska, M A; Chen, C; Chen, H; Chen, S; Chen, S; Chen, X; Chen, Y; Cheng, H C;
Cheng, H J; Cheplakov, A; Cheremushkina, E; Cherkaoui El Moursli, R; Chernyatin, V; Cheu, E; Chevalier, L; Chiarella, V; Chiarelli, G; Chiodini, G; Chisholm, A S; Chitan, A; Chiu, Y H; Chizhov, M V; Choi, K; Chomont, A R; Chouridou, S; Christodoulou, V; Chromek-Burckhart, D; Chu, M C; Chudoba, J; Chuinard, A J; Chwastowski, J J; Chytka, L; Ciftci, A K; Cinca, D; Cindro, V; Cioara, I A; Ciocca, C; Ciocio, A; Cirotto, F; Citron, Z H; Citterio, M; Ciubancan, M; Clark, A; Clark, B L; Clark, M R; Clark, P J; Clarke, R N; Clement, C; Coadou, Y; Cobal, M; Coccaro, A; Cochran, J; Colasurdo, L; Cole, B; Colijn, A P; Collot, J; Colombo, T; Conde Muiño, P; Coniavitis, E; Connell, S H; Connelly, I A; Constantinescu, S; Conti, G; Conventi, F; Cooke, M; Cooper-Sarkar, A M; Cormier, F; Cormier, K J R; Corradi, M; Corriveau, F; Cortes-Gonzalez, A; Cortiana, G; Costa, G; Costa, M J; Costanzo, D; Cottin, G; Cowan, G; Cox, B E; Cranmer, K; Crawley, S J; Creager, R A; Cree, G; Crépé-Renaudin, S; Crescioli, F; Cribbs, W A; Cristinziani, M; Croft, V; Crosetti, G; Cueto, A; Cuhadar Donszelmann, T; Cukierman, A R; Cummings, J; Curatolo, M; Cúth, J; Czirr, H; Czodrowski, P; D'amen, G; D'Auria, S; D'eramo, L; D'Onofrio, M; Da Cunha Sargedas De Sousa, M J; Da Via, C; Dabrowski, W; Dado, T; Dai, T; Dale, O; Dallaire, F; Dallapiccola, C; Dam, M; Dandoy, J R; Daneri, M F; Dang, N P; Daniells, A C; Dann, N S; Danninger, M; Dano Hoffmann, M; Dao, V; Darbo, G; Darmora, S; Dassoulas, J; Dattagupta, A; Daubney, T; Davey, W; David, C; Davidek, T; Davies, M; Davis, D R; Davison, P; Dawe, E; Dawson, I; De, K; de Asmundis, R; De Benedetti, A; De Castro, S; De Cecco, S; De Groot, N; de Jong, P; De la Torre, H; De Lorenzi, F; De Maria, A; De Pedis, D; De Salvo, A; De Sanctis, U; De Santo, A; De Vasconcelos Corga, K; De Vivie De Regie, J B; Dearnaley, W J; Debbe, R; Debenedetti, C; Dedovich, D V; Dehghanian, N; Deigaard, I; Del Gaudio, M; Del Peso, J; Del Prete, T; Delgove, D; Deliot, F; Delitzsch, C M; 
Dell'Acqua, A; Dell'Asta, L; Dell'Orso, M; Della Pietra, M; Della Volpe, D; Delmastro, M; Delporte, C; Delsart, P A; DeMarco, D A; Demers, S; Demichev, M; Demilly, A; Denisov, S P; Denysiuk, D; Derendarz, D; Derkaoui, J E; Derue, F; Dervan, P; Desch, K; Deterre, C; Dette, K; Devesa, M R; Deviveiros, P O; Dewhurst, A; Dhaliwal, S; Di Bello, F A; Di Ciaccio, A; Di Ciaccio, L; Di Clemente, W K; Di Donato, C; Di Girolamo, A; Di Girolamo, B; Di Micco, B; Di Nardo, R; Di Petrillo, K F; Di Simone, A; Di Sipio, R; Di Valentino, D; Diaconu, C; Diamond, M; Dias, F A; Diaz, M A; Diehl, E B; Dietrich, J; Díez Cornell, S; Dimitrievska, A; Dingfelder, J; Dita, P; Dita, S; Dittus, F; Djama, F; Djobava, T; Djuvsland, J I; do Vale, M A B; Dobos, D; Dobre, M; Doglioni, C; Dolejsi, J; Dolezal, Z; Donadelli, M; Donati, S; Dondero, P; Donini, J; Dopke, J; Doria, A; Dova, M T; Doyle, A T; Drechsler, E; Dris, M; Du, Y; Duarte-Campderros, J; Dubreuil, A; Duchovni, E; Duckeck, G; Ducourthial, A; Ducu, O A; Duda, D; Dudarev, A; Dudder, A Chr; Duffield, E M; Duflot, L; Dührssen, M; Dumancic, M; Dumitriu, A E; Duncan, A K; Dunford, M; Duran Yildiz, H; Düren, M; Durglishvili, A; Duschinger, D; Dutta, B; Dyndal, M; Eckardt, C; Ecker, K M; Edgar, R C; Eifert, T; Eigen, G; Einsweiler, K; Ekelof, T; El Kacimi, M; El Kosseifi, R; Ellajosyula, V; Ellert, M; Elles, S; Ellinghaus, F; Elliot, A A; Ellis, N; Elmsheuser, J; Elsing, M; Emeliyanov, D; Enari, Y; Endner, O C; Ennis, J S; Erdmann, J; Ereditato, A; Ernis, G; Ernst, M; Errede, S; Escalier, M; Escobar, C; Esposito, B; Estrada Pastor, O; Etienvre, A I; Etzion, E; Evans, H; Ezhilov, A; Ezzi, M; Fabbri, F; Fabbri, L; Facini, G; Fakhrutdinov, R M; Falciano, S; Falla, R J; Faltova, J; Fang, Y; Fanti, M; Farbin, A; Farilla, A; Farina, C; Farina, E M; Farooque, T; Farrell, S; Farrington, S M; Farthouat, P; Fassi, F; Fassnacht, P; Fassouliotis, D; Faucci Giannelli, M; Favareto, A; Fawcett, W J; Fayard, L; Fedin, O L; Fedorko, W; Feigl, S; Feligioni, L; 
Feng, C; Feng, E J; Feng, H; Fenton, M J; Fenyuk, A B; Feremenga, L; Fernandez Martinez, P; Fernandez Perez, S; Ferrando, J; Ferrari, A; Ferrari, P; Ferrari, R; Ferreira de Lima, D E; Ferrer, A; Ferrere, D; Ferretti, C; Fiedler, F; Filipčič, A; Filipuzzi, M; Filthaut, F; Fincke-Keeler, M; Finelli, K D; Fiolhais, M C N; Fiorini, L; Fischer, A; Fischer, C; Fischer, J; Fisher, W C; Flaschel, N; Fleck, I; Fleischmann, P; Fletcher, R R M; Flick, T; Flierl, B M; Flores Castillo, L R; Flowerdew, M J; Forcolin, G T; Formica, A; Förster, F A; Forti, A; Foster, A G; Fournier, D; Fox, H; Fracchia, S; Francavilla, P; Franchini, M; Franchino, S; Francis, D; Franconi, L; Franklin, M; Frate, M; Fraternali, M; Freeborn, D; Fressard-Batraneanu, S M; Freund, B; Froidevaux, D; Frost, J A; Fukunaga, C; Fusayasu, T; Fuster, J; Gabaldon, C; Gabizon, O; Gabrielli, A; Gabrielli, A; Gach, G P; Gadatsch, S; Gadomski, S; Gagliardi, G; Gagnon, L G; Galea, C; Galhardo, B; Gallas, E J; Gallop, B J; Gallus, P; Galster, G; Gan, K K; Ganguly, S; Gao, Y; Gao, Y S; Garay Walls, F M; García, C; García Navarro, J E; Garcia-Sciveres, M; Gardner, R W; Garelli, N; Garonne, V; Gascon Bravo, A; Gasnikova, K; Gatti, C; Gaudiello, A; Gaudio, G; Gavrilenko, I L; Gay, C; Gaycken, G; Gazis, E N; Gee, C N P; Geisen, J; Geisen, M; Geisler, M P; Gellerstedt, K; Gemme, C; Genest, M H; Geng, C; Gentile, S; Gentsos, C; George, S; Gerbaudo, D; Gershon, A; Geßner, G; Ghasemi, S; Ghneimat, M; Giacobbe, B; Giagu, S; Giannetti, P; Gibson, S M; Gignac, M; Gilchriese, M; Gillberg, D; Gilles, G; Gingrich, D M; Giokaris, N; Giordani, M P; Giorgi, F M; Giraud, P F; Giromini, P; Giugni, D; Giuli, F; Giuliani, C; Giulini, M; Gjelsten, B K; Gkaitatzis, S; Gkialas, I; Gkougkousis, E L; Gkountoumis, P; Gladilin, L K; Glasman, C; Glatzer, J; Glaysher, P C F; Glazov, A; Goblirsch-Kolb, M; Godlewski, J; Goldfarb, S; Golling, T; Golubkov, D; Gomes, A; Gonçalo, R; Goncalves Gama, R; Goncalves Pinto Firmino Da Costa, J; Gonella, G; 
Gonella, L; Gongadze, A; González de la Hoz, S; Gonzalez-Sevilla, S; Goossens, L; Gorbounov, P A; Gordon, H A; Gorelov, I; Gorini, B; Gorini, E; Gorišek, A; Goshaw, A T; Gössling, C; Gostkin, M I; Gottardo, C A; Goudet, C R; Goujdami, D; Goussiou, A G; Govender, N; Gozani, E; Graber, L; Grabowska-Bold, I; Gradin, P O J; Gramling, J; Gramstad, E; Grancagnolo, S; Gratchev, V; Gravila, P M; Gray, C; Gray, H M; Greenwood, Z D; Grefe, C; Gregersen, K; Gregor, I M; Grenier, P; Grevtsov, K; Griffiths, J; Grillo, A A; Grimm, K; Grinstein, S; Gris, Ph; Grivaz, J-F; Groh, S; Gross, E; Grosse-Knetter, J; Grossi, G C; Grout, Z J; Grummer, A; Guan, L; Guan, W; Guenther, J; Guescini, F; Guest, D; Gueta, O; Gui, B; Guido, E; Guillemin, T; Guindon, S; Gul, U; Gumpert, C; Guo, J; Guo, W; Guo, Y; Gupta, R; Gupta, S; Gustavino, G; Gutierrez, P; Gutierrez Ortiz, N G; Gutschow, C; Guyot, C; Guzik, M P; Gwenlan, C; Gwilliam, C B; Haas, A; Haber, C; Hadavand, H K; Haddad, N; Hadef, A; Hageböck, S; Hagihara, M; Hakobyan, H; Haleem, M; Haley, J; Halladjian, G; Hallewell, G D; Hamacher, K; Hamal, P; Hamano, K; Hamilton, A; Hamity, G N; Hamnett, P G; Han, L; Han, S; Hanagaki, K; Hanawa, K; Hance, M; Haney, B; Hanke, P; Hansen, J B; Hansen, J D; Hansen, M C; Hansen, P H; Hara, K; Hard, A S; Harenberg, T; Hariri, F; Harkusha, S; Harrington, R D; Harrison, P F; Hartmann, N M; Hasegawa, M; Hasegawa, Y; Hasib, A; Hassani, S; Haug, S; Hauser, R; Hauswald, L; Havener, L B; Havranek, M; Hawkes, C M; Hawkings, R J; Hayakawa, D; Hayden, D; Hays, C P; Hays, J M; Hayward, H S; Haywood, S J; Head, S J; Heck, T; Hedberg, V; Heelan, L; Heidegger, K K; Heim, S; Heim, T; Heinemann, B; Heinrich, J J; Heinrich, L; Heinz, C; Hejbal, J; Helary, L; Held, A; Hellman, S; Helsens, C; Henderson, R C W; Heng, Y; Henkelmann, S; Henriques Correia, A M; Henrot-Versille, S; Herbert, G H; Herde, H; Herget, V; Hernández Jiménez, Y; Herten, G; Hertenberger, R; Hervas, L; Herwig, T C; Hesketh, G G; Hessey, N P; Hetherly, J W; 
Higashino, S; Higón-Rodriguez, E; Hill, E; Hill, J C; Hiller, K H; Hillier, S J; Hils, M; Hinchliffe, I; Hirose, M; Hirschbuehl, D; Hiti, B; Hladik, O; Hoad, X; Hobbs, J; Hod, N; Hodgkinson, M C; Hodgson, P; Hoecker, A; Hoeferkamp, M R; Hoenig, F; Hohn, D; Holmes, T R; Homann, M; Honda, S; Honda, T; Hong, T M; Hooberman, B H; Hopkins, W H; Horii, Y; Horton, A J; Hostachy, J-Y; Hou, S; Hoummada, A; Howarth, J; Hoya, J; Hrabovsky, M; Hrdinka, J; Hristova, I; Hrivnac, J; Hryn'ova, T; Hrynevich, A; Hsu, P J; Hsu, S-C; Hu, Q; Hu, S; Huang, Y; Hubacek, Z; Hubaut, F; Huegging, F; Huffman, T B; Hughes, E W; Hughes, G; Huhtinen, M; Huo, P; Huseynov, N; Huston, J; Huth, J; Iacobucci, G; Iakovidis, G; Ibragimov, I; Iconomidou-Fayard, L; Idrissi, Z; Iengo, P; Igonkina, O; Iizawa, T; Ikegami, Y; Ikeno, M; Ilchenko, Y; Iliadis, D; Ilic, N; Introzzi, G; Ioannou, P; Iodice, M; Iordanidou, K; Ippolito, V; Isacson, M F; Ishijima, N; Ishino, M; Ishitsuka, M; Issever, C; Istin, S; Ito, F; Iturbe Ponce, J M; Iuppa, R; Iwasaki, H; Izen, J M; Izzo, V; Jabbar, S; Jackson, P; Jacobs, R M; Jain, V; Jakobi, K B; Jakobs, K; Jakobsen, S; Jakoubek, T; Jamin, D O; Jana, D K; Jansky, R; Janssen, J; Janus, M; Janus, P A; Jarlskog, G; Javadov, N; Javůrek, T; Javurkova, M; Jeanneau, F; Jeanty, L; Jejelava, J; Jelinskas, A; Jenni, P; Jeske, C; Jézéquel, S; Ji, H; Jia, J; Jiang, H; Jiang, Y; Jiang, Z; Jiggins, S; Jimenez Pena, J; Jin, S; Jinaru, A; Jinnouchi, O; Jivan, H; Johansson, P; Johns, K A; Johnson, C A; Johnson, W J; Jon-And, K; Jones, R W L; Jones, S D; Jones, S; Jones, T J; Jongmanns, J; Jorge, P M; Jovicevic, J; Ju, X; Juste Rozas, A; Köhler, M K; Kaczmarska, A; Kado, M; Kagan, H; Kagan, M; Kahn, S J; Kaji, T; Kajomovitz, E; Kalderon, C W; Kaluza, A; Kama, S; Kamenshchikov, A; Kanaya, N; Kanjir, L; Kantserov, V A; Kanzaki, J; Kaplan, B; Kaplan, L S; Kar, D; Karakostas, K; Karastathis, N; Kareem, M J; Karentzos, E; Karpov, S N; Karpova, Z M; Karthik, K; Kartvelishvili, V; Karyukhin, A N; 
Kasahara, K; Kashif, L; Kass, R D; Kastanas, A; Kataoka, Y; Kato, C; Katre, A; Katzy, J; Kawade, K; Kawagoe, K; Kawamoto, T; Kawamura, G; Kay, E F; Kazanin, V F; Keeler, R; Kehoe, R; Keller, J S; Kempster, J J; Kendrick, J; Keoshkerian, H; Kepka, O; Kerševan, B P; Kersten, S; Keyes, R A; Khader, M; Khalil-Zada, F; Khanov, A; Kharlamov, A G; Kharlamova, T; Khodinov, A; Khoo, T J; Khovanskiy, V; Khramov, E; Khubua, J; Kido, S; Kilby, C R; Kim, H Y; Kim, S H; Kim, Y K; Kimura, N; Kind, O M; King, B T; Kirchmeier, D; Kirk, J; Kiryunin, A E; Kishimoto, T; Kisielewska, D; Kiuchi, K; Kivernyk, O; Kladiva, E; Klapdor-Kleingrothaus, T; Klein, M H; Klein, M; Klein, U; Kleinknecht, K; Klimek, P; Klimentov, A; Klingenberg, R; Klingl, T; Klioutchnikova, T; Kluge, E-E; Kluit, P; Kluth, S; Kneringer, E; Knoops, E B F G; Knue, A; Kobayashi, A; Kobayashi, D; Kobayashi, T; Kobel, M; Kocian, M; Kodys, P; Koffas, T; Koffeman, E; Köhler, N M; Koi, T; Kolb, M; Koletsou, I; Komar, A A; Komori, Y; Kondo, T; Kondrashova, N; Köneke, K; König, A C; Kono, T; Konoplich, R; Konstantinidis, N; Kopeliansky, R; Koperny, S; Kopp, A K; Korcyl, K; Kordas, K; Korn, A; Korol, A A; Korolkov, I; Korolkova, E V; Kortner, O; Kortner, S; Kosek, T; Kostyukhin, V V; Kotwal, A; Koulouris, A; Kourkoumeli-Charalampidi, A; Kourkoumelis, C; Kourlitis, E; Kouskoura, V; Kowalewska, A B; Kowalewski, R; Kowalski, T Z; Kozakai, C; Kozanecki, W; Kozhin, A S; Kramarenko, V A; Kramberger, G; Krasnopevtsev, D; Krasny, M W; Krasznahorkay, A; Krauss, D; Kremer, J A; Kretzschmar, J; Kreutzfeldt, K; Krieger, P; Krizka, K; Kroeninger, K; Kroha, H; Kroll, J; Kroll, J; Kroseberg, J; Krstic, J; Kruchonak, U; Krüger, H; Krumnack, N; Kruse, M C; Kubota, T; Kucuk, H; Kuday, S; Kuechler, J T; Kuehn, S; Kugel, A; Kuger, F; Kuhl, T; Kukhtin, V; Kukla, R; Kulchitsky, Y; Kuleshov, S; Kulinich, Y P; Kuna, M; Kunigo, T; Kupco, A; Kupfer, T; Kuprash, O; Kurashige, H; Kurchaninov, L L; Kurochkin, Y A; Kurth, M G; Kus, V; Kuwertz, E S; Kuze, 
M; Kvita, J; Kwan, T; Kyriazopoulos, D; La Rosa, A; Navarro, J L La Rosa; La Rotonda, L; Lacasta, C; Lacava, F; Lacey, J; Lacker, H; Lacour, D; Ladygin, E; Lafaye, R; Laforge, B; Lagouri, T; Lai, S; Lammers, S; Lampl, W; Lançon, E; Landgraf, U; Landon, M P J; Lanfermann, M C; Lang, V S; Lange, J C; Langenberg, R J; Lankford, A J; Lanni, F; Lantzsch, K; Lanza, A; Lapertosa, A; Laplace, S; Laporte, J F; Lari, T; Lasagni Manghi, F; Lassnig, M; Laurelli, P; Lavrijsen, W; Law, A T; Laycock, P; Lazovich, T; Lazzaroni, M; Le, B; Le Dortz, O; Le Guirriec, E; Le Quilleuc, E P; LeBlanc, M; LeCompte, T; Ledroit-Guillon, F; Lee, C A; Lee, G R; Lee, S C; Lee, L; Lefebvre, B; Lefebvre, G; Lefebvre, M; Legger, F; Leggett, C; Lehan, A; Lehmann Miotto, G; Lei, X; Leight, W A; Leite, M A L; Leitner, R; Lellouch, D; Lemmer, B; Leney, K J C; Lenz, T; Lenzi, B; Leone, R; Leone, S; Leonidopoulos, C; Lerner, G; Leroy, C; Lesage, A A J; Lester, C G; Levchenko, M; Levêque, J; Levin, D; Levinson, L J; Levy, M; Lewis, D; Li, B; Li, C; Li, H; Li, L; Li, Q; Li, S; Li, X; Li, Y; Liang, Z; Liberti, B; Liblong, A; Lie, K; Liebal, J; Liebig, W; Limosani, A; Lin, S C; Lin, T H; Lindquist, B E; Lionti, A E; Lipeles, E; Lipniacka, A; Lisovyi, M; Liss, T M; Lister, A; Litke, A M; Liu, B; Liu, H; Liu, H; Liu, J K K; Liu, J; Liu, J B; Liu, K; Liu, L; Liu, M; Liu, Y L; Liu, Y; Livan, M; Lleres, A; Llorente Merino, J; Lloyd, S L; Lo, C Y; Sterzo, F Lo; Lobodzinska, E M; Loch, P; Loebinger, F K; Loesle, A; Loew, K M; Loginov, A; Lohse, T; Lohwasser, K; Lokajicek, M; Long, B A; Long, J D; Long, R E; Longo, L; Looper, K A; Lopez, J A; Lopez Mateos, D; Lopez Paz, I; Lopez Solis, A; Lorenz, J; Lorenzo Martinez, N; Losada, M; Lösel, P J; Lou, X; Lounis, A; Love, J; Love, P A; Lu, H; Lu, N; Lu, Y J; Lubatti, H J; Luci, C; Lucotte, A; Luedtke, C; Luehring, F; Lukas, W; Luminari, L; Lundberg, O; Lund-Jensen, B; Luzi, P M; Lynn, D; Lysak, R; Lytken, E; Lyubushkin, V; Ma, H; Ma, L L; Ma, Y; Maccarrone, G; Macchiolo, 
A; Macdonald, C M; Maček, B; Machado Miguens, J; Madaffari, D; Madar, R; Mader, W F; Madsen, A; Maeda, J; Maeland, S; Maeno, T; Maevskiy, A S; Magradze, E; Mahlstedt, J; Maiani, C; Maidantchik, C; Maier, A A; Maier, T; Maio, A; Majersky, O; Majewski, S; Makida, Y; Makovec, N; Malaescu, B; Malecki, Pa; Maleev, V P; Malek, F; Mallik, U; Malon, D; Malone, C; Maltezos, S; Malyukov, S; Mamuzic, J; Mancini, G; Mandelli, L; Mandić, I; Maneira, J; Manhaes de Andrade Filho, L; Manjarres Ramos, J; Mann, A; Manousos, A; Mansoulie, B; Mansour, J D; Mantifel, R; Mantoani, M; Manzoni, S; Mapelli, L; Marceca, G; March, L; Marchese, L; Marchiori, G; Marcisovsky, M; Marjanovic, M; Marley, D E; Marroquim, F; Marsden, S P; Marshall, Z; Martensson, M U F; Marti-Garcia, S; Martin, C B; Martin, T A; Martin, V J; Martin Dit Latour, B; Martinez, M; Martinez Outschoorn, V I; Martin-Haugh, S; Martoiu, V S; Martyniuk, A C; Marzin, A; Masetti, L; Mashimo, T; Mashinistov, R; Masik, J; Maslennikov, A L; Massa, L; Mastrandrea, P; Mastroberardino, A; Masubuchi, T; Mättig, P; Maurer, J; Maxfield, S J; Maximov, D A; Mazini, R; Maznas, I; Mazza, S M; Mc Fadden, N C; Mc Goldrick, G; Mc Kee, S P; McCarn, A; McCarthy, R L; McCarthy, T G; McClymont, L I; McDonald, E F; Mcfayden, J A; Mchedlidze, G; McMahon, S J; McNamara, P C; McPherson, R A; Meehan, S; Megy, T J; Mehlhase, S; Mehta, A; Meideck, T; Meier, K; Meirose, B; Melini, D; Mellado Garcia, B R; Mellenthin, J D; Melo, M; Meloni, F; Menary, S B; Meng, L; Meng, X T; Mengarelli, A; Menke, S; Meoni, E; Mergelmeyer, S; Mermod, P; Merola, L; Meroni, C; Merritt, F S; Messina, A; Metcalfe, J; Mete, A S; Meyer, C; Meyer, J-P; Meyer, J; Meyer Zu Theenhausen, H; Miano, F; Middleton, R P; Miglioranzi, S; Mijović, L; Mikenberg, G; Mikestikova, M; Mikuž, M; Milesi, M; Milic, A; Miller, D W; Mills, C; Milov, A; Milstead, D A; Minaenko, A A; Minami, Y; Minashvili, I A; Mincer, A I; Mindur, B; Mineev, M; Minegishi, Y; Ming, Y; Mir, L M; Mistry, K P; Mitani, T; 
Mitrevski, J; Mitsou, V A; Miucci, A; Miyagawa, P S; Mizukami, A; Mjörnmark, J U; Mkrtchyan, T; Mlynarikova, M; Moa, T; Mochizuki, K; Mogg, P; Mohapatra, S; Molander, S; Moles-Valls, R; Monden, R; Mondragon, M C; Mönig, K; Monk, J; Monnier, E; Montalbano, A; Montejo Berlingen, J; Monticelli, F; Monzani, S; Moore, R W; Morange, N; Moreno, D; Moreno Llácer, M; Morettini, P; Morgenstern, S; Mori, D; Mori, T; Morii, M; Morinaga, M; Morisbak, V; Morley, A K; Mornacchi, G; Morris, J D; Morvaj, L; Moschovakos, P; Mosidze, M; Moss, H J; Moss, J; Motohashi, K; Mount, R; Mountricha, E; Moyse, E J W; Muanza, S; Mudd, R D; Mueller, F; Mueller, J; Mueller, R S P; Muenstermann, D; Mullen, P; Mullier, G A; Munoz Sanchez, F J; Murray, W J; Musheghyan, H; Muškinja, M; Myagkov, A G; Myska, M; Nachman, B P; Nackenhorst, O; Nagai, K; Nagai, R; Nagano, K; Nagasaka, Y; Nagata, K; Nagel, M; Nagy, E; Nairz, A M; Nakahama, Y; Nakamura, K; Nakamura, T; Nakano, I; Naranjo Garcia, R F; Narayan, R; Narrias Villar, D I; Naryshkin, I; Naumann, T; Navarro, G; Nayyar, R; Neal, H A; Nechaeva, P Yu; Neep, T J; Negri, A; Negrini, M; Nektarijevic, S; Nellist, C; Nelson, A; Nelson, M E; Nemecek, S; Nemethy, P; Nessi, M; Neubauer, M S; Neumann, M; Newman, P R; Ng, T Y; Nguyen Manh, T; Nickerson, R B; Nicolaidou, R; Nielsen, J; Nikolaenko, V; Nikolic-Audit, I; Nikolopoulos, K; Nilsen, J K; Nilsson, P; Ninomiya, Y; Nisati, A; Nishu, N; Nisius, R; Nitsche, I; Nobe, T; Noguchi, Y; Nomachi, M; Nomidis, I; Nomura, M A; Nooney, T; Nordberg, M; Norjoharuddeen, N; Novgorodova, O; Nowak, S; Nozaki, M; Nozka, L; Ntekas, K; Nurse, E; Nuti, F; O'connor, K; O'Neil, D C; O'Rourke, A A; O'Shea, V; Oakham, F G; Oberlack, H; Obermann, T; Ocariz, J; Ochi, A; Ochoa, I; Ochoa-Ricoux, J P; Oda, S; Odaka, S; Ogren, H; Oh, A; Oh, S H; Ohm, C C; Ohman, H; Oide, H; Okawa, H; Okumura, Y; Okuyama, T; Olariu, A; Oleiro Seabra, L F; Olivares Pino, S A; Oliveira Damazio, D; Olszewski, A; Olszowska, J; Onofre, A; Onogi, K; Onyisi, P U 
E; Oreglia, M J; Oren, Y; Orestano, D; Orlando, N; Orr, R S; Osculati, B; Ospanov, R; Otero Y Garzon, G; Otono, H; Ouchrif, M; Ould-Saada, F; Ouraou, A; Oussoren, K P; Ouyang, Q; Owen, M; Owen, R E; Ozcan, V E; Ozturk, N; Pachal, K; Pacheco Pages, A; Pacheco Rodriguez, L; Padilla Aranda, C; Pagan Griso, S; Paganini, M; Paige, F; Palacino, G; Palazzo, S; Palestini, S; Palka, M; Pallin, D; St Panagiotopoulou, E; Panagoulias, I; Pandini, C E; Panduro Vazquez, J G; Pani, P; Panitkin, S; Pantea, D; Paolozzi, L; Papadopoulou, Th D; Papageorgiou, K; Paramonov, A; Paredes Hernandez, D; Parker, A J; Parker, M A; Parker, K A; Parodi, F; Parsons, J A; Parzefall, U; Pascuzzi, V R; Pasner, J M; Pasqualucci, E; Passaggio, S; Pastore, Fr; Pataraia, S; Pater, J R; Pauly, T; Pearson, B; Pedraza Lopez, S; Pedro, R; Peleganchuk, S V; Penc, O; Peng, C; Peng, H; Penwell, J; Peralva, B S; Perego, M M; Perepelitsa, D V; Perini, L; Pernegger, H; Perrella, S; Peschke, R; Peshekhonov, V D; Peters, K; Peters, R F Y; Petersen, B A; Petersen, T C; Petit, E; Petridis, A; Petridou, C; Petroff, P; Petrolo, E; Petrov, M; Petrucci, F; Pettersson, N E; Peyaud, A; Pezoa, R; Phillips, F H; Phillips, P W; Piacquadio, G; Pianori, E; Picazio, A; Piccaro, E; Pickering, M A; Piegaia, R; Pilcher, J E; Pilkington, A D; Pin, A W J; Pinamonti, M; Pinfold, J L; Pirumov, H; Pitt, M; Plazak, L; Pleier, M-A; Pleskot, V; Plotnikova, E; Pluth, D; Podberezko, P; Poettgen, R; Poggi, R; Poggioli, L; Pohl, D; Polesello, G; Poley, A; Policicchio, A; Polifka, R; Polini, A; Pollard, C S; Polychronakos, V; Pommès, K; Ponomarenko, D; Pontecorvo, L; Pope, B G; Popeneciu, G A; Poppleton, A; Pospisil, S; Potamianos, K; Potrap, I N; Potter, C J; Poulard, G; Poulsen, T; Poveda, J; Pozo Astigarraga, M E; Pralavorio, P; Pranko, A; Prell, S; Price, D; Price, L E; Primavera, M; Prince, S; Proklova, N; Prokofiev, K; Prokoshin, F; Protopopescu, S; Proudfoot, J; Przybycien, M; Puri, A; Puzo, P; Qian, J; Qin, G; Qin, Y; Quadt, A; 
Queitsch-Maitland, M; Quilty, D; Raddum, S; Radeka, V; Radescu, V; Radhakrishnan, S K; Radloff, P; Rados, P; Ragusa, F; Rahal, G; Raine, J A; Rajagopalan, S; Rangel-Smith, C; Rashid, T; Raspopov, S; Ratti, M G; Rauch, D M; Rauscher, F; Rave, S; Ravinovich, I; Rawling, J H; Raymond, M; Read, A L; Readioff, N P; Reale, M; Rebuzzi, D M; Redelbach, A; Redlinger, G; Reece, R; Reed, R G; Reeves, K; Rehnisch, L; Reichert, J; Reiss, A; Rembser, C; Ren, H; Rescigno, M; Resconi, S; Resseguie, E D; Rettie, S; Reynolds, E; Rezanova, O L; Reznicek, P; Rezvani, R; Richter, R; Richter, S; Richter-Was, E; Ricken, O; Ridel, M; Rieck, P; Riegel, C J; Rieger, J; Rifki, O; Rijssenbeek, M; Rimoldi, A; Rimoldi, M; Rinaldi, L; Ripellino, G; Ristić, B; Ritsch, E; Riu, I; Rizatdinova, F; Rizvi, E; Rizzi, C; Roberts, R T; Robertson, S H; Robichaud-Veronneau, A; Robinson, D; Robinson, J E M; Robson, A; Rocco, E; Roda, C; Rodina, Y; Rodriguez Bosca, S; Rodriguez Perez, A; Rodriguez Rodriguez, D; Roe, S; Rogan, C S; Røhne, O; Roloff, J; Romaniouk, A; Romano, M; Romano Saez, S M; Romero Adam, E; Rompotis, N; Ronzani, M; Roos, L; Rosati, S; Rosbach, K; Rose, P; Rosien, N-A; Rossi, E; Rossi, L P; Rosten, J H N; Rosten, R; Rotaru, M; Roth, I; Rothberg, J; Rousseau, D; Rozanov, A; Rozen, Y; Ruan, X; Rubbo, F; Rühr, F; Ruiz-Martinez, A; Rurikova, Z; Rusakovich, N A; Russell, H L; Rutherfoord, J P; Ruthmann, N; Ryabov, Y F; Rybar, M; Rybkin, G; Ryu, S; Ryzhov, A; Rzehorz, G F; Saavedra, A F; Sabato, G; Sacerdoti, S; Sadrozinski, H F-W; Sadykov, R; Safai Tehrani, F; Saha, P; Sahinsoy, M; Saimpert, M; Saito, M; Saito, T; Sakamoto, H; Sakurai, Y; Salamanna, G; Salazar Loyola, J E; Salek, D; Sales De Bruin, P H; Salihagic, D; Salnikov, A; Salt, J; Salvatore, D; Salvatore, F; Salvucci, A; Salzburger, A; Sammel, D; Sampsonidis, D; Sampsonidou, D; Sánchez, J; Sanchez Martinez, V; Sanchez Pineda, A; Sandaker, H; Sandbach, R L; Sander, C O; Sandhoff, M; Sandoval, C; Sankey, D P C; Sannino, M; Sansoni, A; 
Santoni, C; Santonico, R; Santos, H; Santoyo Castillo, I; Sapronov, A; Saraiva, J G; Sarrazin, B; Sasaki, O; Sato, K; Sauvan, E; Savage, G; Savard, P; Savic, N; Sawyer, C; Sawyer, L; Saxon, J; Sbarra, C; Sbrizzi, A; Scanlon, T; Scannicchio, D A; Scarcella, M; Scarfone, V; Schaarschmidt, J; Schacht, P; Schachtner, B M; Schaefer, D; Schaefer, L; Schaefer, R; Schaeffer, J; Schaepe, S; Schaetzel, S; Schäfer, U; Schaffer, A C; Schaile, D; Schamberger, R D; Scharf, V; Schegelsky, V A; Scheirich, D; Schernau, M; Schiavi, C; Schier, S; Schildgen, L K; Schillo, C; Schioppa, M; Schlenker, S; Schmidt-Sommerfeld, K R; Schmieden, K; Schmitt, C; Schmitt, S; Schmitz, S; Schnoor, U; Schoeffel, L; Schoening, A; Schoenrock, B D; Schopf, E; Schott, M; Schouwenberg, J F P; Schovancova, J; Schramm, S; Schuh, N; Schulte, A; Schultens, M J; Schultz-Coulon, H-C; Schulz, H; Schumacher, M; Schumm, B A; Schune, Ph; Schwartzman, A; Schwarz, T A; Schweiger, H; Schwemling, Ph; Schwienhorst, R; Schwindling, J; Sciandra, A; Sciolla, G; Scuri, F; Scutti, F; Searcy, J; Seema, P; Seidel, S C; Seiden, A; Seixas, J M; Sekhniaidze, G; Sekhon, K; Sekula, S J; Semprini-Cesari, N; Senkin, S; Serfon, C; Serin, L; Serkin, L; Sessa, M; Seuster, R; Severini, H; Sfiligoj, T; Sforza, F; Sfyrla, A; Shabalina, E; Shaikh, N W; Shan, L Y; Shang, R; Shank, J T; Shapiro, M; Shatalov, P B; Shaw, K; Shaw, S M; Shcherbakova, A; Shehu, C Y; Shen, Y; Sherafati, N; Sherwood, P; Shi, L; Shimizu, S; Shimmin, C O; Shimojima, M; Shipsey, I P J; Shirabe, S; Shiyakova, M; Shlomi, J; Shmeleva, A; Shoaleh Saadi, D; Shochet, M J; Shojaii, S; Shope, D R; Shrestha, S; Shulga, E; Shupe, M A; Sicho, P; Sickles, A M; Sidebo, P E; Sideras Haddad, E; Sidiropoulou, O; Sidoti, A; Siegert, F; Sijacki, Dj; Silva, J; Silverstein, S B; Simak, V; Simic, Lj; Simion, S; Simioni, E; Simmons, B; Simon, M; Sinervo, P; Sinev, N B; Sioli, M; Siragusa, G; Siral, I; Sivoklokov, S Yu; Sjölin, J; Skinner, M B; Skubic, P; Slater, M; Slavicek, T; Slawinska, 
M; Sliwa, K; Slovak, R; Smakhtin, V; Smart, B H; Smiesko, J; Smirnov, N; Smirnov, S Yu; Smirnov, Y; Smirnova, L N; Smirnova, O; Smith, J W; Smith, M N K; Smith, R W; Smizanska, M; Smolek, K; Snesarev, A A; Snyder, I M; Snyder, S; Sobie, R; Socher, F; Soffer, A; Soh, D A; Sokhrannyi, G; Solans Sanchez, C A; Solar, M; Soldatov, E Yu; Soldevila, U; Solodkov, A A; Soloshenko, A; Solovyanov, O V; Solovyev, V; Sommer, P; Son, H; Sopczak, A; Sosa, D; Sotiropoulou, C L; Soualah, R; Soukharev, A M; South, D; Sowden, B C; Spagnolo, S; Spalla, M; Spangenberg, M; Spanò, F; Sperlich, D; Spettel, F; Spieker, T M; Spighi, R; Spigo, G; Spiller, L A; Spousta, M; St Denis, R D; Stabile, A; Stamen, R; Stamm, S; Stanecka, E; Stanek, R W; Stanescu, C; Stanitzki, M M; Stapf, B S; Stapnes, S; Starchenko, E A; Stark, G H; Stark, J; Stark, S H; Staroba, P; Starovoitov, P; Stärz, S; Staszewski, R; Steinberg, P; Stelzer, B; Stelzer, H J; Stelzer-Chilton, O; Stenzel, H; Stewart, G A; Stockton, M C; Stoebe, M; Stoicea, G; Stolte, P; Stonjek, S; Stradling, A R; Straessner, A; Stramaglia, M E; Strandberg, J; Strandberg, S; Strauss, M; Strizenec, P; Ströhmer, R; Strom, D M; Stroynowski, R; Strubig, A; Stucci, S A; Stugu, B; Styles, N A; Su, D; Su, J; Suchek, S; Sugaya, Y; Suk, M; Sulin, V V; Sultan, D M S; Sultansoy, S; Sumida, T; Sun, S; Sun, X; Suruliz, K; Suster, C J E; Sutton, M R; Suzuki, S; Svatos, M; Swiatlowski, M; Swift, S P; Sykora, I; Sykora, T; Ta, D; Tackmann, K; Taenzer, J; Taffard, A; Tafirout, R; Taiblum, N; Takai, H; Takashima, R; Takasugi, E H; Takeshita, T; Takubo, Y; Talby, M; Talyshev, A A; Tanaka, J; Tanaka, M; Tanaka, R; Tanaka, S; Tanioka, R; Tannenwald, B B; Tapia Araya, S; Tapprogge, S; Tarem, S; Tartarelli, G F; Tas, P; Tasevsky, M; Tashiro, T; Tassi, E; Tavares Delgado, A; Tayalati, Y; Taylor, A C; Taylor, G N; Taylor, P T E; Taylor, W; Teixeira-Dias, P; Temple, D; Ten Kate, H; Teng, P K; Teoh, J J; Tepel, F; Terada, S; Terashi, K; Terron, J; Terzo, S; Testa, M; 
Teuscher, R J; Theveneaux-Pelzer, T; Thomas, J P; Thomas-Wilsker, J; Thompson, P D; Thompson, A S; Thomsen, L A; Thomson, E; Tibbetts, M J; Ticse Torres, R E; Tikhomirov, V O; Tikhonov, Yu A; Timoshenko, S; Tipton, P; Tisserant, S; Todome, K; Todorova-Nova, S; Tojo, J; Tokár, S; Tokushuku, K; Tolley, E; Tomlinson, L; Tomoto, M; Tompkins, L; Toms, K; Tong, B; Tornambe, P; Torrence, E; Torres, H; Torró Pastor, E; Toth, J; Touchard, F; Tovey, D R; Treado, C J; Trefzger, T; Tresoldi, F; Tricoli, A; Trigger, I M; Trincaz-Duvoid, S; Tripiana, M F; Trischuk, W; Trocmé, B; Trofymov, A; Troncon, C; Trottier-McDonald, M; Trovatelli, M; Truong, L; Trzebinski, M; Trzupek, A; Tsang, K W; Tseng, J C-L; Tsiareshka, P V; Tsipolitis, G; Tsirintanis, N; Tsiskaridze, S; Tsiskaridze, V; Tskhadadze, E G; Tsui, K M; Tsukerman, I I; Tsulaia, V; Tsuno, S; Tsybychev, D; Tu, Y; Tudorache, A; Tudorache, V; Tulbure, T T; Tuna, A N; Tupputi, S A; Turchikhin, S; Turgeman, D; Turk Cakir, I; Turra, R; Tuts, P M; Ucchielli, G; Ueda, I; Ughetto, M; Ukegawa, F; Unal, G; Undrus, A; Unel, G; Ungaro, F C; Unno, Y; Unverdorben, C; Urban, J; Urquijo, P; Urrejola, P; Usai, G; Usui, J; Vacavant, L; Vacek, V; Vachon, B; Valderanis, C; Valdes Santurio, E; Valentinetti, S; Valero, A; Valéry, L; Valkar, S; Vallier, A; Valls Ferrer, J A; Van Den Wollenberg, W; van der Graaf, H; van Gemmeren, P; Van Nieuwkoop, J; van Vulpen, I; van Woerden, M C; Vanadia, M; Vandelli, W; Vaniachine, A; Vankov, P; Vardanyan, G; Vari, R; Varnes, E W; Varni, C; Varol, T; Varouchas, D; Vartapetian, A; Varvell, K E; Vasquez, J G; Vasquez, G A; Vazeille, F; Vazquez Schroeder, T; Veatch, J; Veeraraghavan, V; Veloce, L M; Veloso, F; Veneziano, S; Ventura, A; Venturi, M; Venturi, N; Venturini, A; Vercesi, V; Verducci, M; Verkerke, W; Vermeulen, A T; Vermeulen, J C; Vetterli, M C; Viaux Maira, N; Viazlo, O; Vichou, I; Vickey, T; Vickey Boeriu, O E; Viehhauser, G H A; Viel, S; Vigani, L; Villa, M; Villaplana Perez, M; Vilucchi, E; Vincter, 
M G; Vinogradov, V B; Vishwakarma, A; Vittori, C; Vivarelli, I; Vlachos, S; Vlasak, M; Vogel, M; Vokac, P; Volpi, G; von der Schmitt, H; von Toerne, E; Vorobel, V; Vorobev, K; Vos, M; Voss, R; Vossebeld, J H; Vranjes, N; Vranjes Milosavljevic, M; Vrba, V; Vreeswijk, M; Vuillermet, R; Vukotic, I; Wagner, P; Wagner, W; Wagner-Kuhr, J; Wahlberg, H; Wahrmund, S; Wakabayashi, J; Walder, J; Walker, R; Walkowiak, W; Wallangen, V; Wang, C; Wang, C; Wang, F; Wang, H; Wang, H; Wang, J; Wang, J; Wang, Q; Wang, R; Wang, S M; Wang, T; Wang, W; Wang, W; Wang, Z; Wanotayaroj, C; Warburton, A; Ward, C P; Wardrope, D R; Washbrook, A; Watkins, P M; Watson, A T; Watson, M F; Watts, G; Watts, S; Waugh, B M; Webb, A F; Webb, S; Weber, M S; Weber, S W; Weber, S A; Webster, J S; Weidberg, A R; Weinert, B; Weingarten, J; Weirich, M; Weiser, C; Weits, H; Wells, P S; Wenaus, T; Wengler, T; Wenig, S; Wermes, N; Werner, M D; Werner, P; Wessels, M; Whalen, K; Whallon, N L; Wharton, A M; White, A S; White, A; White, M J; White, R; Whiteson, D; Whitmore, B W; Wickens, F J; Wiedenmann, W; Wielers, M; Wiglesworth, C; Wiik-Fuchs, L A M; Wildauer, A; Wilk, F; Wilkens, H G; Williams, H H; Williams, S; Willis, C; Willocq, S; Wilson, J A; Wingerter-Seez, I; Winkels, E; Winklmeier, F; Winston, O J; Winter, B T; Wittgen, M; Wobisch, M; Wolf, T M H; Wolff, R; Wolter, M W; Wolters, H; Wong, V W S; Worm, S D; Wosiek, B K; Wotschack, J; Wozniak, K W; Wu, M; Wu, S L; Wu, X; Wu, Y; Wyatt, T R; Wynne, B M; Xella, S; Xi, Z; Xia, L; Xu, D; Xu, L; Yabsley, B; Yacoob, S; Yamaguchi, D; Yamaguchi, Y; Yamamoto, A; Yamamoto, S; Yamanaka, T; Yamatani, M; Yamauchi, K; Yamazaki, Y; Yan, Z; Yang, H; Yang, H; Yang, Y; Yang, Z; Yao, W-M; Yap, Y C; Yasu, Y; Yatsenko, E; Yau Wong, K H; Ye, J; Ye, S; Yeletskikh, I; Yigitbasi, E; Yildirim, E; Yorita, K; Yoshihara, K; Young, C; Young, C J S; Yu, J; Yu, J; Yuen, S P Y; Yusuff, I; Zabinski, B; Zacharis, G; Zaidan, R; Zaitsev, A M; Zakharchuk, N; Zalieckas, J; Zaman, A; Zambito, S; 
Zanzi, D; Zeitnitz, C; Zemla, A; Zeng, J C; Zeng, Q; Zenin, O; Ženiš, T; Zerwas, D; Zhang, D; Zhang, F; Zhang, G; Zhang, H; Zhang, J; Zhang, L; Zhang, L; Zhang, M; Zhang, P; Zhang, R; Zhang, R; Zhang, X; Zhang, Y; Zhang, Z; Zhao, X; Zhao, Y; Zhao, Z; Zhemchugov, A; Zhou, B; Zhou, C; Zhou, L; Zhou, M; Zhou, M; Zhou, N; Zhu, C G; Zhu, H; Zhu, J; Zhu, Y; Zhuang, X; Zhukov, K; Zibell, A; Zieminska, D; Zimine, N I; Zimmermann, C; Zimmermann, S; Zinonos, Z; Zinser, M; Ziolkowski, M; Živković, L; Zobernig, G; Zoccoli, A; Zou, R; Zur Nedden, M; Zwalinski, L

2017-01-01

Results of a search for physics beyond the Standard Model in events containing an energetic photon and large missing transverse momentum with the ATLAS detector at the Large Hadron Collider are reported. As the number of events observed in data, corresponding to an integrated luminosity of 36.1 fb⁻¹ of proton-proton collisions at a centre-of-mass energy of [Formula: see text], is in agreement with the Standard Model expectations, model-independent limits are set on the fiducial cross section for the production of events in this final state. Exclusion limits are also placed in models where dark-matter candidates are pair-produced. For dark-matter production via an axial-vector or a vector mediator in the s-channel, this search excludes mediator masses below 750-[Formula: see text] for dark-matter candidate masses below 230-[Formula: see text] at 95% confidence level, depending on the couplings. In an effective theory of dark-matter production, the limits restrict the value of the suppression scale [Formula: see text] to be above [Formula: see text] at 95% confidence level. A limit is also reported on the production of a high-mass scalar resonance by processes beyond the Standard Model, in which the resonance decays to [Formula: see text] and the Z boson subsequently decays into neutrinos.
Recognition techniques for extracting information from semistructured documents

Science.gov (United States)

Della Ventura, Anna; Gagliardi, Isabella; Zonta, Bruna

2000-12-01

Archives of optical documents are increasingly widely used, with demand driven in part by new regulations that recognize the legal value of digital documents, provided they are stored on physically unalterable media. On the supply side there is now a vast and technologically advanced market, in which optical storage has solved the problem of data durability and permanence at costs comparable to those of magnetic storage. The remaining bottleneck in these systems is indexing. Although the indexing of documents with a variable structure is still not fully automated, it can be machine-supported to a large degree, with clear advantages both for organizing the work and for extracting information, yielding data that are much more detailed and potentially more significant for the user. We present here a system for the automatic registration of correspondence to and from a public office. The system is based on a general methodology for the extraction, indexing, archiving, and retrieval of significant information from semi-structured documents. In our prototype application, this information is distributed among the database fields sender, addressee, subject, date, and body of the document.
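As a rough illustration of mapping a semi-structured letter onto the database fields listed above, here is a minimal Python sketch. It assumes, hypothetically, that each field appears on a labelled line (`From:`, `To:`, `Date:`, `Subject:`). The system described in the abstract works on archived documents of variable structure using recognition techniques, so the `extract_record` function and `Record` type here are illustrative stand-ins, not the authors' method.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical labelled header lines; real correspondence is far less regular.
# [ \t]* (rather than \s*) keeps each match on a single line.
FIELD_PATTERNS = {
    "sender": re.compile(r"^[ \t]*From:[ \t]*(.+)$", re.IGNORECASE | re.MULTILINE),
    "addressee": re.compile(r"^[ \t]*To:[ \t]*(.+)$", re.IGNORECASE | re.MULTILINE),
    "date": re.compile(r"^[ \t]*Date:[ \t]*(.+)$", re.IGNORECASE | re.MULTILINE),
    "subject": re.compile(r"^[ \t]*Subject:[ \t]*(.+)$", re.IGNORECASE | re.MULTILINE),
}


@dataclass
class Record:
    """One row of a (hypothetical) correspondence register."""
    sender: Optional[str]
    addressee: Optional[str]
    date: Optional[str]
    subject: Optional[str]
    body: str


def extract_record(text: str) -> Record:
    """Fill each field from its labelled line; the body is whatever follows the last label found."""
    fields = {}
    header_end = 0
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1).strip() if match else None
        if match:
            header_end = max(header_end, match.end())
    return Record(body=text[header_end:].strip(), **fields)


letter = """From: Office of Public Works
To: Municipal Archive
Date: 12/03/1999
Subject: Request for records

Please send the 1998 registry."""

print(extract_record(letter))
```

Fields that are not found are left as `None` rather than guessed, which mirrors the abstract's point that indexing of variable-structure documents is machine-supported rather than fully automated.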
Theoretical and Practical Aspects of Logistic Quality Management System Documentation Development Process

Directory of Open Access Journals (Sweden)

Linas Šaulinskas

2013-12-01

This paper addresses aspects of developing documentation for a logistics quality management system and suggests models for quality management system documentation development, hierarchical documentation systems, and authorization and approval. It also identifies logistics processes, a responsibilities model, and a detailed document development and approval process that can be applied in practice. Our results are based on an analysis of advanced Lithuanian and foreign corporate business practices, a review of the current literature, and the recommendations of quality management system standards.
Documents and legal texts: Australia, Germany, Sweden

International Nuclear Information System (INIS)

Anon.

2012-01-01

Australia: National Radioactive Waste Management Act 2012 No. 29, 2012 (An Act to make provision in relation to the selection of a site for, and the establishment and operation of, a radioactive waste management facility, and for related purposes). Germany: Act on the Peaceful Utilisation of Atomic Energy and the Protection against its Hazards (Atomic Energy Act) of 23 December 1959, as amended and promulgated on 15 July 1985, last amendment by the Act of 8 November 2011. Sweden: The Swedish Radiation Safety Authority's regulations concerning clearance of materials, rooms, buildings and land in practices involving ionising radiation (Swedish Radiation Safety Authority Regulatory Code issued on 20 October 2011, published on 2 November 2011); The Swedish Radiation Safety Authority's general advice on the application of the regulations concerning clearance of materials, rooms, buildings and land in practices involving ionising radiation (issued on 20 October 2011).
Invisible in Thailand: documenting the need for protection

Directory of Open Access Journals (Sweden)

Margaret Green

2008-04-01

The International Rescue Committee (IRC) has conducted a survey to document the experiences of Burmese people living in the border areas of Thailand and to assess the degree to which they merit international protection as refugees.
The Digital Administrative Document: an approximate path

Directory of Open Access Journals (Sweden)

Francesca Delneri

2017-09-01

Although the path toward the progressive dematerialization of administrative documents has been set, legislators have not always proceeded in a coherent, clear, or complete way. The discussion should focus on the management of documentary heritage, training, and preservation rather than on technological issues, actively involving local administrations and having them take responsibility for decisions, including decisions on the balance between costs and benefits. The path is difficult because of the lack of debate and practical guidance; preservation should be regarded not as a mandatory formality but as an opportunity to address complex and critical issues.
Provable Fair Document Exchange Protocol with Transaction Privacy for E-Commerce

Directory of Open Access Journals (Sweden)

Ren-Junn Hwang

2015-04-01

Transaction privacy has attracted a lot of attention in e-commerce. This study proposes an efficient and provably fair document exchange protocol with transaction privacy. Using the proposed protocol, untrusted parties can fairly exchange documents without the assistance of online trusted third parties. Moreover, a notary notarizes each document only once. The authorized document owner can exchange a notarized document with different parties repeatedly without disclosing the origin of the document or the identities of the transaction participants. Security and performance analyses indicate that the proposed protocol not only provides strong fairness, non-repudiation of origin, non-repudiation of receipt, and message confidentiality, but also enhances forward secrecy, transaction privacy, and authorized exchange. The proposed protocol is also more efficient than comparable protocols in prior work.
Document Object Model and Its Application on XML Document Processing

Directory of Open Access Journals (Sweden)

Sinn-cheng Lin

2001-06-01

Document Object Model (DOM) is an application programming interface that can be used to process XML documents. It defines the logical structure of a document, the interfaces for accessing it, and the methods for operating on it. In the DOM, an original document is mapped to a tree structure, so a program can easily traverse the tree and manipulate its nodes. In this paper, the fundamental models, definitions and specifications of the DOM are surveyed. We then build an experimental DOM system called XML On-Line Parser. Its front end is a Web-based user interface for entering XML documents and displaying the parsed results; its back end is an ASP program that transforms the original document into a DOM tree for document manipulation. The on-line system can be used from a general-purpose web browser to check the well-formedness and validity of XML documents.
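A minimal sketch of the parse, traverse and manipulate cycle described above, using Python's standard `xml.dom.minidom` rather than the authors' ASP back end. The function names are hypothetical, and the well-formedness check only catches parse errors; it does not perform the DTD validation that the XML On-Line Parser also offers.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def is_well_formed(xml_text):
    """Return True if the text parses into a DOM tree (well-formedness only, no DTD validation)."""
    try:
        minidom.parseString(xml_text)
        return True
    except ExpatError:
        return False


def element_paths(xml_text):
    """Traverse the DOM tree depth-first and return the tag path of every element."""
    doc = minidom.parseString(xml_text)
    paths = []

    def walk(node, prefix):
        for child in node.childNodes:
            # Only element nodes have tag names; text and whitespace nodes are skipped.
            if child.nodeType == child.ELEMENT_NODE:
                path = f"{prefix}/{child.tagName}"
                paths.append(path)
                walk(child, path)

    walk(doc, "")
    return paths


def set_attribute(xml_text, tag, name, value):
    """Manipulate the tree: set an attribute on every <tag> element, then serialize it."""
    doc = minidom.parseString(xml_text)
    for element in doc.getElementsByTagName(tag):
        element.setAttribute(name, value)
    return doc.documentElement.toxml()
```

For `<library><book id='1'><title>DOM</title></book><book id='2'/></library>`, `element_paths` yields `/library`, `/library/book`, `/library/book/title` and `/library/book`, which mirrors the tree the abstract says the DOM builds from the source text.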

The Texts of the Agency's Headquarters Agreement with Austria and Related Agreements

Energy Technology Data Exchange (ETDEWEB)

NONE

1975-12-16

The texts of the Headquarters Agreement between the International Atomic Energy Agency and the Republic of Austria, and of various related agreements, that were in force on 30 September 1975 are reproduced in this document for the information of all Members of the Agency.
The Medline/full-text research project.

Science.gov (United States)

McKinin, E J; Sievert, M; Johnson, E D; Mitchell, J A

1991-05-01

This project was designed to test the relative efficacy of index terms and full text for the retrieval of documents in those MEDLINE journals for which full-text searching was also available. The full-text files used were MEDIS from Mead Data Central and CCML from BRS Information Technologies. One hundred clinical medical topics were searched in these two files as well as in the MEDLINE file to accumulate the necessary data. Full-text searching identified significantly more relevant articles than did the indexed file, MEDLINE. The full-text searches, however, lacked the precision of searches done in the indexed file. Most relevant items that were missed in the full-text files but identified in MEDLINE were missed because the searcher failed to account for some aspect of natural language, used a logical or positional operator that was too restrictive, or included a concept that was implied but not expressed in the natural language. Very few of the unique relevant full-text citations would have been retrieved by title or abstract alone. Finally, as of July 1990, the most current issue of a journal was just as likely to appear in MEDLINE as in one of the full-text files.
Computerising documentation

International Nuclear Information System (INIS)

Anon.

1992-01-01

The nuclear power generation industry faces public concern and government pressure over safety, efficiency and risk. Operators throughout the industry are addressing these issues with the aid of a new technology: technical document management systems (TDMS). Used for strategic and tactical advantage, these systems enable users to scan, archive, retrieve, store, edit, distribute worldwide and manage the huge volume of documentation (paper drawings, CAD data and film-based information) generated in building, maintaining and ensuring safety in the UK's power plants. The power generation industry has recognized that the management and modification of operation-critical information is vital to the safety and efficiency of its power plants. Regulatory pressure from the Nuclear Installations Inspectorate (NII) to operate within strict safety margins or lose Site Licences has prompted the need for accurate, up-to-date documentation. A document capture, management and retrieval system provides a powerful, cost-effective solution, giving rapid access to documentation in a tightly controlled environment. The computerisation of documents and plans is discussed in this article. (Author)
Entropy and Graph Based Modelling of Document Coherence using Discourse Entities

DEFF Research Database (Denmark)

Petersen, Casper; Lioma, Christina; Simonsen, Jakob Grue

2015-01-01

We present two novel models of document coherence and their application to information retrieval (IR). Both models approximate document coherence using discourse entities, e.g. the subject or object of a sentence. Our first model views text as a Markov process generating sequences of discourse entities (entity n-grams); we use the entropy of these entity n-grams to approximate the rate at which new information appears in text, reasoning that as more new words appear, the topic increasingly drifts and text coherence decreases. Our second model extends the work of Guinaudeau & Strube [28] that represents text as a graph of discourse entities, linked by different relations, such as their distance or adjacency in text. We use several graph topology metrics to approximate different aspects of the discourse flow that can indicate coherence, such as the average clustering or betweenness of discourse...
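The entropy idea in the first model can be sketched as follows. This is a simplified illustration, not the authors' estimator: it assumes discourse entities have already been extracted into a plain list (the entity lists below are hypothetical), and it computes the Shannon entropy of the empirical n-gram distribution without any normalization the paper may apply.

```python
import math
from collections import Counter


def entity_ngrams(entities, n):
    """Slide a window of length n over the entity sequence."""
    return [tuple(entities[i:i + n]) for i in range(len(entities) - n + 1)]


def ngram_entropy(entities, n=2):
    """Shannon entropy (bits) of the empirical entity n-gram distribution.

    Higher values mean n-grams repeat less often, i.e. new information keeps
    appearing, which the abstract reads as topic drift (lower coherence).
    """
    grams = entity_ngrams(entities, n)
    if not grams:
        return 0.0
    counts = Counter(grams)
    total = len(grams)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A sequence that keeps returning to the same entities, such as `["court", "judge", "court", "judge", "court", "judge"]`, scores about 0.97 bits, while one that introduces a new entity at every step, such as `["court", "storm", "cinema", "recipe", "orbit", "court"]`, reaches log2(5) ≈ 2.32 bits, the maximum for five bigrams.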
Endangered Language Documentation and Transmission

Directory of Open Access Journals (Sweden)

D. Victoria Rau

2007-01-01

This paper describes an ongoing project on the digital archiving of Yami language documentation (http://www.hrelp.org/grants/projects/index.php?projid=60). We present a cross-disciplinary approach, involving computer science and applied linguistics, to documenting the Yami language and preparing teaching materials. Our discussion begins with an introduction to an integrated framework for archiving, processing and developing learning materials for Yami (Yang and Rau 2005), followed by a historical account of Yami language teaching: from a grammatical syllabus (Dong and Rau 2000b), to a communicative syllabus using a multimedia CD as a resource (Rau et al. 2005), to the development of interactive on-line learning based on the digital archiving project. We discuss the methods used and the challenges of each stage of preparing Yami teaching materials, and present a proposal for rethinking pedagogical models for e-learning.
Signal Detection Framework Using Semantic Text Mining Techniques

Science.gov (United States)

Sudarsan, Sithu D.

2009-01-01

Signal detection is a challenging task for regulatory and intelligence agencies. Subject matter experts in those agencies analyze documents, generally containing narrative text, in a time-bound manner for signals by identification, evaluation and confirmation, leading to follow-up action, e.g., recalling a defective product or issuing a public advisory for…
Methodological Aspects of Architectural Documentation

Directory of Open Access Journals (Sweden)

Arivaldo Amorim

2011-12-01

This paper discusses the methodological approach to documenting architectural and urban sites with extensive use of digital technologies that has been under development in the state of Bahia, Brazil, since 2003. Bahia has a vast territory with important architectural ensembles ranging from the sixteenth century to the present day. Because part of this heritage is built of raw earth and wood, it is very sensitive to various deleterious agents, so it is critical to document this threatened collection. To carry out these activities, diverse digital technologies that could be used in the documentation process are being tested. The work is being carried out as academic research with few financial resources, by scholarship students and some volunteers. Technologies ranging from the simplest to the most sophisticated are tested in the main stages of the documentation project: overall work planning; data acquisition, processing and management; and, finally, control and evaluation of the work. The activities that motivated this paper are being conducted in the cities of Rio de Contas and Lençóis in the Chapada Diamantina, located 420 km and 750 km from Salvador respectively; in the city of Cachoeira in the Recôncavo Baiano area, 120 km from Salvador, the capital of Bahia state; and in the Pelourinho neighbourhood, in the historic centre of the capital. Part of the material produced can be consulted on the website www.lcad.ufba.br.
Project Documentation as a Risk for Public Projects

Directory of Open Access Journals (Sweden)

Vladěna Štěpánková

2015-08-01

Purpose of the article: The paper presents the different methodologies used for creating documentation and focuses on public projects and their documentation requirements. Because documentation is part of the overall project plan and its duration is estimated by expert judgment, any change in this documentation can delay the project or increase its cost through additional administration. Documentation is therefore seen as a risk that may threaten a project in general, and in particular a public contract that a company is trying to win. Methodology/methods: Information was obtained mainly through structured interviews combined with brainstorming, supplemented by a questionnaire for companies engaged in public procurement. The data were processed in MS Excel using basic statistical methods based on regression analysis. Scientific aim: The article deals with the construction market in the Czech Republic and examines the impact of changes in the project documentation of public projects on their turnover. Findings: The paper summarizes the advantages and disadvantages of project documentation. For public contracts, and when legislation changes, it is necessary to prepare documentation in advance, follow the new requirements and try to meet them in the shortest possible time. Conclusions: The paper concludes with recommendations on how to proceed when such changes occur and how to reduce the costs that documentation risk may cause.
Development of an Electronic Document Management System (EDMS) as an Archiving Alternative in Higher Education

OpenAIRE

Amin, M. Miftakul

2010-01-01

The purpose of this paper is to develop an electronic document management system (EDMS). Such a system is expected to provide an alternative for managing electronic documents in a college environment. The study uses a qualitative research approach, with observation, document analysis, and interviews for data collection. The system is web-based so that it can reach a wide range of users. It provides functionality to store, archive, and retrieve...
Identification of documented medication non-adherence in physician notes.

Science.gov (United States)

Turchin, Alexander; Wheeler, Holly I; Labreche, Matthew; Chu, Julia T; Pendergrass, Merri L; Einbinder, Jonathan S

2008-11-06

Medication non-adherence is common, and physicians' awareness of it may be an important factor in clinical decision making. Few sources of data on physician awareness of medication non-adherence are available. We have designed an algorithm to identify documentation of medication non-adherence in the text of physician notes. The algorithm recognizes eight semantic classes of documentation of medication non-adherence. We evaluated the algorithm against manual ratings of 200 randomly selected notes of hypertensive patients. The algorithm detected 89% of the notes with documented medication non-adherence, with a specificity of 84.7% and a positive predictive value of 80.2%. In a larger dataset of 1,000 documents, notes that documented medication non-adherence were more likely to report significantly elevated systolic (15.3% vs. 9.0%; p = 0.002) and diastolic (4.1% vs. 1.9%; p = 0.03) blood pressure. This novel, clinically validated tool expands the range of information on medication non-adherence available to researchers.
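The three figures reported above come from a 2x2 comparison of algorithm output with manual ratings; the "89% of the notes detected" figure is the sensitivity. The sketch below shows the standard definitions. The counts in the usage note are hypothetical and are not the study's data, which the abstract does not give.

```python
def _ratio(numerator, denominator):
    """Guard against empty cells in small evaluation sets."""
    return numerator / denominator if denominator else float("nan")


def screening_metrics(tp, fp, tn, fn):
    """Compute note-level detection metrics from a 2x2 confusion matrix.

    tp: notes with documented non-adherence that the algorithm flagged
    fp: notes without it that were flagged anyway
    tn: notes without it that were not flagged
    fn: notes with it that were missed
    """
    return {
        "sensitivity": _ratio(tp, tp + fn),  # share of true cases detected
        "specificity": _ratio(tn, tn + fp),  # share of negatives correctly left unflagged
        "ppv": _ratio(tp, tp + fp),          # share of flagged notes that are true cases
    }
```

One design point worth noting: PPV depends on how common non-adherence notes are in the sample. With sensitivity 0.8 and specificity 0.9 held fixed, a hypothetical sample of 10 positive and 90 negative notes (tp=8, fn=2, fp=9, tn=81) gives a PPV of 8/17 ≈ 0.47, whereas 50 positive and 50 negative notes (tp=40, fn=10, fp=5, tn=45) give 40/45 ≈ 0.89.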
Document Models

Directory of Open Access Journals (Sweden)

A.A. Malykh

2017-08-01

In this paper, the concept of locally simple models is considered. Locally simple models are arbitrarily complex models built from relatively simple components. Many practically important domains of discourse can be described as locally simple models, for example the business models of enterprises and companies. Until now, research in automating human reasoning has concentrated mainly on the most intellectually intensive activities, such as automated theorem proving. By contrast, a retailer's business model is made up of "jobs", and each "job" can be modelled and automated more or less easily; at the same time, the whole retailer model as an integrated system is extremely complex. In this paper, we offer a variant of the mathematical definition of a locally simple model. This definition is intended for modelling a wide range of domains, so we must also take perceptual and psychological issues into account. Logic is elitist, and if we want our models to attract as many people as possible, we need to hide this elitism behind a metaphor familiar to 'ordinary' people. As such a metaphor, we use the concept of a document, so our locally simple models are called document models. Document models are built in the paradigm of semantic programming, which allows us to achieve another important goal: making document models executable. Executable models are models that can act as practical information systems in the described domain of discourse. Thus, if a model is executable, programming becomes redundant. Using the model directly, instead of coding it in a program, brings important advantages, for example a drastic reduction in development and maintenance costs. Moreover, since the model is sound and self-contained rather than dissolved within programming modules, AI tools, in particular machine learning, can be applied to it directly. This significantly expands the possibilities for automation and
Clinical map document based on XML (cMDX): document architecture with mapping feature for reporting and analysing prostate cancer in radical prostatectomy specimens

Directory of Open Access Journals (Sweden)

Bettendorf Olaf

2010-11-01

Background: The pathology report of radical prostatectomy specimens plays an important role in clinical decisions and the prognostic evaluation of prostate cancer (PCa). The anatomical schema is a helpful tool for documenting PCa extension for clinical and research purposes. Electronic documentation and analysis require an appropriate documentation model for anatomical schemas; for this purpose we developed cMDX. Methods: The document architecture of cMDX was designed according to the Open Packaging Conventions by separating the data into template data and patient data. Analogous custom XML elements were used to harmonize the graphical representation (e.g. tumour extension) with the textual data (e.g. histological patterns). The graphical documentation was based on a four-layer visualization model that governs the interaction between the different custom XML elements. Sensitive personal data were encrypted with a 256-bit cryptographic algorithm to prevent misuse. To assess clinical value, we retrospectively analysed the tumour extension in 255 patients after radical prostatectomy. Results: Pathology reports in cMDX can represent pathological findings of the prostate in schematic form and can be integrated into the hospital information system. cMDX documents can be converted into other data formats such as text, graphics and PDF. Supplementary tools, a cMDX Editor and an analyser tool, were implemented. Graphical analysis of the 255 prostatectomy specimens showed that PCa was mostly localized in the peripheral zone (mean: 73% ± 25), and 54% of PCa showed a multifocal growth pattern. Conclusions: cMDX can be used for routine histopathological reporting of radical prostatectomy specimens and can provide data for scientific analysis.
A Study of Readability of Texts in Bangla through Machine Learning Approaches

Science.gov (United States)

Sinha, Manjira; Basu, Anupam

2016-01-01

In this work, we have investigated text readability in the Bangla language. Text readability is an indicator of the suitability of a given document for a target reader group, so it has a major impact on the preparation of educational content. Advances in natural language processing have enabled the automatic…
Robust binarization of degraded document images using heuristics

Science.gov (United States)

Parker, Jon; Frieder, Ophir; Frieder, Gideon

2013-12-01

Historically significant documents are often discovered with defects that make them difficult to read and analyze. This is particularly troublesome if the defects prevent software from performing automated analysis. Image enhancement methods are used to remove or minimize document defects, improve software performance, and generally make images more legible. We describe an automated image enhancement method that is independent of the input page and requires no training data. The approach applies to color or greyscale images with handwritten script, typewritten text, images, and mixtures thereof. We evaluated the method against the test images provided by the 2011 Document Image Binarization Contest (DIBCO). Our method outperforms all 2011 DIBCO entrants in terms of average F1 measure, and does so with significantly lower variance than the top contest entrants. The capability of the proposed method is also illustrated using selected images from a collection of historic documents stored at the Yad Vashem Holocaust Memorial in Israel.
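The abstract does not spell out the authors' heuristics, so the sketch below is not their method. As a point of reference it shows the classic training-free baseline for document binarization, Otsu's global threshold (a swapped-in technique), which picks the single gray level that best separates dark ink from light paper. The pixel lists and function names are hypothetical; a real pipeline would read images with an imaging library.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0-255) that maximizes between-class variance.

    Pixels <= t form the dark class (ink); pixels > t form the light class (paper).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    weight_dark = 0  # number of pixels with value <= t
    sum_dark = 0     # sum of their gray levels
    best_t, best_between = 0, -1.0
    for t in range(256):
        weight_dark += hist[t]
        sum_dark += t * hist[t]
        weight_light = total - weight_dark
        if weight_dark == 0:
            continue
        if weight_light == 0:
            break
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Unnormalized between-class variance: w0 * w1 * (mu0 - mu1)^2
        between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def binarize(pixels):
    """Map each pixel to 1 (foreground ink) or 0 (background) with one global threshold."""
    t = otsu_threshold(pixels)
    return [1 if p <= t else 0 for p in pixels]
```

A single global cut like this tends to break down on exactly the defects the abstract mentions, such as stains and uneven contrast, which is part of why page-adaptive methods are compared against it on DIBCO data.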
Cat swarm optimization based evolutionary framework for multi document summarization

Science.gov (United States)

Rautray, Rasmita; Balabantaray, Rakesh Chandra

2017-07-01

Today, the World Wide Web has brought us an enormous quantity of on-line information. As a result, extracting relevant information from massive data has become a challenging issue. In the recent past, text summarization has been recognized as one solution for extracting useful information from vast numbers of documents. Depending on the number of documents considered, summarization is categorized as single-document or multi-document summarization. Multi-document summarization is more challenging for researchers than single-document summarization, because an accurate summary must be drawn from multiple documents. Hence, in this study, a novel Cat Swarm Optimization (CSO) based multi-document summarizer is proposed to address the problem of multi-document summarization. The proposed CSO-based model is compared with two other nature-inspired summarizers: a Harmony Search (HS) based summarizer and a Particle Swarm Optimization (PSO) based summarizer. On the benchmark Document Understanding Conference (DUC) datasets, the performance of all algorithms is compared using evaluation metrics such as ROUGE score, F score, sensitivity, positive predictive value, summary accuracy, inter-sentence similarity and a readability metric, to validate the non-redundancy, cohesiveness and readability of the summaries. The experimental analysis clearly shows that the proposed approach outperforms the other summarizers included in the study.
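ROUGE, the first metric listed, measures n-gram overlap between a system summary and a reference summary. The sketch below is a simplified ROUGE-N recall with lowercase whitespace tokenization and no stemming or stopword removal. The official ROUGE toolkit offers more options, and the exact configuration used in the study is not stated here.

```python
from collections import Counter


def _ngrams(text, n):
    """Lowercase, whitespace-tokenize, and count the n-grams of a text."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Fraction of reference n-grams that also appear in the candidate summary (clipped counts)."""
    cand = _ngrams(candidate, n)
    ref = _ngrams(reference, n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is credited at most as many times as it occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total
```

For the reference "the cat sat on the mat" and the candidate "the cat lay on the mat", ROUGE-1 recall is 5/6 (only "sat" is missed) and ROUGE-2 recall is 3/5, because the substitution breaks two of the five reference bigrams.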
Segmentation-driven compound document coding based on H.264/AVC-INTRA.

Science.gov (United States)

Zaghetto, Alexandre; de Queiroz, Ricardo L

2007-07-01

In this paper, we explore H.264/AVC operating in intraframe mode to compress mixed images, i.e., images composed of text, graphics, and pictures. Although mixed-content (compound) documents usually require multiple compressors, we apply a single compressor to both text and pictures. To do so, distortion is treated differently in text and picture regions. Our approach uses a segmentation-driven adaptation strategy that changes the H.264/AVC quantization parameter on a macroblock-by-macroblock basis, i.e., we divert bits from pictorial regions to text in order to keep text edges sharp. We show results of this segmentation-driven quantizer adaptation method applied to compressing documents. Our reconstructed images have better text sharpness than straight, unadapted coding, with negligible visual losses in pictorial regions. Our results also highlight that H.264/AVC-INTRA outperforms coders such as JPEG-2000 as a single coder for compound images.
Development of the Fast Inversion Algorithm for Building an Inverted File for Text Retrieval with Large-Scale Data

Directory of Open Access Journals (Sweden)

Derwin Suhartono

2012-06-01

The rapid development of information systems generates new needs for indexing and retrieving various kinds of media. Demand for documents in multimedia form is currently increasing, so storing and retrieving them has become a primary problem. The most commonly used media type is text, which is widely seen as the main option in search engines such as Yahoo and Google. A search should not only return results but should also do so efficiently. For indexing and retrieval, an inverted file is used to provide faster results; however, building an inverted file becomes problematic when a large amount of data is involved. This study describes an algorithm called Fast Inversion, developed from the basic inverted-file construction method, to address the needs arising from large volumes of data.
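For orientation, here is a minimal in-memory sketch of the basic inverted-file idea that Fast Inversion builds on. It maps each term to a sorted list of the document IDs containing it, and answers conjunctive queries by intersecting those lists. It is not the Fast Inversion algorithm itself, which targets large collections and is not reproduced here. The tokenizer and the sample documents are assumptions made for illustration.

```python
import re
from collections import defaultdict


def _terms(text):
    """Hypothetical tokenizer: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_file(docs):
    """Map each term to the sorted list of document IDs that contain it (its postings list)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in _terms(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search_all(index, query):
    """Return IDs of documents containing every query term (intersection of postings)."""
    terms = _terms(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)
```

With `docs = {1: "Fast inversion builds an inverted file", 2: "An inverted file speeds text retrieval", 3: "Search engines index text"}`, the postings for "inverted" are `[1, 2]`, and the query "inverted text" returns `[2]` without scanning any document text. That lookup speed is the advantage the abstract attributes to inverted files.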
Analysis of Documentation Speed Using Web-Based Medical Speech Recognition Technology: Randomized Controlled Trial.

Science.gov (United States)

Vogel, Markus; Kaisers, Wolfgang; Wassmuth, Ralf; Mayatepek, Ertan

2015-11-03

Clinical documentation has changed with the introduction of electronic health records. The core element is capturing clinical findings and documenting therapy electronically. Health care personnel spend a significant portion of their time on the computer. Alternatives to self-typing, such as speech recognition, are currently believed to increase documentation efficiency and quality, as well as the satisfaction of health professionals with clinical documentation, but few studies in this area have been published to date. This study describes the effects of using a Web-based medical speech recognition system for clinical documentation in a university hospital on (1) documentation speed, (2) document length, and (3) physician satisfaction. Reports of 28 physicians were randomized to be created with (intervention) or without (control) the assistance of a Web-based system of medical automatic speech recognition (ASR) in the German language. The documentation was entered into a browser's text area, and the time to complete the documentation, including all necessary corrections, the correction effort, the number of characters, and the mood of the participant were stored in a database. The measured time comprised text entry, text correction, and finalization of the documentation event. Participants self-assessed their moods on a scale of 1-3 (1=good, 2=moderate, 3=bad). Statistical analysis was done using permutation tests. The number of clinical reports eligible for further analysis stood at 1455. Out of 1455 reports, 718 (49.35%) were assisted by ASR and 737 (50.65%) were not assisted by ASR. Average documentation speed without ASR was 173 (SD 101) characters per minute, while it was 217 (SD 120) characters per minute using ASR. The overall increase in documentation speed through Web-based ASR assistance was 26% (P=.04). Participants documented an average of 356 (SD 388) characters per report when not assisted by ASR and 649 (SD 561) characters per report when assisted
On Experience of the Electronic Document Management System Implementation in the Medical University

Directory of Open Access Journals (Sweden)

A. V. Semenets

2015-05-01

The importance of applying electronic document management in Ukrainian healthcare is shown. An overview of the electronic document management systems market is presented. An example of using an open-source electronic document management system at I. Ya. Horbachevsky Ternopil State Medical University is given. The capabilities of implementing an electronic document management system within cloud services are shown. The electronic document management features of Microsoft Office 365 and Google Apps for Education are compared. Some results of using Google Apps for Education in TSMU as an electronic document management system are presented.
Digitization of Full-Text Documents Before Publishing on the Internet: A Case Study Reviewing the Latest Optical Character Recognition Technologies.

Science.gov (United States)

McClean, Clare M.

1998-01-01

Reviews strengths and weaknesses of five optical character recognition (OCR) software packages used to digitize paper documents before publishing on the Internet. Outlines options available and stages of the conversion process. Describes the learning experience of Eurotext, a United Kingdom-based electronic libraries project (eLib). (PEN)

Evaluating Combinations of Ranked Lists and Visualizations of Inter-Document Similarity.

Science.gov (United States)

Allan, James; Leuski, Anton; Swan, Russell; Byrd, Donald

2001-01-01

Considers how ideas from document clustering can be used to improve retrieval accuracy of ranked lists in interactive systems and how to evaluate system effectiveness. Describes a TREC (Text Retrieval Conference) study that constructed and evaluated systems that present the user with ranked lists and a visualization of inter-document similarities.…
From Medieval Philosophy to the Virtual Library: a descriptive framework for scientific knowledge and documentation as basis for document retrieval

Directory of Open Access Journals (Sweden)

Frances Morrissey

2001-11-01

This paper examines the conceptual basis of document retrieval systems for the Virtual Library in science and technology. It does so by analysing some cognitive models of scientific knowledge, drawing on philosophy, sociology and linguistics. It is important to consider improvements in search/retrieval functionalities for scientific documents because knowledge creation and transfer are integral to the functioning of scientific communities and, on a larger scale, science and technology are central to the knowledge economy. This paper proposes four new and innovative understandings. First, it is proposed that formal scientific communication constitutes the documentation and dissemination of concepts, and that conceptualism is a useful philosophical basis for study. Second, it is proposed that the scientific document is a dyadic construct, being both the physical manifestation as an encoded medium and the associated knowledge, or intangible ideation, carried within the document. Third, it is shown that major philosophers of science divide science into three main activities, dealing with data, derived or inferred laws, and the axioms or the paradigm. Fourth, it is demonstrated that the data, information and conceptual frameworks carried by a scientific document, as different levels of signification or semiotic systems, can each be characterised in ways that assist search and retrieval functionalities for the Virtual Library.
Investigating scientific literacy documents with linguistic network analysis

DEFF Research Database (Denmark)

Bruun, Jesper; Evans, Robert Harry; Dolin, Jens

2009-01-01

International discussions of scientific literacy (SL) are extensive and numerous sizeable documents on SL exist. Thus, comparing different conceptions of SL is methodologically challenging. We developed an analytical tool which couples the theory of complex networks with text analysis in order...
Documentation Service

International Nuclear Information System (INIS)

Charnay, J.; Chosson, L.; Croize, M.; Ducloux, A.; Flores, S.; Jarroux, D.; Melka, J.; Morgue, D.; Mottin, C.

1998-01-01

This service handles the processing and dissemination of scientific information and the management of the institute's scientific output, as well as the secretariat for the institute's groups and services. The report on the documentation-library section mentions: the management of the documentation holdings, searches in international databases (INIS, Current Contents, Inspec), and the Pret-Inter service, which provides access to documents through the DEMOCRITE network of IN2P3. Other achievements mentioned include the setup of a video and photo database, the Web home page of the institute's library, continued digitization of the document holdings by integrating the CD-ROMs and diskettes, electronic archiving of the scientific output, etc.
Document image binarization using "multi-scale" predefined filters

Science.gov (United States)

Saabni, Raid M.

2018-04-01

Reading text or searching for keywords within a historical document is a very challenging task. One of the first steps of the complete task is binarization, which separates the foreground (text, figures and drawings) from the background. The outcome of this step often determines whether the following steps succeed or fail, so it is vital to the overall task of reading and analyzing the content of a document image. Historical document images are generally of poor quality because of their storage conditions and degradation over time, which often cause varying contrast, stains, dirt and ink seeping through from the reverse side. In this paper, we use banks of anisotropic predefined filters at different scales and orientations to develop a binarization method for degraded documents and manuscripts. Since handwritten strokes may follow different scales and orientations, we use predefined sets of filter banks with various scales, weights and orientations to find a compact set of filters and weights that generates different layers of foreground and background. The results of locally convolving these filters with the gray-level image are weighted and accumulated to enhance the original image. Based on the different layers, seeds of components in the gray-level image, and a learning process, we present an improved binarization algorithm that separates the background from the layers of foreground. Foreground layers caused by seeping ink, degradation or other factors are then separated from the real foreground in a second phase. Promising experimental results were obtained on the DIBCO2011, DIBCO2013 and H-DIBCO2016 data sets and on a collection of images taken from real historical documents.
Text-Filled Stacked Area Graphs

DEFF Research Database (Denmark)

Kraus, Martin

2011-01-01

...text-filled stacked area graphs, i.e., graphs that feature stacked areas filled with small-typed text. Since the text layout of these graphs can be computed automatically, large amounts of textual detail can be included with very little effort. We discuss the most important challenges and some [...] solutions for the design of text-filled stacked area graphs with the help of an exemplary visualization of the genres, publication years, and titles of a database of several thousand PC games...
Using anchor text, spam filtering and Wikipedia for web search and entity ranking

NARCIS (Netherlands)

Kamps, J.; Kaptein, R.; Koolen, M.; Voorhees, E.M.; Buckland, L.P.

2010-01-01

In this paper, we document our participation in the TREC 2010 Entity Ranking and Web Tracks. We had multiple aims: for the Web Track, we wanted to compare the effectiveness of anchor text of the Category A and B collections and the impact of global document quality measures such as
Section 3. The SPARROW Surface Water-Quality Model: Theory, Application and User Documentation

Science.gov (United States)

Schwarz, G.E.; Hoos, A.B.; Alexander, R.B.; Smith, R.A.

2006-01-01

SPARROW (SPAtially Referenced Regressions On Watershed attributes) is a watershed modeling technique for relating water-quality measurements made at a network of monitoring stations to attributes of the watersheds containing the stations. The core of the model consists of a nonlinear regression equation describing the non-conservative transport of contaminants from point and diffuse sources on land to rivers and through the stream and river network. The model predicts contaminant flux, concentration, and yield in streams and has been used to evaluate alternative hypotheses about the important contaminant sources and watershed properties that control transport over large spatial scales. This report provides documentation for the SPARROW modeling technique and computer software to guide users in constructing and applying basic SPARROW models. The documentation gives details of the SPARROW software, including the input data and installation requirements, and guidance in the specification, calibration, and application of basic SPARROW models, as well as descriptions of the model output and its interpretation. The documentation is intended for both researchers and water-resource managers with interest in using the results of existing models and developing and applying new SPARROW models. The documentation of the model is presented in two parts. Part 1 provides a theoretical and practical introduction to SPARROW modeling techniques, which includes a discussion of the objectives, conceptual attributes, and model infrastructure of SPARROW. Part 1 also includes background on the commonly used model specifications and the methods for estimating and evaluating parameters, evaluating model fit, and generating water-quality predictions and measures of uncertainty. Part 2 provides a user's guide to SPARROW, which includes a discussion of the software architecture and details of the model input requirements and output files, graphs, and maps. The text documentation and computer
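The idea of non-conservative transport through a reach network can be illustrated with a heavily simplified, SPARROW-style mass balance. This is a sketch under stated assumptions, not the SPARROW software or its full model specification. Each reach receives the load of its upstream reaches plus local source inputs scaled by coefficients β_n. In-stream loss is modeled as first-order decay exp(-k·t) over the reach travel time t. The reach names, coefficient values and single uniform decay rate are hypothetical.

```python
import math


def reach_loads(reaches, order, beta, k):
    """Accumulate contaminant load down a reach network (simplified sketch).

    reaches: dict reach_id -> {"upstream": [ids], "sources": {name: amount},
             "travel_time": days}
    order:   reach ids listed upstream-first (a topological order)
    beta:    source coefficient per source name (hypothetical values)
    k:       first-order in-stream decay rate per day (assumed uniform)
    """
    load = {}
    for rid in order:
        r = reaches[rid]
        upstream = sum(load[u] for u in r["upstream"])
        local = sum(beta[name] * amt for name, amt in r["sources"].items())
        # Assumption: upstream and local inputs both decay over the full reach.
        load[rid] = (upstream + local) * math.exp(-k * r["travel_time"])
    return load
```

In this example, two headwater reaches each deliver 10 load units to a downstream reach. Its travel time equals one half-life, so the downstream load is 10. The real model also treats land-to-water delivery factors and reservoir losses, which this sketch omits.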
The Archives of Prefectures: from dematerialization to document management

Directory of Open Access Journals (Sweden)

Annantonia Martorano

2017-12-01

This item analyses the key role that IT management of document workflow plays in carrying out the administrative functions of the Prefectures. After presenting the history and organization of the Prefecture, the most important decentralized office of the Ministry of the Interior, the paper analyses the effectiveness and economy of its administrative actions. These are now characterized by the wide use of IT tools and aim at the complete dematerialization of its documents, abandoning the paper form entirely. This process, now irreversible, uses the methods of traditional archival discipline and is strengthened by new technological devices for the proper processing and storage of digital or paper documents.
Subject (of documents)

DEFF Research Database (Denmark)

Hjørland, Birger

2017-01-01

This article presents and discusses the concept “subject” or subject matter (of documents) as it has been examined in library and information science (LIS) for more than 100 years. Different theoretical positions are outlined, and the most important distinction is found to be between document-oriented and request-oriented views. The document-oriented view conceives of subject as something inherent in documents. The request-oriented (or policy-based) view instead understands subject as an attribution made to documents in order to facilitate certain uses of them. Related concepts...
Quantitative analysis of large amounts of journalistic texts using topic modelling

NARCIS (Netherlands)

Jacobi, C.; van Atteveldt, W.H.; Welbers, K.

2016-01-01

The huge collections of news content which have become available through digital technologies both enable and warrant scientific inquiry, challenging journalism scholars to analyse unprecedented amounts of texts. We propose Latent Dirichlet Allocation (LDA) topic modelling as a tool to face this
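A minimal collapsed Gibbs sampler shows how LDA models each document as a mixture of topics. This is a textbook-style sketch for small toy corpora, not the authors' tooling. The hyperparameters `alpha` and `beta`, the iteration count and the seed are illustrative assumptions.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA. docs: list of token lists.

    Returns (theta, topic_word_counts, vocab), where theta[d][k] is the
    estimated share of topic k in document d.
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    wid = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]        # document-topic counts
    nkw = [[0] * V for _ in range(n_topics)]    # topic-word counts
    nk = [0] * n_topics                         # tokens per topic
    z = []                                      # topic assignment per token
    for d, doc in enumerate(docs):              # random initialisation
        zd = []
        for w in doc:
            t = rng.randrange(n_topics)
            zd.append(t)
            ndk[d][t] += 1
            nkw[t][wid[w]] += 1
            nk[t] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t, v = z[d][i], wid[w]
                ndk[d][t] -= 1                  # remove the token's current assignment
                nkw[t][v] -= 1
                nk[t] -= 1
                # Full conditional p(z = k | all other assignments), unnormalised.
                weights = [(ndk[d][k] + alpha) * (nkw[k][v] + beta) / (nk[k] + V * beta)
                           for k in range(n_topics)]
                t = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = t
                ndk[d][t] += 1
                nkw[t][v] += 1
                nk[t] += 1
    theta = [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
             for d, doc in enumerate(docs)]
    return theta, nkw, vocab
```

For a real news corpus, one would normally use an established LDA implementation with proper preprocessing (stop-word removal, lemmatization) rather than this loop.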
Organising Documentation in Knowledge Evolution and Communication

Directory of Open Access Journals (Sweden)

Cristina De Castro

2007-06-01

The knowledge of a subject evolves over time due to many factors, such as better understanding, study of additional issues within the same subject, and study of related work from other themes. This can be achieved through individual work, direct cooperation with other people and, more generally, knowledge sharing. In this context, and in the broader context of knowledge communication, the appropriate organisation of documentation plays a fundamental role but is often very difficult to achieve. A layered architecture is proposed here for developing a structured repository of documentation, called a knowledge-bibliography (KB). The process of knowledge acquisition, evolution, and communication is considered first. Then the distributed nature of present-day knowledge and the ways it is shared and transferred are taken into account. On the basis of these considerations, a possible clustering of documentation collected by many people is defined. An LDAP-based architecture for implementing this structure is also discussed.
SFM TECHNIQUE AND FOCUS STACKING FOR DIGITAL DOCUMENTATION OF ARCHAEOLOGICAL ARTIFACTS

Directory of Open Access Journals (Sweden)

P. Clini

2016-06-01

Digital documentation and high-quality 3D representation are increasingly requested in many disciplines and areas, owing to the large number of technologies and data available for fast, detailed documentation. This work investigates medium- and small-sized artefacts and presents a fast, low-cost acquisition system that guarantees the creation of highly detailed 3D models, making the digitization of cultural heritage a simple and fast procedure. The 3D models of the artefacts are created with the photogrammetric technique Structure from Motion. In addition to three-dimensional models, this technique yields high-definition images for in-depth study and understanding of the artefacts. For the survey of small objects (only a few centimetres), a macro lens and focus stacking are used. Focus stacking is a photographic technique that captures a stack of images at different focus planes for each camera pose, so that a final image with a greater depth of field can be obtained. The focus-stacking acquisition was finally validated against an acquisition with a Minolta laser triangulation scanner. The comparison showed that the results are compatible with the allowable error for the expected precision.
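The focus-stacking step can be sketched by choosing, for each pixel, the value from the stack image that is locally sharpest. This minimal version assumes the images are already aligned, grayscale and equally sized. It uses the sum of absolute discrete Laplacians over a small window as the sharpness measure, which is a common choice but an assumption here. Production tools usually add alignment, smoothing of the selection map, and blending.

```python
def _clamp(v, lo, hi):
    return min(max(v, lo), hi)


def abs_laplacian(img, y, x):
    """|4*centre - 4 neighbours| with edge replication: high where detail is in focus."""
    h, w = len(img), len(img[0])

    def px(r, c):
        return img[_clamp(r, 0, h - 1)][_clamp(c, 0, w - 1)]

    return abs(4 * px(y, x) - px(y - 1, x) - px(y + 1, x) - px(y, x - 1) - px(y, x + 1))


def sharpness(img, y, x, r=1):
    """Sum of |Laplacian| over a (2r+1) x (2r+1) window around (y, x)."""
    h, w = len(img), len(img[0])
    return sum(abs_laplacian(img, _clamp(y + dy, 0, h - 1), _clamp(x + dx, 0, w - 1))
               for dy in range(-r, r + 1) for dx in range(-r, r + 1))


def focus_stack(stack, r=1):
    """Composite aligned, same-size grayscale images; returns (image, source-index map)."""
    h, w = len(stack[0]), len(stack[0][0])
    out = [[0] * w for _ in range(h)]
    idx = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best = max(range(len(stack)), key=lambda i: sharpness(stack[i], y, x, r))
            idx[y][x] = best
            out[y][x] = stack[best][y][x]
    return out, idx
```

The returned index map records which focus plane each pixel came from. Such a map is also a rough depth cue.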
Reactive documentation system

Science.gov (United States)

Boehnlein, Thomas R.; Kramb, Victoria

2018-04-01

Proper formal documentation of computer-acquired NDE experimental data generated during research is critical to the longevity and usefulness of the data. Without documentation describing how and why the data were acquired, NDE research teams lose capabilities. These include the ability to generate new information from previously collected data and the ability to provide adequate information so that their work can be replicated by others seeking to validate their research. Despite the critical nature of this issue, NDE data are still being generated in research labs without appropriate documentation. By generating documentation in series with data, equal priority is given to both activities during the research process. One way to achieve this is to use a reactive documentation system (RDS). An RDS prompts an operator to document the data as they are generated, rather than relying on the operator to decide when and what to document. This paper discusses how such a system can be implemented in a dynamic environment made up of in-house and third-party NDE data acquisition systems without creating additional burden on the operator. The reactive documentation approach presented here is agnostic enough that its principles can be applied to any operator-controlled, computer-based data acquisition system.
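A reactive documentation system can be pictured as a hook that fires whenever the acquisition software produces a data file. The hook keeps prompting until the required metadata have been supplied, then writes them next to the data. The sketch below is hypothetical. The field names, the `prompt` callback and the JSON sidecar format are assumptions for illustration, not details from the paper.

```python
import json
from pathlib import Path

# Hypothetical metadata fields an operator must fill in for every data file.
REQUIRED_FIELDS = ("operator", "purpose", "specimen_id", "settings")


class ReactiveDocumenter:
    """Prompts for metadata each time a new data file appears (sketch)."""

    def __init__(self, prompt, required=REQUIRED_FIELDS):
        self.prompt = prompt          # callable(field_name, data_path) -> str
        self.required = required

    def on_new_data(self, data_path):
        """Call from the acquisition software's 'file written' event."""
        data_path = Path(data_path)
        record = {"data_file": data_path.name}
        for field in self.required:
            answer = ""
            while not answer.strip():  # re-prompt until the field is filled in
                answer = self.prompt(field, data_path)
            record[field] = answer.strip()
        sidecar = data_path.with_name(data_path.name + ".meta.json")
        sidecar.write_text(json.dumps(record, indent=2))
        return sidecar
```

Documentation is triggered by the data event itself, so it is produced "in series with" the data rather than left to the operator's memory.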
The Texts of the Agency's Relationship Agreements with Specialized Agencies

Energy Technology Data Exchange (ETDEWEB)

NONE

1962-04-10

The text of the relationship agreement which the Agency has concluded with the Inter-Governmental Maritime Consultative Organization, together with the protocol authenticating it, is reproduced in this document for the information of all Members of the Agency.
The Texts of the Agency's Relationship Agreements with Specialized Agencies

International Nuclear Information System (INIS)

1962-01-01

The text of the relationship agreement which the Agency has concluded with the Inter-Governmental Maritime Consultative Organization, together with the protocol authenticating it, is reproduced in this document for the information of all Members of the Agency.
Adaptive removal of background and white space from document images using seam categorization

Science.gov (United States)

Fillion, Claude; Fan, Zhigang; Monga, Vishal

2011-03-01

Document images are routinely obtained by rasterizing document content or by scanning printed documents. Resizing through background and white space removal is often desired so that these images can be consumed better, whether on displays or in print. White space and background are easy to identify in images. However, existing methods such as naïve removal and content-aware resizing (seam carving) each have limitations that can lead to undesirable artifacts, such as uneven spacing between lines of text or poor arrangement of content. An adaptive method based on image content is therefore needed. In this paper we propose an adaptive method to intelligently remove white space and background content from document images. Document images differ structurally from pictorial images. They typically contain objects (text letters, pictures, and graphics) separated by uniform background, which includes both white paper space and other uniform color backgrounds. Pixels in uniform background regions are excellent candidates for deletion when resizing is required, because deleting them changes document content and style less than deleting object pixels. We propose a background deletion method that exploits both local and global context. The method aims to retain the document's structural information and image quality.
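The observation that uniform background pixels are the safest to delete can be illustrated with a much simpler scheme than the paper's seam categorization. The scheme finds runs of background-only rows and shrinks each interior run to a fixed height, so spacing between text lines stays even. This is a hedged sketch. The background value, tolerance, target gap, and the removal of leading and trailing margins are assumptions.

```python
def is_background_row(row, bg=255, tol=10):
    """True if every pixel in the row is within `tol` of the background value."""
    return all(abs(p - bg) <= tol for p in row)


def shrink_blank_rows(img, gap=1, bg=255, tol=10):
    """Collapse each interior run of background rows to at most `gap` rows.

    Leading and trailing background runs (page margins) are dropped entirely.
    """
    out, run = [], []
    for row in img:
        if is_background_row(row, bg, tol):
            run.append(row)
        else:
            if out:                    # interior gap: keep a uniform amount of it
                out.extend(run[:gap])
            run = []
            out.append(row)
    return out
```

Applying the same function to the transposed image removes blank columns. Because every gap shrinks to the same height, this avoids the uneven line spacing that naïve removal can produce. It cannot handle non-rectangular background regions, which is where seam-based methods come in.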
Advisory Committee on human radiation experiments. Supplemental Volume 2a, Sources and documentation appendices. Final report

International Nuclear Information System (INIS)

1995-01-01

This large document provides a catalog of the location of large numbers of reports pertaining to the charge of the Presidential Advisory Committee on Human Radiation Research and is arranged as a series of appendices. Titles of the appendices are Appendix A: Records at the Washington National Records Center Reviewed in Whole or Part by DoD Personnel or Advisory Committee Staff; Appendix B: Brief Descriptions of Records Accessions in the Advisory Committee on Human Radiation Experiments (ACHRE) Research Document Collection; Appendix C: Bibliography of Secondary Sources Used by ACHRE; Appendix D: Brief Descriptions of Human Radiation Experiments Identified by ACHRE, and Indexes; Appendix E: Documents Cited in the ACHRE Final Report and other Separately Described Materials from the ACHRE Document Collection; Appendix F: Schedule of Advisory Committee Meetings and Meeting Documentation; and Appendix G: Technology Note.
Advisory Committee on human radiation experiments. Supplemental Volume 2a, Sources and documentation appendices. Final report

Energy Technology Data Exchange (ETDEWEB)

NONE

1995-01-01

This large document provides a catalog of the location of large numbers of reports pertaining to the charge of the Presidential Advisory Committee on Human Radiation Research and is arranged as a series of appendices. Titles of the appendices are Appendix A: Records at the Washington National Records Center Reviewed in Whole or Part by DoD Personnel or Advisory Committee Staff; Appendix B: Brief Descriptions of Records Accessions in the Advisory Committee on Human Radiation Experiments (ACHRE) Research Document Collection; Appendix C: Bibliography of Secondary Sources Used by ACHRE; Appendix D: Brief Descriptions of Human Radiation Experiments Identified by ACHRE, and Indexes; Appendix E: Documents Cited in the ACHRE Final Report and other Separately Described Materials from the ACHRE Document Collection; Appendix F: Schedule of Advisory Committee Meetings and Meeting Documentation; and Appendix G: Technology Note.
Application of Laser Scanning for Creating Geological Documentation

Directory of Open Access Journals (Sweden)

Buczek Michał

2018-01-01

Geological documentation is based on analyses obtained from boreholes, geological exposures, and geophysical methods. It consists of text and graphic documents containing drilling sections, vertical cross-sections through the deposit, and various types of maps. Surveying methods (such as LIDAR) can be applied to measure exposed rock layers, which are presented in appendices to the geological documentation. Laser scanning makes it possible to obtain a complete profile of exposed surfaces in a short time and with millimeter accuracy. The possibility of verifying an existing geological cross-section with laser scanning was tested on the example of the AGH experimental mine. The test field is built of different lithological rocks. Scans were taken from a single station under favorable measuring conditions. Analysis of the signal intensity made it possible to divide the point cloud into separate geological layers. The results were compared with the geological profiles of the measured object. The same approach was applied to data from the Vietnamese Coc Sau open-pit hard coal mine. The thicknesses of the exposed coal beds and gangue layers were determined from the obtained data (point cloud) in combination with the photographs. The results were compared with the geological cross-section.
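Splitting a point cloud into layers by return intensity, and then measuring apparent layer thickness, can be sketched as follows. The intensity class boundaries are hypothetical. The thickness measure assumes roughly horizontal layers measured along the z axis, which is a simplification. The study itself checked its results against geological profiles and photographs.

```python
from collections import defaultdict


def split_by_intensity(points, boundaries):
    """Group (x, y, z, intensity) points into classes by intensity thresholds.

    boundaries: sorted thresholds (hypothetical). Class 0 holds intensities
    below boundaries[0]; class i holds [boundaries[i-1], boundaries[i]).
    """
    layers = defaultdict(list)
    for x, y, z, inten in points:
        cls = sum(inten >= b for b in boundaries)
        layers[cls].append((x, y, z))
    return dict(layers)


def layer_thickness(layer_points):
    """Apparent thickness as the vertical extent of a layer (assumes horizontal bedding)."""
    zs = [p[2] for p in layer_points]
    return max(zs) - min(zs)
```

Real scans would also need intensity correction for range and incidence angle before thresholding, and a fit to the actual bedding plane instead of the z axis.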

Quantification of competitive value of documents

Directory of Open Access Journals (Sweden)

Pavel Šimek

2009-01-01

Most Internet users use the global network to search for various kinds of information with full-text search engines such as Google, Yahoo!, or Seznam. Website operators use various optimization techniques to try to reach the top places in full-text search results. This is where Search Engine Optimization and Search Engine Marketing become very important. Ordinary users usually follow links only on the first few result pages for a given keyword, and in directories they primarily use the links placed higher in each category's hierarchy. The key to success is applying optimization methods that deal with keywords, the structure and quality of content, domain names, individual sites, and the quantity and reliability of backlinks. The process is demanding, long-lasting, and without a guaranteed outcome. A website operator without advanced analytical tools cannot identify the contribution of the individual documents that make up the entire website. If operators want an overview of their documents and of the website as a whole, it is appropriate to quantify these positions in a specific way for specific keywords. This is the purpose of quantifying the competitive value of documents, which in turn determines the global competitive value of a website. The quantification of competitive values is performed on a specific full-text search engine, and different engines can, and often do, return different results. According to published reports by the ClickZ agency or Market Share, Google is the most widely used search engine by number of searches among English-speaking users, with a market share of more than 80%. The overall quantification procedure is general; however, its initial step, the keyword analysis, depends on the choice of full-text search engine.
A Full-Text-Based Search Engine for Finding Highly Matched Documents Across Multiple Categories

Science.gov (United States)

Nguyen, Hung D.; Steele, Gynelle C.

2016-01-01

This report demonstrates a full-text-based search engine that works with any Web-based mobile application. The engine can search databases across multiple categories based on a user's queries and identify the most relevant or similar documents. The search results presented here were obtained on an Android (Google Co.) mobile device; however, the engine is also compatible with other mobile phones.
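A full-text engine that returns the best matches in each category can be sketched with an inverted index and TF-IDF scoring. The report does not specify its ranking function. TF-IDF, the tokenizer and the per-category top-k grouping are therefore assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


class CategorySearch:
    def __init__(self, docs):
        """docs: list of (doc_id, category, text)."""
        self.index = defaultdict(dict)     # term -> {doc_id: term frequency}
        self.category = {}
        for doc_id, cat, text in docs:
            self.category[doc_id] = cat
            for term, tf in Counter(tokenize(text)).items():
                self.index[term][doc_id] = tf
        self.n = len(docs)

    def search(self, query, k=3):
        """Return {category: [(doc_id, score), ...]} with at most k hits per category."""
        scores = defaultdict(float)
        for term in tokenize(query):
            postings = self.index.get(term, {})
            if not postings:
                continue
            idf = math.log(self.n / len(postings)) + 1.0   # smoothed IDF
            for doc_id, tf in postings.items():
                scores[doc_id] += tf * idf
        by_cat = defaultdict(list)
        for doc_id, s in sorted(scores.items(), key=lambda kv: -kv[1]):
            cat = self.category[doc_id]
            if len(by_cat[cat]) < k:
                by_cat[cat].append((doc_id, s))
        return dict(by_cat)
```

Grouping by category after scoring keeps one global ranking function. It still guarantees that every matching category is represented in the results.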
Text Categorization on Hadith Sahih Al-Bukhari using Random Forest

Science.gov (United States)

Fauzan Afianto, Muhammad; Adiwijaya; Al-Faraby, Said

2018-03-01

Al-Hadith is a collection of the words, deeds, provisions, and approvals of Rasulullah Shallallahu Alaihi wa Salam and is the second fundamental source of Islamic law after the Al-Qur’an. As the fundamentals of Islam, the Al-Qur’an and Al-Hadith must be learned, memorized, and practiced by Muslims. One of the venerable imams who narrated Al-Hadith is Imam Bukhari. He spent over 16 years compiling about 2,602 Hadith without repetition (over 7,000 with repetition). Automatic text categorization is the task of developing software tools able to classify text or hypertext documents under predefined categories or subject codes [1]. The algorithm used is Random Forest, an extension of the decision tree. In this final project research, the author built a system able to categorize text documents containing Hadith narrated by Imam Bukhari under several categories, such as suggestion, prohibition, and information. For evaluation, k-fold cross-validation with the F1 score was used, and the result was 90%.
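The evaluation protocol (k-fold cross-validation scored with F1) can be sketched independently of the classifier. Random Forest itself is not reimplemented here. `train_and_predict` is a hypothetical placeholder for any classifier. Macro-averaging F1 over the categories is an assumption, because the abstract does not say how the score was averaged.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size (shuffle beforehand in practice)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for s in sizes:
        folds.append(list(range(start, start + s)))
        start += s
    return folds


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


def cross_validate(X, y, k, train_and_predict):
    """Average macro-F1 over k folds; train_and_predict(X_train, y_train, X_test) -> labels."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in test_set]
        preds = train_and_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                                  [X[i] for i in test_idx])
        scores.append(macro_f1([y[i] for i in test_idx], preds))
    return sum(scores) / k
```

Any classifier, including a Random Forest from an ML library, can be passed in as `train_and_predict` without changing the evaluation code.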
Primary Hepatosplenic Large B-Cell Lymphoma

Directory of Open Access Journals (Sweden)

M.R. Morales-Polanco

2008-03-01

Diffuse large B-cell lymphoma is the most common form of lymphoma. It usually begins in the lymph nodes, but up to 40% of cases may have an extranodal presentation. Under a definition of primary extranodal lymphoma as presentation only in extranodal sites, there are reports of large B-cell lymphomas limited to the liver or spleen as separate entities. To date, only three cases of primary hepatosplenic presentation have been documented. This paper reports a fourth case. Based on a review of the literature and the clinical course of the reported case, we conclude that primary hepatosplenic large B-cell lymphoma has been found predominantly in females older than 60 years. The reported patients had less than 2 months of disease evolution before diagnosis and prominent B symptoms. Three had splenomegaly and two had hepatomegaly, and none had lymph node involvement. All had thrombocytopenia and abnormal liver function tests; three had anemia and elevated serum lactic dehydrogenase levels, and two had hemophagocytosis in the bone marrow. These data indicate that primary hepatosplenic lymphoma is an uncommon and aggressive form of the disease that requires immediate recognition and treatment.
Starlink Document Styles

Science.gov (United States)

Lawden, M. D.

This document describes the various styles which are recommended for Starlink documents. It also explains how to use the templates which are provided by Starlink to help authors create documents in a standard style. This paper is concerned mainly with conveying the "look and feel" of the various styles of Starlink document rather than describing the technical details of how to produce them. Other Starlink papers give recommendations for the detailed aspects of document production, design, layout, and typography. The only style that is likely to be used by most Starlink authors is the Standard style.
Comparative Visual Analysis of Large Customer Feedback Based on Self-Organizing Sentiment Maps

OpenAIRE

Janetzko, Halldór; Jäckle, Dominik; Schreck, Tobias

2013-01-01

Textual customer feedback data, e.g., received through surveys or incoming customer emails, can be a rich source of information with many applications in Customer Relationship Management (CRM). Nevertheless, this valuable source of information is still often neglected in practice, because service managers would have to read manually through potentially large amounts of feedback text documents to extract actionable information. As a purely manual approach is not feasible in many cases, w...
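Similar feedback documents end up on nearby cells of a self-organizing map (SOM), and this can be sketched in a few lines. In this sketch each feedback document is a feature vector. One hypothetical choice of features is a few sentiment scores per product aspect, scaled to [0, 1]. The grid size, the linear learning-rate and radius schedules, and the Gaussian neighbourhood are illustrative assumptions, not the authors' settings.

```python
import math
import random


def best_unit(grid, v):
    """Grid coordinates of the prototype closest to vector v (squared Euclidean)."""
    best, best_d = None, float("inf")
    for r, row in enumerate(grid):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(w, v))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(data, rows, cols, iters=500, lr0=0.5, radius0=None, seed=0):
    """Train a rows x cols SOM on feature vectors in [0, 1]^dim."""
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = radius0 or max(rows, cols) / 2
    grid = [[[rng.random() for _ in range(dim)] for _ in range(cols)] for _ in range(rows)]
    for t in range(iters):
        frac = t / iters
        lr = lr0 * (1 - frac)                      # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.5)    # neighbourhood shrinks over time
        v = data[rng.randrange(len(data))]
        br, bc = best_unit(grid, v)
        for r in range(rows):
            for c in range(cols):
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * radius ** 2))
                w = grid[r][c]
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])   # pull prototype toward sample
    return grid
```

After training, each document is placed on its best-matching cell. Colouring the cells by average sentiment gives the kind of map the title refers to.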
Subject Retrieval from Full-Text Databases in the Humanities

Science.gov (United States)

East, John W.

2007-01-01

This paper examines the problems involved in subject retrieval from full-text databases of secondary materials in the humanities. Ten such databases were studied and their search functionality evaluated, focusing on factors such as Boolean operators, document surrogates, limiting by subject area, proximity operators, phrase searching, wildcards,…
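Phrase searching and proximity operators, two of the features evaluated, both depend on word positions being stored in the index. A minimal positional index is sketched below. The tokenizer and the semantics of `near` (at most n words apart, in either order) are assumptions, since query syntax differs between the databases studied.

```python
import re
from collections import defaultdict


class PositionalIndex:
    def __init__(self):
        # term -> doc_id -> list of word positions
        self.postings = defaultdict(lambda: defaultdict(list))

    def add(self, doc_id, text):
        for pos, term in enumerate(re.findall(r"\w+", text.lower())):
            self.postings[term][doc_id].append(pos)

    def phrase(self, *terms):
        """Docs containing the terms consecutively and in order."""
        terms = [t.lower() for t in terms]
        docs = set(self.postings[terms[0]])
        for t in terms[1:]:
            docs &= set(self.postings[t])
        hits = set()
        for d in docs:
            for s in self.postings[terms[0]][d]:
                if all(s + i in self.postings[t][d] for i, t in enumerate(terms)):
                    hits.add(d)
                    break
        return hits

    def near(self, a, b, n):
        """Docs where a and b occur at most n words apart (either order)."""
        a, b = a.lower(), b.lower()
        hits = set()
        for d in set(self.postings[a]) & set(self.postings[b]):
            if any(abs(i - j) <= n for i in self.postings[a][d] for j in self.postings[b][d]):
                hits.add(d)
        return hits
```

A database that indexes only document-level term occurrences (no positions) cannot support either operator. This is one reason such features vary so much across the databases compared.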
Upon e-Documents: Evolution or Involution on GED Market

Directory of Open Access Journals (Sweden)

Mircea GEORGESCU

2006-01-01

In many organizations, vital information is trapped within individual desktops and fragmented in server silos across the enterprise. Manual, ad-hoc processes create inefficiencies, confusion, and delays as employees waste time searching for important information. Organizations must find safer and easier ways to access, manage, and share their content. Document and file management solutions designed to be used across the organization can help achieve these goals and reduce the total cost of managing content throughout the organization. If you decide to use an electronic document management system (EDMS), your selection requires a careful, considered balance between your legal requirements and your technological options. The decision to use an EDMS requires significant planning and analysis. Managing documents more effectively, controlling the costs associated with documents and document processes, and using resources more efficiently have become, and will continue to be, increasingly important to businesses and IT organizations.
Document understanding for a broad class of documents

NARCIS (Netherlands)

Aiello, Marco; Monz, Christof; Todoran, Leon; Worring, Marcel

2002-01-01

We present a document analysis system able to assign logical labels and extract the reading order in a broad set of documents. All information sources, from geometric features and spatial relations to the textual features and content are employed in the analysis. To deal effectively with these
Is there still an unknown Freud? A note on the publications of Freud's texts and on unpublished documents.

Science.gov (United States)

Falzeder, Ernst

2007-01-01

This article presents an overview of the existing editions of what Freud wrote (works, letters, manuscripts and drafts, diaries and calendar notes, dedications and margin notes in books, case notes, and patient calendars) and what he is recorded as having said (minutes of meetings, interviews, memoirs of and interviews with patients, family members, and followers, and other quotes). There follows a short overview of biographies of Freud and other documentation on his life. It is concluded that a wealth of material is now available to Freud scholars, although more often than not this information is used in a biased and partisan way.
Development Prospects for Documentary Filmmaking: The Documentary in Digital Format

Directory of Open Access Journals (Sweden)

Lic. Manuela Penafria

1999-01-01

The documentary has a recent history. Contrary to what is generally claimed, we hold that the documentary was not born at the same time as cinema. The first experiments with moving images aimed only to record events in the everyday lives of people and animals. Thus, the contribution of cinema's pioneers to the documentary was to show that the basic working material of the documentary is images gathered in the places where events occur. In other words, the in loco recording found at the beginning of cinema constitutes the root (the basic principle on which documentary production rests).
Hypertext Documents for Virtual Learning Environments

Directory of Open Access Journals (Sweden)

Cristòfol Rovira

1999-11-01

The article shows the new opportunities that the World Wide Web has created in the field of hypertext document creation. Taking node size as the starting point, it analyzes the essential characteristics of hypertexts from before the appearance of the Web and compares them with Internet pages. It also discusses the educational advantages that this type of document can offer for virtual learning environments. Finally, it presents a proposal for writing hypertexts based on node size.
What Can Pictures Tell Us About Web Pages? Improving Document Search Using Images.

Science.gov (United States)

Rodriguez-Vaamonde, Sergio; Torresani, Lorenzo; Fitzgibbon, Andrew W

2015-06-01

Traditional Web search engines do not use the images in the HTML pages to find relevant documents for a given query. Instead, they typically operate by computing a measure of agreement between the keywords provided by the user and only the text portion of each page. In this paper we study whether the content of the pictures appearing in a Web page can be used to enrich the semantic description of an HTML document and consequently boost the performance of a keyword-based search engine. We present a Web-scalable system that exploits a pure text-based search engine to find an initial set of candidate documents for a given query. Then, the candidate set is reranked using visual information extracted from the images contained in the pages. The resulting system retains the computational efficiency of traditional text-based search engines with only a small additional storage cost needed to encode the visual information. We test our approach on one of the TREC Million Query Track benchmarks where we show that the exploitation of visual content yields improvement in accuracies for two distinct text-based search engines, including the system with the best reported performance on this benchmark. We further validate our approach by collecting document relevance judgements on our search results using Amazon Mechanical Turk. The results of this experiment confirm the improvement in accuracy produced by our image-based reranker over a pure text-based system.
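The two-stage design can be sketched as a score fusion: text retrieval first, then visual reranking of the candidates. The linear interpolation with weight `lam` is an illustrative choice. So is the assumption that both scores are already normalized to a comparable range. The paper's reranker is learned from visual features extracted from the page images, which this sketch does not reproduce; `visual_score` stands in for it as a hypothetical callable.

```python
def rerank(candidates, visual_score, lam=0.3, top_k=None):
    """Rerank text-search candidates with an image-based relevance score.

    candidates:   list of (doc_id, text_score) from the text engine, best first.
    visual_score: callable doc_id -> float in [0, 1] (hypothetical).
    lam:          weight of the visual evidence (assumed linear fusion).
    top_k:        if set, only the first top_k candidates are reranked and returned.
    """
    pool = candidates[:top_k] if top_k else candidates
    fused = [(doc, (1 - lam) * ts + lam * visual_score(doc)) for doc, ts in pool]
    return sorted(fused, key=lambda pair: pair[1], reverse=True)
```

Only the small candidate pool is rescored, so this structure keeps the cost close to that of the text engine alone, which matches the efficiency argument in the abstract.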
A Survey in Indexing and Searching XML Documents.

Science.gov (United States)

Luk, Robert W. P.; Leong, H. V.; Dillon, Tharam S.; Chan, Alvin T. S.; Croft, W. Bruce; Allan, James

2002-01-01

Discussion of XML focuses on indexing techniques for XML documents, grouping them into flat-file, semistructured, and structured indexing paradigms. Highlights include searching techniques, including full text search and multistage search; search result presentations; database and information retrieval system integration; XML query languages; and…
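The structured indexing paradigm can be illustrated by indexing each word together with the root-to-element path it occurs under. A query can then be restricted to a structural context. This is a hedged sketch using Python's standard `xml.etree.ElementTree`. The path-string key format is an assumption, not a scheme taken from the survey.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_path_index(xml_text):
    """Map each word to the set of element paths (e.g. '/article/title') containing it."""
    index = defaultdict(set)

    def walk(elem, path):
        here = f"{path}/{elem.tag}"
        for word in re.findall(r"\w+", (elem.text or "").lower()):
            index[word].add(here)
        for child in elem:
            walk(child, here)
            # Text after a child's closing tag belongs to the parent element.
            for word in re.findall(r"\w+", (child.tail or "").lower()):
                index[word].add(here)

    walk(ET.fromstring(xml_text), "")
    return index


def search(index, word, under=None):
    """Paths containing `word`, optionally limited to the subtree rooted at `under`."""
    paths = index.get(word.lower(), set())
    return {p for p in paths if under is None or p == under or p.startswith(under + "/")}
```

A flat-file index would keep only the word-to-document mapping. The path key is what allows queries such as "XML in titles only".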
Performance evaluation methodology for historical document image binarization.

Science.gov (United States)

Ntirogiannis, Konstantinos; Gatos, Basilis; Pratikakis, Ioannis

2013-02-01

Document image binarization is of great importance in the document image analysis and recognition pipeline since it affects further stages of the recognition process. The evaluation of a binarization method aids in studying its algorithmic behavior, as well as verifying its effectiveness, by providing qualitative and quantitative indication of its performance. This paper addresses a pixel-based binarization evaluation methodology for historical handwritten/machine-printed document images. In the proposed evaluation scheme, the recall and precision evaluation measures are properly modified using a weighting scheme that diminishes any potential evaluation bias. Additional performance metrics of the proposed evaluation scheme consist of the percentage rates of broken and missed text, false alarms, background noise, character enlargement, and merging. Several experiments conducted in comparison with other pixel-based evaluation measures demonstrate the validity of the proposed evaluation scheme.
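For reference, the sketch below computes the standard, unweighted pixel-level recall, precision and F-measure that the proposed scheme modifies. The paper's contribution, a weighting of recall and precision that reduces evaluation bias, is not reproduced here. In this sketch, 1 marks foreground (text) pixels.

```python
def pixel_scores(result, ground_truth):
    """Pixel-based recall, precision and F-measure; 1 = foreground (text)."""
    tp = fp = fn = 0
    for r_row, g_row in zip(result, ground_truth):
        for r, g in zip(r_row, g_row):
            if r and g:
                tp += 1          # text correctly kept
            elif r and not g:
                fp += 1          # background wrongly marked as text
            elif g and not r:
                fn += 1          # text that was lost
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f = 2 * recall * precision / (recall + precision) if recall + precision else 0.0
    return recall, precision, f
```

Every pixel counts equally in these unweighted measures. As a result, thick strokes and large noise blobs dominate them, and this is the kind of bias the weighting scheme is meant to diminish.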
Criteria Document for B-plant's Surveillance and Maintenance Phase Safety Basis Document

International Nuclear Information System (INIS)

SCHWEHR, B.A.

1999-01-01

This document is required by the Project Hanford Managing Contractor (PHMC) procedure HNF-PRO-705, Safety Basis Planning, Documentation, Review, and Approval. This document specifies the criteria that shall be in the B Plant surveillance and maintenance phase safety basis in order to obtain approval of the DOE-RL. This criteria document (CD) describes the criteria to be addressed in the S and M Phase safety basis for the deactivated Waste Fractionization Facility (B Plant) on the Hanford Site in Washington state. It describes the document type and format that will be used for the S and M Phase safety basis, the requirements documents that will be invoked for the document development, the deactivated condition of the B Plant facility, and the scope of issues to be addressed in the S and M Phase safety basis document.
Tagline: Information Extraction for Semi-Structured Text Elements in Medical Progress Notes

Science.gov (United States)

Finch, Dezon Kile

2012-01-01

Text analysis has become an important research activity in the Department of Veterans Affairs (VA). Statistical text mining and natural language processing have been shown to be very effective for extracting useful information from medical documents. However, neither of these techniques is effective at extracting the information stored in…
Quality control of the documentation process in electronic economic activities

Directory of Open Access Journals (Sweden)

Krutova A.S.

2017-06-01

The paper argues that quality control of documentation processes, as the basis of the global information space, is the main tool for providing adequate information resources for electronic economic activity within social and economic relations. Two directions for evaluating information resources in the documentation process are identified. The first is developing tools to assess the efficiency of the system components (qualitative assessment). The second is developing mathematical modeling tools (quantitative evaluation). Electronic documentation of economic activity is assessed qualitatively through performance, communication efficiency, document management efficiency, effectiveness of flow control operations, and relationship management effectiveness. The proposed concept of quality control of electronic documentation processes in economic activity comprises the following components: workflow level; adequacy of information forms; consumer quality of documents; quality attributes; type of incoming data; condition monitoring systems; organizational level of process documentation; performance and consumer quality; and type of management system. The components of the electronic document control system for economic entities are substantiated. The components of IT audit in the management system of economic activity are identified: compliance audit, internal control audit, detailed multilevel analysis, and corporate risk assessment methodology. The stages and methods of processing electronic transactions during condition monitoring of electronic economic activity are also described.
Documenting success of energy management cost reduction initiatives

International Nuclear Information System (INIS)

Stewart, A.

1993-01-01

The scope of this paper is to offer methods to document energy-saving projects. The examples used are based on actual industrial facilities. I will define concepts to be used in the analysis of industrial workplace energy consumption. With the concepts defined, we can begin to apply the documentation strategy to some specific examples. Why should we be interested in auditing the results of energy projects? Nearly every industrial facility has embarked on the road to energy efficiency. As one of my plant engineer associates puts it, "If all our energy saving programs were working as stated, the power company would be paying us." The underlying principles in this statement are true. Does it mean that we as technicians, engineers and managers of energy projects have failed? No; we have, however, failed to finish the job and document their results. My experience has shown that there is good support and enthusiasm for the energy projects we begin. It is also my experience that a well-documented successful project provides many levels of satisfaction. Large energy management projects involve a major financial commitment. Documenting the results gives all those who supported the project, from finance, management and the technical staff, the positive reinforcement to support your future projects. We should begin by defining what an energy audit is and what the expected result of an audit is.
Biomarker Identification Using Text Mining

Directory of Open Access Journals (Sweden)

Hui Li

2012-01-01

Identifying molecular biomarkers from large-scale biological data has become one of the important tasks for scientists assessing the different phenotypic states of cells or organisms correlated with disease genotypes. In this paper, we propose a text-mining-based method to discover biomarkers from PubMed. First, we construct a database based on a dictionary, and then we use a finite state machine to identify the biomarkers. Our text-mining method provides a highly reliable approach to discovering biomarkers in the PubMed database.
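The abstract describes matching dictionary terms with a finite state machine but gives no details. A minimal sketch of that idea, assuming a token-level trie serves as the automaton (each nested dict is a state, and the last token of a dictionary term is an accepting state) with a longest-match rule. The dictionary entries and the function names `build_trie` and `find_terms` are hypothetical, not the authors' implementation.

```python
from typing import Dict, List, Tuple

END = "$"  # marks an accepting state; assumes "$" never appears as a token


def build_trie(terms: List[str]) -> Dict:
    """Build a token-level trie; each nested dict is one automaton state."""
    root: Dict = {}
    for term in terms:
        node = root
        for tok in term.lower().split():
            node = node.setdefault(tok, {})
        node[END] = term  # reaching this state means a full term was read
    return root


def find_terms(text: str, trie: Dict) -> List[Tuple[int, str]]:
    """Return (token_index, term) for the longest dictionary match at each start."""
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    matches: List[Tuple[int, str]] = []
    i = 0
    while i < len(tokens):
        node, j, best = trie, i, None
        # Follow transitions while the next token is a valid edge.
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if END in node:
                best = (j, node[END])  # longest accepted term so far
        if best is not None:
            matches.append((i, best[1]))
            i = best[0]  # resume scanning after the matched term
        else:
            i += 1
    return matches
```

For example, with a hypothetical dictionary `["prostate specific antigen", "troponin", "CA 125"]`, the sentence "Elevated prostate specific antigen and troponin levels were observed." yields matches at token positions 1 and 5. Because every dictionary term shares one automaton, the scan costs time proportional to the text length plus the matched path lengths, rather than one pass per term.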

DOCUMENTING LIVING MONUMENTS IN INDONESIA: METHODOLOGY FOR SUSTAINABLE UTILITY

Directory of Open Access Journals (Sweden)

F. Suryaningsih

2013-07-01

The systematic documentation of cultural heritage in Indonesia developed after the Netherlands Indies government established the Bataviaasch Genootschap van Kunsten en Wetenschappen (1778) and De Oudheidkundige Dienst (1913). After Indonesian independence, the task of cultural heritage documentation was taken over by the Ministry of Culture (now the Ministry of Education and Culture), with a focus on ancient and classical heritage, the so-called dead monuments. The need for comprehensive documentation of cultural heritage became a significant issue once the government and the private sector began paying attention to the preservation of heritage buildings in urban sites, the so-called living monuments. The archived original drawings often do not match the existing condition, while a conservation plan requires a document such as an as-built drawing to work from. The technology, methodology and system for producing such comprehensive documentation of heritage buildings and sites therefore become important for producing good conservation plans and for the regular maintenance of heritage buildings, giving the products sustainable and varied utility. Since 1994, the Documentation Centre for Architecture – Indonesia (PDA) has worked to meet the need for comprehensive data on heritage buildings (living monuments), to be used as the basic document for conservation planning. It provides not only digital drawings such as the site plan, plan, elevation, section and details of architectural elements, but also documentation of historic research and material analysis, completed with diagnosis and mapping of building damage. This manuscript describes PDA's field experience working on this issue.
Machine Learning Algorithms for Statistical Patterns in Large Data Sets

Science.gov (United States)

2018-02-01

SUBJECT TERMS: Text Analysis, Text Exploitation, Situation Awareness of Text, Document Processing, Document Ingestion, Full Text Search, Information... Assortativity: Proclivity Index for Attributed Networks (PRONE).” Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2017, pp. 225-237... international conference on Knowledge discovery and data mining, 2013, pp. 212-220. [18] Sutherland, D.J., Xiong, L., Póczos, B., and Schneider, J.
PROJECT ENGINEERING DATA MANAGEMENT AT AUTOMATED PREPARATION OF DESIGN DOCUMENTATION

Directory of Open Access Journals (Sweden)

A. V. Guryanov

2017-01-01

We have developed and implemented software tools for automated support of the end-to-end process of preparing design documentation for a product. The proposed solution is based on processing the engineering project data contained in interdependent design documents: the tactical and technical characteristics of products, data on the precious metals they contain, the list of components used in a product, and others. Engineering data are processed by converting them to the form required by industry standards for preparing design documentation. The general graph of the design documentation developed for a product is provided, and the developed software product is described. The automated preparation of interdependent design documents is illustrated with the example of preparing a list of purchased products. The results can be used in research and development on the creation of advanced samples of ADP equipment.
Improving nurse documentation and record keeping in stoma care.

Science.gov (United States)

Law, Lesley; Akroyd, Karen; Burke, Linda

Evidence suggests that nurse documentation is often inconsistent and lacks a coherent and standardized approach. This article reports on research into the use of nurse documentation on a stoma care ward in a large London hospital, and explores the factors that may affect the process of record keeping by nursing staff. This study uses stoma care as a case study to explore the role of documentation on the ward, focusing on how this can be improved. It is based on quantitative and qualitative methods. The medical notes of 56 patients were analysed and, in addition, focus groups with a number of nurses were undertaken. Quantitative findings indicate that although 80% of patients had a chart filed in their medical notes, only a small portion of the form was completed by nursing staff. Focus group findings indicate that this is because forms lacked standardization and because the language used was often ambiguous. Staff also felt that such documentation was not viewed by other nurses and so was not effective in improving patient care. As a result of this study, significant improvements have been made to the documentation used on the stoma care ward. This is an important exploration of record keeping within nursing in the context of the Nursing and Midwifery Council's emphasis on the importance of documentation in achieving effective patient outcomes.
INFORMATION DOCUMENTS – PRIMORDIAL INSTRUMENTS IN TOURIST COMMUNICATION

Directory of Open Access Journals (Sweden)

Denisa PARPANDEL

2010-01-01

Tourist information has been shown to have an important influence on the choice of holiday destinations. An important category of promotional means used in tourism as a source of information is tourist information documents, in which graphical advertising is of great importance. Through a harmonious combination of informative text and suggestive pictures, their different forms (flyers, brochures, catalogs, guides and tourist maps, posters and billboards, press advertisements) help potential tourists visualize products of interest. This article highlights the importance of tourist information documents for destination selection, the requirements and recommendations for their design, and the need to arrange advertisements so as to increase their impact on potential tourists. Tour operators, in cooperation with advertising agencies, choose the means of communication and the advertising medium according to the market research conducted, the production capacity or area of interest for preparing an advertising campaign, the level of tariffs and the type of benefits offered, the type of tourism product offered, and the targeted market segment.
Factors that affect the accuracy of text-based language identification

CSIR Research Space (South Africa)

Botha, GR

2007-11-01

…its excellent accuracy, another significant advantage of the NB classifier is that new language documents can simply be merged into an existing classifier by adding the n-gram statistics of these documents to the current language model...
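The merging property follows from the fact that a naive Bayes language model is just a table of n-gram counts, so adding documents means adding counts. A minimal sketch using character trigrams; the trigram size, add-one smoothing, uniform language priors, and the class and method names are assumptions for illustration, since this excerpt does not give the report's actual feature set or smoothing.

```python
import math
from collections import Counter
from typing import Dict, Optional


def ngrams(text: str, n: int = 3) -> Counter:
    """Character n-gram counts, padded with spaces to capture word edges."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


class NaiveBayesLangId:
    def __init__(self, n: int = 3) -> None:
        self.n = n
        self.counts: Dict[str, Counter] = {}  # one n-gram count table per language

    def add_documents(self, lang: str, text: str) -> None:
        # Merging new documents is just adding their n-gram counts to the model.
        self.counts.setdefault(lang, Counter()).update(ngrams(text, self.n))

    def classify(self, text: str) -> Optional[str]:
        vocab = set()
        for table in self.counts.values():
            vocab.update(table)
        v = len(vocab) + 1  # +1 reserves probability mass for unseen n-grams
        query = ngrams(text, self.n)
        best_lang, best_score = None, -math.inf
        for lang, table in self.counts.items():
            total = sum(table.values())
            # Log-likelihood with add-one smoothing; uniform prior over languages.
            score = sum(k * math.log((table[g] + 1) / (total + v)) for g, k in query.items())
            if score > best_score:
                best_lang, best_score = lang, score
        return best_lang
```

Training on one short English and one short Afrikaans sentence is enough for this sketch to label "the hat" as English; calling `add_documents` again updates the existing counts in place without retraining anything else.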
Generic safety documentation model

International Nuclear Information System (INIS)

Mahn, J.A.

1994-04-01

This document is intended to be a resource for preparers of safety documentation for Sandia National Laboratories, New Mexico facilities. It provides standardized discussions of some topics that are generic to most, if not all, Sandia/NM facilities safety documents. The material provides a “core” upon which to develop facility-specific safety documentation. The use of the information in this document will reduce the cost of safety document preparation and improve consistency of information.
Documenting Instructional Practices in Large Introductory STEM Lecture Courses

Science.gov (United States)

Vu, Viet Quoc

STEM education reform in higher education is framed around the need to improve student learning outcomes, increase student retention, and increase the number of underrepresented minorities and female students in STEM fields, all of which would ultimately contribute to America's competitiveness and prosperity. To achieve these goals, education reformers call for an increase in the adoption of research-based "promising practices" in classrooms. Despite efforts to increase the adoption of more promising practices in classrooms, postsecondary instructors are still likely to lecture and use traditional teaching approaches. To shed light on this adoption dilemma, a mixed-methods study was conducted. First, instructional practices in large introductory STEM courses were identified, followed by an analysis of factors that inhibit or contribute to the use of promising practices. Data were obtained from classroom observations (N = 259) of large gateway courses across STEM departments and from instructor interviews (N = 67). Results show that instructors are already aware of promising practices and that change strategies could move from focusing on the development and dissemination of promising practices to focusing on improving adoption rates. Teaching-track instructors such as lecturers with potential for security of employment (LPSOE) and lecturers with security of employment (LSOE) have adopted promising practices more than other instructors. Interview data show that LPSOEs are also effective at disseminating promising practices to their peers, but opinion leaders (influential faculty in a department) are necessary to promote adoption of promising practices by higher ranking instructors. However, hiring more LPSOEs or opinion leaders will not be enough to shift instructional practices. Variations in the adoption of promising practices by instructors and across departments show that any reform strategy needs to be systematic and take into consideration how information is
The number of scholarly documents on the public web.

Directory of Open Access Journals (Sweden)

Madian Khabsa

The number of scholarly documents available on the web is estimated using capture/recapture methods by studying the coverage of two major academic search engines: Google Scholar and Microsoft Academic Search. Our estimates show that at least 114 million English-language scholarly documents are accessible on the web, of which Google Scholar has nearly 100 million. Of these, we estimate that at least 27 million (24%) are freely available since they do not require a subscription or payment of any kind. In addition, at a finer scale, we also estimate the number of scholarly documents on the web for fifteen fields: Agricultural Science, Arts and Humanities, Biology, Chemistry, Computer Science, Economics and Business, Engineering, Environmental Sciences, Geosciences, Material Science, Mathematics, Medicine, Physics, Social Sciences, and Multidisciplinary, as defined by Microsoft Academic Search. We further show that among these fields the percentage of documents defined as freely available varies significantly, from 12% to 50%.
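The abstract does not state which capture/recapture estimator was used. The simplest two-source form, the Lincoln-Petersen estimator, treats each search engine as one "capture" and the overlap as the "recaptured" documents: N ≈ |A|·|B| / |A ∩ B|. A sketch, assuming the two engines index documents independently and with equal probability per document; Chapman's bias-corrected variant is included for comparison. Function names are hypothetical.

```python
from typing import Hashable, Set


def lincoln_petersen(sample_a: Set[Hashable], sample_b: Set[Hashable]) -> float:
    """Estimate total population size from two independent samples.

    N ~= |A| * |B| / |A & B|, assuming each engine captures documents
    independently of the other and every document is equally catchable.
    """
    overlap = len(sample_a & sample_b)
    if overlap == 0:
        raise ValueError("samples share no items; estimate is undefined")
    return len(sample_a) * len(sample_b) / overlap


def chapman(sample_a: Set[Hashable], sample_b: Set[Hashable]) -> float:
    """Chapman's bias-corrected variant, defined even when the overlap is 0."""
    overlap = len(sample_a & sample_b)
    return (len(sample_a) + 1) * (len(sample_b) + 1) / (overlap + 1) - 1


# Hypothetical document IDs: engine A finds 80, engine B finds 50, 40 are shared.
print(lincoln_petersen(set(range(80)), set(range(40, 90))))  # 100.0
```

If the two engines tend to index the same documents (positive dependence), the observed overlap is inflated and this estimate comes out too low, which is one reason such figures are often read as lower bounds.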
WIPP documentation plan

International Nuclear Information System (INIS)

Plung, D.L.; Montgomery, T.T.; Glasstetter, S.R.

1986-01-01

In support of the programs at the Waste Isolation Pilot Plant (WIPP), the Publications and Procedures Section developed a documentation plan that provides an integrated document hierarchy; further, this plan affords several unique features: 1) the format for procedures minimizes the writing responsibilities of the technical staff and maximizes use of the writing and editing staff; 2) review cycles have been structured to expedite the processing of documents; and 3) the number of documents needed to support the program has been appreciably reduced.
2002 reference document; Document de reference 2002

Energy Technology Data Exchange (ETDEWEB)

NONE

2002-07-01

This 2002 reference document of the Areva group provides information on the company. Organized in seven chapters, it presents the persons responsible for the reference document and for auditing the financial statements, information pertaining to the transaction, general information on the company and share capital, information on company operation, changes and future prospects, assets, financial position, financial performance, information on company management and the executive board and supervisory board, recent developments and future prospects. (A.L.B.)
Electronic Document Management Systems: Where Are They Today?

Science.gov (United States)

Koulopoulos, Thomas M.; Frappaolo, Carl

1993-01-01

Discusses developments in document management systems based on a survey of over 400 corporations and government agencies. Text retrieval and imaging markets, architecture and integration, purchasing plans, and vendor market leaders are covered. Five graphs present data on user preferences for improvements. A sidebar article reviews the development…
SWIFT-Review: a text-mining workbench for systematic review.

Science.gov (United States)

Howard, Brian E; Phillips, Jason; Miller, Kyle; Tandon, Arpit; Mav, Deepak; Shah, Mihir R; Holmgren, Stephanie; Pelch, Katherine E; Walker, Vickie; Rooney, Andrew A; Macleod, Malcolm; Shah, Ruchir R; Thayer, Kristina

2016-05-23

There is growing interest in using machine learning approaches to priority-rank studies and reduce the human burden of screening literature when conducting systematic reviews. In addition, identifying addressable questions during the problem formulation phase of a systematic review can be challenging, especially for topics having a large literature base. Here, we assess the performance of the SWIFT-Review priority ranking algorithm for identifying studies relevant to a given research question. We also explore the use of SWIFT-Review during problem formulation to identify, categorize, and visualize research areas that are data rich/data poor within a large literature corpus. Twenty case studies, including 15 public data sets, representing a range of complexity and size, were used to assess the priority ranking performance of SWIFT-Review. For each study, seed sets of manually annotated included and excluded titles and abstracts were used for machine training. The remaining references were then ranked for relevance using an algorithm that considers term frequency and latent Dirichlet allocation (LDA) topic modeling. This ranking was evaluated with respect to (1) the number of studies screened in order to identify 95% of known relevant studies and (2) the "Work Saved over Sampling" (WSS) performance metric. To assess SWIFT-Review for use in problem formulation, PubMed literature search results for 171 chemicals implicated as EDCs (endocrine-disrupting chemicals) were uploaded into SWIFT-Review (264,588 studies) and categorized based on evidence stream and health outcome. Patterns of search results were surveyed and visualized using a variety of interactive graphics. Compared with the reported performance of other tools using the same datasets, the SWIFT-Review ranking procedure obtained the highest scores on 11 out of 15 of the public datasets. Overall, these results suggest that using machine learning to triage documents for screening has the potential to save, on average, more than 50% of the screening
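The abstract names WSS without defining it. The definition commonly used in the screening-prioritization literature is WSS@R = (TN + FN)/N − (1 − R): the fraction of documents a reviewer can skip once recall R is reached, minus the fraction that random sampling would already skip at that recall. A sketch computing WSS at 95% recall from a ranked list of relevance labels; the function name and input format are assumptions, not SWIFT-Review's interface.

```python
from typing import Sequence


def wss_at_recall(ranked_labels: Sequence[bool], recall_pct: int = 95) -> float:
    """Work Saved over Sampling for a relevance-ranked list.

    ranked_labels[i] is True if the i-th ranked document is relevant.
    WSS = (documents left unscreened) / N - (1 - recall).
    """
    if not 0 < recall_pct <= 100:
        raise ValueError("recall_pct must be in (0, 100]")
    n = len(ranked_labels)
    total_relevant = sum(ranked_labels)
    if total_relevant == 0:
        raise ValueError("need at least one relevant document")
    # Relevant documents required to reach the target recall,
    # using integer ceiling division to avoid float rounding.
    needed = -(-recall_pct * total_relevant // 100)
    found = 0
    for screened, is_relevant in enumerate(ranked_labels, start=1):
        found += bool(is_relevant)
        if found >= needed:
            return (n - screened) / n - (1 - recall_pct / 100)
    raise RuntimeError("unreachable: target recall is always met by the end")
```

With 100 hypothetical documents of which 20 are relevant, a ranking that puts 19 relevant ones first reaches 95% recall after 19 screened documents, giving WSS = 0.81 − 0.05 = 0.76; a ranking that puts all relevant documents last gives a negative WSS, i.e. worse than random sampling.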
A Similarity-Based Approach for Audiovisual Document Classification Using Temporal Relation Analysis

Directory of Open Access Journals (Sweden)

Ferrane Isabelle

2011-01-01

We propose a novel approach to video classification based on the analysis of the temporal relationships between the basic events in audiovisual documents. Starting from basic segmentation results, we define a new representation method called the Temporal Relation Matrix (TRM). Each document is then described by a set of TRMs, whose analysis brings out higher-level events. This representation was first designed to analyze any audiovisual document in order to find events that may characterize its content and structure. The aim of this work is to use this representation to compute a similarity measure between two documents. Approaches to audiovisual document classification are presented and discussed. Experiments were carried out on a set of 242 video documents, and the results show the efficiency of our proposals.
Use of Solr and Xapian in the Invenio document repository software

CERN Document Server

Glauner, Patrick; Le Meur, Jean-Yves; Simko, Tibor

2013-01-01

Invenio is a free comprehensive web-based document repository and digital library software suite originally developed at CERN. It can serve a variety of use cases from an institutional repository or digital library to a web journal. In order to fully use full-text documents for efficient search and ranking, Solr was integrated into Invenio through a generic bridge. Solr indexes extracted full-texts and most relevant metadata. Consequently, Invenio takes advantage of Solr’s efficient search and word similarity ranking capabilities. In this paper, we first give an overview of Invenio, its capabilities and features. We then present our open source Solr integration as well as scalability challenges that arose for an Invenio-based multi-million record repository: the CERN Document Server. We also compare our Solr adapter to an alternative Xapian adapter using the same generic bridge. Both integrations are distributed with the Invenio package and ready to be used by the institutions using or adopting Invenio.
Cultural text mining: using text mining to map the emergence of transnational reference cultures in public media repositories

NARCIS (Netherlands)

Pieters, Toine; Verheul, Jaap

2014-01-01

This paper discusses the research project Translantis, which uses innovative technologies for cultural text mining to analyze large repositories of digitized public media, such as newspapers and journals. The Translantis research team uses and develops the text mining tool Texcavator, which is
Documenting Employee Conduct

Science.gov (United States)

Dalton, Jason

2009-01-01

One of the best ways for a child care program to lose an employment-related lawsuit is failure to document the performance of its employees. Documentation of an employee's performance can provide evidence of an employment-related decision such as discipline, promotion, or discharge. When properly implemented, documentation of employee performance…
Non-Local Sparse Image Inpainting for Document Bleed-Through Removal

Directory of Open Access Journals (Sweden)

Muhammad Hanif

2018-05-01

Bleed-through is a frequent, pervasive degradation in ancient manuscripts, caused by ink seeping through from the opposite side of the sheet. Bleed-through, appearing as extra interfering text, hinders document readability and makes it difficult to decipher the information content. Digital image restoration techniques have been successfully employed to remove or significantly reduce this distortion. This paper proposes a two-step restoration method for documents affected by bleed-through, exploiting information from the recto and verso images. First, the bleed-through pixels are identified, based on a non-stationary, linear model of the two texts overlapped in the recto-verso pair. In the second step, a dictionary learning-based sparse image inpainting technique, with non-local patch grouping, is used to reconstruct the bleed-through-contaminated image information. An overcomplete sparse dictionary is learned from the bleed-through-free image patches and is then used to estimate a suitable fill-in for the identified bleed-through pixels. Non-local patch similarity is employed in the sparse reconstruction of each patch to enforce local similarity. Thanks to the intrinsic image sparsity and non-local patch similarity, the natural texture of the background is well reproduced in the bleed-through areas, and even a possible overestimation of the bleed-through pixels is effectively corrected, so that the original appearance of the document is preserved. We evaluate the performance of the proposed method on the images of a popular database of ancient documents, and the results validate the performance of the proposed method compared to the state of the art.
Health physics documentation

International Nuclear Information System (INIS)

Stablein, G.

1980-01-01

When dealing with radioactive material the health physicist receives innumerable papers and documents within the fields of researching, prosecuting, organizing and justifying radiation protection. Some of these papers are requested by the health physicist and some are required by law. The scope, quantity and deposit periods of the health physics documentation at the Karlsruhe Nuclear Research Center are presented and rationalizing methods discussed. The aim of this documentation should be the application of physics to accident prevention, i.e. documentation should protect those concerned and not the health physicist. (H.K.)
Redundancy in electronic health record corpora: analysis, impact on text mining performance and mitigation strategies.

Science.gov (United States)

Cohen, Raphael; Elhadad, Michael; Elhadad, Noémie

2013-01-16

The increasing availability of Electronic Health Record (EHR) data, and specifically free-text patient notes, presents opportunities for phenotype extraction. Text-mining methods in particular can help disease modeling by mapping named-entity mentions to terminologies and clustering semantically related terms. EHR corpora, however, exhibit specific statistical and linguistic characteristics when compared with corpora in the biomedical literature domain. We focus on copy-and-paste redundancy: clinicians typically copy and paste information from previous notes when documenting a current patient encounter. Thus, within a longitudinal patient record, one expects to observe heavy redundancy. In this paper, we ask three research questions: (i) How can redundancy be quantified in large-scale text corpora? (ii) Conventional wisdom is that larger corpora yield better results in text mining. But how does the observed EHR redundancy affect text mining? Does such redundancy introduce a bias that distorts learned models? Or does the redundancy introduce benefits by highlighting stable and important subsets of the corpus? (iii) How can one mitigate the impact of redundancy on text mining? We analyze a large-scale EHR corpus and quantify redundancy both in terms of word and semantic concept repetition. We observe redundancy levels of about 30% and non-standard distribution of both words and concepts. We measure the impact of redundancy on two standard text-mining applications: collocation identification and topic modeling. We compare the results of these methods on synthetic data with controlled levels of redundancy and observe significant performance variation. Finally, we compare two mitigation strategies to avoid redundancy-induced bias: (i) a baseline strategy, keeping only the last note for each patient in the corpus; (ii) removing redundant notes with an efficient fingerprinting-based algorithm. (a) For text mining, preprocessing the EHR corpus with fingerprinting yields
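The abstract does not describe the authors' fingerprinting algorithm. As a stand-in, here is a sketch of one common fingerprinting approach: hash overlapping word k-grams (shingles) of each note, and drop a note when at least a threshold fraction of its fingerprints already appeared in earlier notes of the same patient record. The shingle size k = 5, the 0.8 threshold, the 64-bit BLAKE2b hash, and the function names are all assumptions.

```python
import hashlib
from typing import List, Set


def fingerprints(note: str, k: int = 5) -> Set[int]:
    """Hash every run of k consecutive words (a shingle) to a 64-bit integer."""
    words = note.lower().split()
    # Notes shorter than k words yield a single shingle of the whole note.
    shingles = (" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1)))
    return {
        int.from_bytes(hashlib.blake2b(s.encode("utf-8"), digest_size=8).digest(), "big")
        for s in shingles
    }


def drop_redundant(notes: List[str], threshold: float = 0.8, k: int = 5) -> List[str]:
    """Keep notes (in chronological order) whose content is mostly new."""
    seen: Set[int] = set()
    kept: List[str] = []
    for note in notes:
        fp = fingerprints(note, k)
        overlap = len(fp & seen) / len(fp)  # fraction of this note already seen
        if overlap < threshold:
            kept.append(note)
        seen |= fp  # copied text still counts as "seen" for later notes
    return kept
```

An exact copy of an earlier note has overlap 1.0 and is dropped; a copy with one appended word still shares most of its shingles and is also dropped at this threshold, while a note with new content is kept.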

RELIABLE COGNITIVE DIMENSIONAL DOCUMENT RANKING BY WEIGHTED STANDARD CAUCHY DISTRIBUTION

Directory of Open Access Journals (Sweden)

S Florence Vijila

2017-04-01

Categorization of cognitively uniform and consistent documents, such as university question papers, is in demand among e-learners. The literature indicates that the standard Cauchy distribution and values derived from it are extensively used for checking the uniformity and consistency of documents. The paper attempts to apply this technique to categorize question papers according to four selected cognitive dimensions. For this purpose, cognitive-dimension keyword sets for these four categories (also termed portrayal concepts) are assumed, and an automatic procedure is developed to quantify these dimensions in question papers. The categorization is relatively accurate when checked against manual methods. Hence the simple and well-established term frequency / inverse document frequency (tf-idf) technique is considered for automating the categorization process. After the documents are categorized, the standard Cauchy formula is applied to rank the documents with the least differences among Cauchy values (according to the Cauchy theorem), so as to obtain consistent and uniform documents in ranked order. For the experiments and social survey, seven question papers (documents) were designed with varying consistency. To validate the proposed technique, a social survey was administered to selected samples of e-learners in Tamil Nadu, India. The results are encouraging, and the conclusions drawn from the experiments will be useful to researchers in concept mining and in categorizing documents according to concepts. The findings also offer utility to e-learning system designers.
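The abstract names tf-idf but not the keyword sets or the exact scoring rule. A sketch filling those gaps with assumptions: each cognitive dimension is a keyword list, a document's score for a dimension is the sum of the tf-idf weights of that dimension's keywords (tf = count / document length, idf = ln(N / df)), and the highest-scoring dimension is taken as the category. The dimension names and keywords are hypothetical, and the Cauchy ranking step is not reproduced here.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf_scores(docs: List[str], keyword_sets: Dict[str, List[str]]) -> List[Dict[str, float]]:
    """Score each document against each keyword set with summed tf-idf weights."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(docs)
    df: Counter = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency: count each term once per doc
    results = []
    for toks in tokenized:
        tf = Counter(toks)
        length = max(len(toks), 1)  # guard against empty documents
        scores = {
            dim: sum(
                (tf[w] / length) * math.log(n_docs / df[w])
                for w in keywords
                if df[w] > 0  # keywords absent from the collection contribute 0
            )
            for dim, keywords in keyword_sets.items()
        }
        results.append(scores)
    return results


def categorize(docs: List[str], keyword_sets: Dict[str, List[str]]) -> List[str]:
    """Assign each document the dimension with the highest tf-idf score."""
    return [max(s, key=s.get) for s in tf_idf_scores(docs, keyword_sets)]
```

For instance, with hypothetical sets `{"remember": ["define", "list"], "create": ["design", "create"]}`, a paper dominated by "define" questions is categorized as "remember". Note that a keyword appearing in every document gets idf = 0 and cannot distinguish categories.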
Device of Definition of Hand-Written Documents Belonging to One Executor

Directory of Open Access Journals (Sweden)

S. D. Kulik

2012-03-01

The results of developing a device for determining whether handwritten documents in Russian belong to the same writer (executor of the text) are presented. The device is intended to automate the work of experts and helps solve problems of information security and the search for criminals.
Documentary research: theoretical and methodological clues; Pesquisa documental: pistas teóricas e metodológicas

Directory of Open Access Journals (Sweden)

Jackson Ronie Sá-Silva

2015-05-01

The objective of this article is to present some theoretical and methodological notes on documentary research. By making this public presentation, in the form of a bibliographic essay, we aim to provoke debate about the use of this procedure in the everyday research of students, teachers and researchers. First, we conceptualize documentary research, presenting the similarities and differences between it and bibliographic research, and then discuss the concept of a document. Next, we address the methodological criteria for the pre-analysis of the written document and, finally, we present the stages of documentary analysis.
Vocabulary Constraint on Texts

Directory of Open Access Journals (Sweden)

C. Sutarsyah

2008-01-01

This case study was carried out in the English Education Department of the State University of Malang. The aim of the study was to identify and describe the vocabulary in the reading texts and to determine whether the texts are useful for developing reading skills. A descriptive qualitative design was applied to obtain the data, and available computer programs were used to describe the vocabulary in the texts. The 20 texts, containing 7,945 words, were found to be dominated by low-frequency words, which account for 16.97% of the words in the texts. The high-frequency words occurring in the texts were dominated by function words. Regarding word levels, the texts contain a very limited number of words from the GSL (General Service List of English Words; West, 1953): the first 1,000 GSL words account for only 44.6%. The data also show that the texts contain too large a proportion of words that are not in any of the three levels (the first 2,000 GSL words and the UWL); these account for 26.44% of the running words in the texts. These constraints are believed to result from the selection of texts, which consist of a series of short, unrelated texts. This kind of text is subject to an accumulation of low-frequency words, especially content words, and a limited number of GSL words, which could also hinder the development of students' reading skills and vocabulary enrichment.
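Coverage figures such as "44.6% of running words" are token-level proportions: every occurrence of a word counts, not just each distinct word. A minimal sketch of that calculation, assuming exact word-form matching against a word list; the real GSL is organized by word families, so this simplification would undercount inflected forms. The function name is hypothetical.

```python
import re
from typing import Set


def coverage(text: str, word_list: Set[str]) -> float:
    """Percentage of running words (tokens) in `text` found in `word_list`."""
    # Lowercase alphabetic tokens, keeping simple contractions like "don't".
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not tokens:
        return 0.0
    return 100.0 * sum(t in word_list for t in tokens) / len(tokens)
```

For example, `coverage("The cat sat on the mat", {"the", "on", "sat"})` gives roughly 66.7, because 4 of the 6 running words ("the" twice, "sat", "on") are in the list.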
Improving imbalanced scientific text classification using sampling strategies and dictionaries

Directory of Open Access Journals (Sweden)

Borrajo L.

2011-12-01

Many real applications suffer from the imbalanced class distribution problem, in which one class is represented by a very small number of cases compared with the other classes. Among the affected systems are those related to the retrieval and classification of scientific documentation.
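The excerpt mentions sampling strategies without saying which ones were used. As an illustration of one common strategy, random oversampling duplicates randomly chosen minority-class examples until every class matches the size of the largest one. This is an assumed example, not necessarily the authors' method; the function name and seed parameter are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_oversample(
    samples: Sequence[T], labels: Sequence[str], seed: int = 0
) -> Tuple[List[T], List[str]]:
    """Duplicate random minority-class samples until all classes are equal in size."""
    rng = random.Random(seed)  # fixed seed for reproducible resampling
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)  # originals are kept unchanged
    for cls, n in counts.items():
        members = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(members))
            out_y.append(cls)
    return out_x, out_y
```

Oversampling should be applied only to the training split; duplicating examples before splitting lets copies of the same document leak into the test set and inflates the measured accuracy.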
The Texts of the Agency's Relationship Agreements with Specialized Agencies

International Nuclear Information System (INIS)

1960-01-01

The texts of the relationship agreements which the Agency has concluded with the specialized agencies listed below, together with the respective protocols authenticating them, are reproduced in this document in the order in which the agreements entered into force, for the information of all Members of the Agency.
Ahmad's NPRT system: A practical innovation for documenting male pattern baldness

Directory of Open Access Journals (Sweden)

Muhammad Ahmad

2016-01-01

Various classifications of male pattern baldness are described in the literature. Norwood's classification is the most commonly used, but it has certain limitations. The new system includes three extra features that are not mentioned in any other classification. It provides an opportunity to document a full and correct picture of male pattern baldness, and it also aids in assessing treatment for various degrees of baldness.
Incorporating other texts: Intertextuality in Malaysian CSR reports

Directory of Open Access Journals (Sweden)

Kumaran Rajandran

2016-11-01

In Malaysia, corporate social responsibility (CSR) is relatively new, but corporations have been required to engage in and disclose their CSR. A typical genre for disclosure is the CSR report, and these reports often refer to other texts. The article investigates the act of referring to other texts, or intertextuality, in Malaysian CSR reports. It creates an archive of CEO Statements and Environment Sections in CSR reports and studies the archive for keywords, which can identify the incorporated texts. The function of these texts is examined in relation to Malaysia’s corporate context. CSR reports contain explicit references to documents (policies, regulations, reports, research, standards) and to individuals/groups (CEOs, stakeholders, expert organizations). The incorporated texts display variation in corporate control, which organizes these texts along an intertextual cline. The cline helps to identify corporate and non-corporate sources among the texts. The selection of incorporated texts may reflect government and stock exchange demands. The texts are not standardized and are relevant for the CSR domain and corporations, where these texts monitor and justify CSR performance. Yet the incorporated texts may perpetuate inexact reporting because corporations select the texts, and the parts of texts, to refer to. Since these texts have been employed to scrutinize initiatives and results, CSR reports can claim to represent the “truth” about a corporation’s CSR. Hence, intertextuality serves corporate interests.
Methods for Mining and Summarizing Text Conversations

CERN Document Server

Carenini, Giuseppe; Murray, Gabriel

2011-01-01

Due to the Internet Revolution, human conversational data -- in written forms -- are accumulating at a phenomenal rate. At the same time, improvements in speech technology enable many spoken conversations to be transcribed. Individuals and organizations engage in email exchanges, face-to-face meetings, blogging, texting and other social media activities. The advances in natural language processing provide ample opportunities for these "informal documents" to be analyzed and mined, thus creating numerous new and valuable applications. This book presents a set of computational methods
A text messaging intervention to improve heart failure self-management after hospital discharge in a largely African-American population: before-after study.

Science.gov (United States)

Nundy, Shantanu; Razi, Rabia R; Dick, Jonathan J; Smith, Bryan; Mayo, Ainoa; O'Connor, Anne; Meltzer, David O

2013-03-11

There is increasing interest in finding novel approaches to reduce health disparities in readmissions for acute decompensated heart failure (ADHF). Text messaging is a promising platform for improving chronic disease self-management in low-income populations, yet is largely unexplored in ADHF. The purpose of this pre-post study was to assess the feasibility and acceptability of a text message-based (SMS: short message service) intervention in a largely African American population with ADHF and explore its effects on self-management. Hospitalized patients with ADHF were enrolled in an automated text message-based heart failure program for 30 days following discharge. Messages provided self-care reminders and patient education on diet, symptom recognition, and health care navigation. Demographic and cell phone usage data were collected on enrollment, and an exit survey was administered on completion. The Self-Care of Heart Failure Index (SCHFI) was administered preintervention and postintervention and compared using sample t tests (composite) and Wilcoxon rank sum tests (individual). Clinical data were collected through chart abstraction. Of 51 patients approached for recruitment, 27 agreed to participate and 15 were enrolled (14 African-American, 1 White). Barriers to enrollment included not owning a personal cell phone (n=12), failing the Mini-Mental exam (n=3), needing a proxy (n=2), being hard of hearing (n=1), and refusal (n=3). Another 3 participants left the study for health reasons and 3 others had technology issues. A total of 6 patients (5 African-American, 1 White) completed the postintervention surveys. The mean age was 50 years (range 23-69) and over half had Medicaid or were uninsured (60%, 9/15). The mean ejection fraction for those with systolic dysfunction was 22%, and at least two-thirds had a prior hospitalization in the past year. Participants strongly agreed that the program was easy to use (83%), reduced pills missed (66%), and decreased salt intake
Collaborative filtering to improve navigation of large radiology knowledge resources.

Science.gov (United States)

Kahn, Charles E

2005-06-01

Collaborative filtering is a knowledge-discovery technique that can help guide readers to items of potential interest based on the experience of prior users. This study sought to determine the impact of collaborative filtering on navigation of a large, Web-based radiology knowledge resource. Collaborative filtering was applied to a collection of 1,168 radiology hypertext documents available via the Internet. An item-based collaborative filtering algorithm identified each document's six most closely related documents based on 248,304 page views in an 18-day period. Documents were amended to include links to their related documents, and use was analyzed over the next 5 days. The mean number of documents viewed per visit increased from 1.57 to 1.74 (P …). Collaborative filtering can increase a radiology information resource's utilization and can improve its usefulness and ease of navigation. The technique holds promise for improving navigation of large Internet-based radiology knowledge resources.
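The item-based step described above (finding each document's six most closely related documents from page views) can be sketched as follows. This is a minimal illustration, not the study's algorithm: it assumes page views have already been grouped into visits (sets of document IDs), and it uses cosine similarity between documents' co-view patterns as the relatedness measure. The function name `related_documents` and the input format are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def related_documents(sessions, k=6):
    """Return, for each document, its k most related documents.

    sessions: iterable of sets of document IDs viewed in the same visit
    (a hypothetical input format assumed for this sketch).
    Relatedness is the cosine similarity between the documents'
    binary "appeared in this visit" vectors.
    """
    views = defaultdict(int)      # number of visits that included each document
    co_views = defaultdict(int)   # number of visits that included both documents
    for session in sessions:
        docs = set(session)
        for d in docs:
            views[d] += 1
        for a, b in combinations(sorted(docs), 2):
            co_views[(a, b)] += 1

    neighbours = defaultdict(list)
    for (a, b), n in co_views.items():
        sim = n / sqrt(views[a] * views[b])  # cosine of binary occurrence vectors
        neighbours[a].append((sim, b))
        neighbours[b].append((sim, a))

    # Highest similarity first; ties broken by document ID for determinism
    return {
        d: [doc for _, doc in sorted(neighbours[d], key=lambda t: (-t[0], t[1]))[:k]]
        for d in views
    }
```

The returned lists correspond to the "related documents" links that were added to each page; documents never co-viewed with anything get an empty list.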
Methodological Demonstration of a Text Analytics Approach to Country Logistics System Assessments

DEFF Research Database (Denmark)

Kinra, Aseem; Mukkamala, Raghava Rao; Vatrapu, Ravi

2017-01-01

The purpose of this study is to develop and demonstrate a semi-automated text analytics approach for the identification and categorization of information that can be used for country logistics assessments. In this paper, we develop the methodology on a set of documents for 21 countries using...... and the text analyst. Implications are discussed and future work is outlined....
A method for integrating and ranking the evidence for biochemical pathways by mining reactions from text

Science.gov (United States)

Miwa, Makoto; Ohta, Tomoko; Rak, Rafal; Rowley, Andrew; Kell, Douglas B.; Pyysalo, Sampo; Ananiadou, Sophia

2013-01-01

Motivation: To create, verify and maintain pathway models, curators must discover and assess knowledge distributed over the vast body of biological literature. Methods supporting these tasks must understand both the pathway model representations and the natural language in the literature. These methods should identify and order documents by relevance to any given pathway reaction. No existing system has addressed all aspects of this challenge. Method: We present novel methods for associating pathway model reactions with relevant publications. Our approach extracts the reactions directly from the models and then turns them into queries for three text mining-based MEDLINE literature search systems. These queries are executed, and the resulting documents are combined and ranked according to their relevance to the reactions of interest. We manually annotate document-reaction pairs with the relevance of the document to the reaction and use this annotation to study several ranking methods, using various heuristic and machine-learning approaches. Results: Our evaluation shows that the annotated document-reaction pairs can be used to create a rule-based document ranking system, and that machine learning can be used to rank documents by their relevance to pathway reactions. We find that a Support Vector Machine-based system outperforms several baselines and matches the performance of the rule-based system. The successful query extraction and ranking methods are used to update our existing pathway search system, PathText. Availability: An online demonstration of PathText 2 and the annotated corpus are available for research purposes at http://www.nactem.ac.uk/pathtext2/. Contact: makoto.miwa@manchester.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23813008
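The pipeline above (reaction to query, several search systems, one combined ranking) can be illustrated with a small sketch. Two loud assumptions: the reaction format (`reactants`, `products`, `type`) is hypothetical, and the merge step uses reciprocal rank fusion, a generic technique swapped in for illustration. It is not the heuristic or SVM ranking studied in the paper.

```python
from collections import defaultdict


def reaction_to_query(reaction):
    """Turn a pathway reaction into a conjunctive keyword query.

    reaction: dict with 'reactants', 'products' (lists of names) and 'type'
    (a hypothetical representation assumed for this sketch).
    """
    terms = reaction["reactants"] + reaction["products"] + [reaction["type"]]
    return " AND ".join(f'"{t}"' for t in terms)


def fuse_rankings(ranked_lists, k=60):
    """Reciprocal rank fusion of several ranked result lists.

    Each document scores sum(1 / (k + rank)) over the lists it appears in;
    k = 60 is the constant commonly used with this method.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))
```

For example, fusing `["p1", "p2", "p3"]`, `["p2", "p1"]` and `["p2", "p4"]` puts `p2` first, because it appears near the top of all three lists.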
Beyond a Box of Documents: The Collaborative Partnership Behind the Oregon Chinese Disinterment Documents Collection

Directory of Open Access Journals (Sweden)

Natalia M. Fernández

2013-06-01

Full Text Available This article is a case study of a collaboration between the Oregon Multicultural Archives of Oregon State University, Portland State University Library's Special Collections, the Chinese Consolidated Benevolent Association (CCBA), and the Northwest News Network to preserve and make accessible a recovered box of Oregon Chinese disinterment documents. By examining what influenced and engaged each partner, this case study offers an opportunity to better understand the motivations of diverse stakeholders in a "post-custodial era" project that challenges traditional practices of custody, control, and access.
Standardization Documents

Science.gov (United States)

2011-08-01

Specifications and Standards; Guide Specifications; CIDs; and NGSs. Learn. Perform. Succeed. STANDARDIZATION DOCUMENTS: Federal Specifications; Commercial ... national or international standardization document developed by a private sector association, organization, or technical society that plans ... Maintain lessons learned. Examples: guidance for application of a technology; lists of options. DEFENSE HANDBOOK
Term Familiarity to indicate Perceived and Actual Difficulty of Text in Medical Digital Libraries.

Science.gov (United States)

Leroy, Gondy; Endicott, James E

2011-10-01

With increasing text digitization, digital libraries can personalize materials for individuals with different education levels and language skills. To this end, documents need meta-information describing their difficulty level. Previous attempts at such labeling used readability formulas, but the formulas have not been validated with modern texts and their outcome is seldom associated with actual difficulty. We focus on medical texts and are developing new, evidence-based meta-tags that are associated with perceived and actual text difficulty. This work describes a first tag, term familiarity, which is based on term frequency in the Google corpus. We evaluated its feasibility to serve as a tag by looking at a document corpus (N=1,073) and found that terms in blogs or journal articles displayed unexpected but significantly different scores. Term familiarity was then applied to texts and results from a previous user study (N=86) and could better explain differences for perceived and actual difficulty.
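A term-familiarity score of the kind described above can be sketched as the mean log frequency of a text's terms in a large reference corpus. This is an assumption-laden illustration: the frequency table, the log10 scale, the `floor` count for unseen terms, and the function name `term_familiarity` are all hypothetical choices, not the authors' exact definition.

```python
import math
import re


def term_familiarity(text, corpus_freq, floor=1):
    """Mean log10 frequency of the text's terms in a reference corpus.

    corpus_freq: dict mapping lowercase term -> count in a large reference
    corpus (the study used Google corpus counts; any table works here).
    Terms missing from the table are assigned the count `floor`.
    Lower scores suggest less familiar, and plausibly harder, text.
    """
    terms = re.findall(r"[a-z]+", text.lower())
    if not terms:
        return 0.0
    return sum(math.log10(corpus_freq.get(t, floor)) for t in terms) / len(terms)
```

With a toy table in which "the" occurs 10^9 times and "heart" 10^6 times, "The heart" scores 7.5, while "myocardial infarction" (10^3 and unseen) scores 1.5.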
Utopia documents: linking scholarly literature with research data.

Science.gov (United States)

Attwood, T K; Kell, D B; McDermott, P; Marsh, J; Pettifer, S R; Thorne, D

2010-09-15

In recent years, the gulf between the mass of accumulating research data and the massive literature describing and analyzing those data has widened. The need for intelligent tools to bridge this gap, to rescue the knowledge being systematically isolated in literature and data silos, is now widely acknowledged. To this end, we have developed Utopia Documents, a novel PDF reader that semantically integrates visualization and data-analysis tools with published research articles. In a successful pilot with editors of the Biochemical Journal (BJ), the system has been used to transform static document features into objects that can be linked, annotated, visualized and analyzed interactively (http://www.biochemj.org/bj/424/3/). Utopia Documents is now used routinely by BJ editors to mark up article content prior to publication. Recent additions include integration of various text-mining and biodatabase plugins, demonstrating the system's ability to seamlessly integrate on-line content with PDF articles. http://getutopia.com.
The Texts of the Agency's Relationship Agreements with Specialized Agencies

International Nuclear Information System (INIS)

1960-01-01

The texts of the relationship agreements which the Agency has concluded with the specialized agencies listed below, together with the respective protocols authenticating them, are reproduced in this document in the order in which the agreements entered into force, for the information of all Members of the Agency [es]
The Texts of the Agency's Relationship Agreements with Specialized Agencies

International Nuclear Information System (INIS)

1960-01-01

The texts of the relationship agreements which the Agency has concluded with the specialized agencies listed below, together with the respective protocols authenticating them, are reproduced in this document in the order in which the agreements entered into force, for the information of all Members of the Agency [fr]
The phenomenon of soccer in some literary texts: Classical and contemporary

Directory of Open Access Journals (Sweden)

Victor Gil Castañeda

2009-11-01

Full Text Available This article discusses how, throughout literary history, many authors have shown a profound interest in describing the phenomenon of soccer, one of the most popular sports on earth. This interest can be seen in pre-Hispanic texts such as the Popol Vuh, and also in modern intellectuals such as the Uruguayan Eduardo Galeano, in his book El fútbol a sol y sombra. The article also mentions other literary texts whose prominent figures and narrative atmospheres move within the aesthetic description of soccer.

Large Eddy Simulation (LES) for IC Engine Flows

Directory of Open Access Journals (Sweden)

Kuo Tang-Wei

2013-10-01

Full Text Available Numerical computations are carried out using an engineering-level Large Eddy Simulation (LES) model provided by the commercial CFD code CONVERGE. The analytical framework and experimental setup consist of a single-cylinder engine with a Transparent Combustion Chamber (TCC) under motored conditions. A rigorous working procedure for comparing and analyzing the results from simulation and high-speed Particle Image Velocimetry (PIV) experiments is documented in this work. The following aspects of LES are analyzed using this procedure: the number of cycles required for convergence with adequate accuracy; the effect of mesh size, time step, sub-grid-scale (SGS) turbulence models, and boundary condition treatments; and the application of the proper orthogonal decomposition (POD) technique.
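The proper orthogonal decomposition mentioned above is commonly computed with the snapshot method. The block below states that standard formulation as a sketch, assuming N velocity-field snapshots (for example, one per engine cycle at a fixed crank angle) sampled on the same m points with uniform weighting. The symbols are our own and are not taken from the paper.

```latex
\begin{align*}
% Snapshot matrix: column i is the mean-subtracted velocity field of snapshot i
\bar{\mathbf{u}} &= \frac{1}{N}\sum_{i=1}^{N} \mathbf{u}_i, &
X &= \begin{bmatrix} \mathbf{u}_1 - \bar{\mathbf{u}} & \cdots & \mathbf{u}_N - \bar{\mathbf{u}} \end{bmatrix} \in \mathbb{R}^{m \times N} \\
% Thin SVD: the columns of U are the POD modes, ordered by singular value
X &= U \Sigma V^{\top}, &
\boldsymbol{\phi}_k &= U_{:,k}, \quad \sigma_1 \ge \sigma_2 \ge \cdots \ge 0 \\
% Fraction of fluctuating kinetic energy captured by mode k
e_k &= \frac{\sigma_k^2}{\sum_{j} \sigma_j^2} \\
% Rank-r reconstruction of snapshot i from its modal coefficients
\mathbf{u}_i &\approx \bar{\mathbf{u}} + \sum_{k=1}^{r} a_{k,i}\,\boldsymbol{\phi}_k, &
a_{k,i} &= \boldsymbol{\phi}_k^{\top}(\mathbf{u}_i - \bar{\mathbf{u}}) = \sigma_k V_{i,k}
\end{align*}
```

The energy fractions e_k indicate how many modes are needed to represent the cycle-to-cycle variability, which is one common way POD results from simulation and PIV are compared.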
Intelligent bar chart plagiarism detection in documents.

Science.gov (United States)

Al-Dabbagh, Mohammed Mumtaz; Salim, Naomie; Rehman, Amjad; Alkawaz, Mohammed Hazim; Saba, Tanzila; Al-Rodhaan, Mznah; Al-Dhelaan, Abdullah

2014-01-01

This paper presents a novel feature-mining approach for documents whose content cannot be mined via optical character recognition (OCR). By identifying the close relationship between the text and graphical components, the proposed technique extracts the Start, End, and Exact values for each bar. Furthermore, the word 2-gram and Euclidean distance methods are used to accurately detect and determine plagiarism in bar charts.
Intelligent Bar Chart Plagiarism Detection in Documents

Science.gov (United States)

Al-Dabbagh, Mohammed Mumtaz; Salim, Naomie; Alkawaz, Mohammed Hazim; Saba, Tanzila; Al-Rodhaan, Mznah; Al-Dhelaan, Abdullah

2014-01-01

This paper presents a novel feature-mining approach for documents whose content cannot be mined via optical character recognition (OCR). By identifying the close relationship between the text and graphical components, the proposed technique extracts the Start, End, and Exact values for each bar. Furthermore, the word 2-gram and Euclidean distance methods are used to accurately detect and determine plagiarism in bar charts. PMID:25309952
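The two comparison measures named above, word 2-grams for the chart text and Euclidean distance for the bar values, can be sketched as follows. This is a simplified illustration under stated assumptions: each chart is reduced to a dict with its label text and one height per bar (the paper extracts Start, End, and Exact values), text overlap is measured with Jaccard similarity of bigram sets, and the thresholds are arbitrary placeholders rather than values from the paper. All function names are hypothetical.

```python
import math


def word_bigrams(text):
    """Set of consecutive word pairs (word 2-grams) in the text."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


def bigram_similarity(text_a, text_b):
    """Jaccard overlap of word 2-gram sets (0 = disjoint, 1 = identical)."""
    a, b = word_bigrams(text_a), word_bigrams(text_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def bar_distance(values_a, values_b):
    """Euclidean distance between two equally long sequences of bar values."""
    if len(values_a) != len(values_b):
        raise ValueError("charts must have the same number of bars")
    return math.dist(values_a, values_b)


def looks_plagiarised(chart_a, chart_b, text_threshold=0.5, value_threshold=1.0):
    """Flag a chart pair whose labels and bar values are both very close.

    chart_*: dict with 'text' (labels/caption) and 'values' (bar heights).
    The thresholds are placeholders chosen for this sketch.
    """
    return (bigram_similarity(chart_a["text"], chart_b["text"]) >= text_threshold
            and bar_distance(chart_a["values"], chart_b["values"]) <= value_threshold)
```

Requiring both signals keeps charts that merely share a caption, or merely happen to have similar values, from being flagged.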
MANAGING HUMAN FACTORS IN IMPLEMENTING ELECTRONIC DOCUMENT SYSTEM IN THE PUBLIC SECTOR

Directory of Open Access Journals (Sweden)

TOMS LEIKUMS

2012-05-01

Full Text Available Document management underlies the activities of almost every organization. Correctly managed correspondence and organized document circulation characterize successful performance, particularly in public sector organizations. Even though producing documents is not itself the main task of governmental institutions, document creation and processing are crucial processes for providing basic functions in the public sector. In the 21st century it is becoming more important to use the new possibilities offered by modern technologies, including electronic document management. The public sector itself is a heavy bureaucratic apparatus in need of flexibility and the ability to change its working processes and habits in order to gradually switch to the digital environment. Western European countries have already turned to electronic document management, whilst most Eastern European countries, including Latvia, have only recently started a gradual electronization of document circulation. When electronic document management systems are implemented in public sector organizations, staff often resist and are unwilling to change their accustomed methods of work, namely paper-based document circulation. Both lower-level staff and higher-level managers put obstacles in the way of electronic document management. In this article, the author examines cases of successful practice and analyses possible action mechanisms that could convince public sector personnel of the advantages of electronic document circulation and prepare them to switch to working with digital documents.
Multilingual text induced spelling correction

NARCIS (Netherlands)

Reynaert, M.W.C.

2004-01-01

We present TISC, a multilingual, language-independent and context-sensitive spelling checking and correction system designed to facilitate the automatic removal of non-word spelling errors in large corpora. Its lexicon is derived from raw text corpora, without supervision, and contains word unigrams
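The idea of deriving a lexicon from raw text without supervision and then correcting non-words can be sketched with a simple frequency-based corrector. This is a generic illustration, not TISC's own candidate-retrieval method, and unlike TISC it ignores context. The `min_count` cut-off, the one-edit search radius, and all function names are assumptions of this sketch; the alphabet is taken from the lexicon itself so the code is not tied to one language.

```python
import re
from collections import Counter


def build_lexicon(raw_text, min_count=2):
    """Unsupervised unigram lexicon: words seen at least min_count times."""
    counts = Counter(re.findall(r"\w+", raw_text.lower()))
    return {w: c for w, c in counts.items() if c >= min_count}


def edits1(word, alphabet):
    """All strings one deletion, transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {l + r[1:] for l, r in splits if r}
    transposes = {l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1}
    replaces = {l + c + r[1:] for l, r in splits if r for c in alphabet}
    inserts = {l + c + r for l, r in splits for c in alphabet}
    return deletes | transposes | replaces | inserts


def correct(word, lexicon):
    """Keep known words; replace a non-word with its most frequent one-edit neighbour."""
    w = word.lower()
    if w in lexicon:
        return w
    alphabet = {ch for entry in lexicon for ch in entry}  # letters seen in the corpus
    candidates = [c for c in edits1(w, alphabet) if c in lexicon]
    if not candidates:
        return w  # no confident correction: leave the word unchanged
    return max(candidates, key=lambda c: (lexicon[c], c))
```

The `min_count` filter is a crude guard against typos in the raw corpus entering the lexicon; a real system needs a more careful criterion.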
(Co-)constructing specialized knowledge - Internet texts as a case in point

DEFF Research Database (Denmark)

Kampf, Constance

... Working from Bazerman's definition of writing as social action, and combining it with Wenger's definition of communities of practice, which relies on participation and reification occurring in conjunction with written documents, we can understand Internet texts as reifications in an ongoing social action...
The archival function of documentary appraisal in the free document management software Nuxeo

Directory of Open Access Journals (Sweden)

Sérgio Renato Lampert

2017-04-01

Full Text Available Presents a study of the free document management software Nuxeo with respect to implementing the archival function of documentary appraisal. The analysis of the tool made it possible to examine the installation procedure, pointing out difficulties and barriers for information professionals who wish to install the solution. Considering the theoretical assumptions about documentary appraisal, the study analyzed how applicable appraisal is within the tool, in order to validate the application of the three-ages theory. Examination of Nuxeo's features showed that the software does not apply the documentary appraisal function in an automated way. Although it is not an archival solution, the study concludes that Nuxeo can be used for managing digital documents, since its structure includes metadata for documentary appraisal. Analyzing document management software from an archival perspective brings archivists closer to Information Technology and helps guarantee future access to information in digital form.
Document Examination: Applications of Image Processing Systems.

Science.gov (United States)

Kopainsky, B

1989-12-01

Dealing with images is a familiar business for an expert in questioned documents: microscopic, photographic, infrared, and other optical techniques generate images containing the information he or she is looking for. A recent method for extracting most of this information is digital image processing, ranging from simple contrast and contour enhancement to the advanced restoration of blurred texts. When combined with a sophisticated physical imaging system, an image processing system has proven to be a powerful and fast tool for routine non-destructive scanning of suspect documents. This article reviews frequent applications, comprising techniques to increase legibility, two-dimensional spectroscopy (ink discrimination, alterations, erased entries, etc.), comparison techniques (stamps, typescript letters, photo substitution), and densitometry. Computerized comparison of handwriting is not included. Copyright © 1989 Central Police University.
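The simplest operation mentioned above, contrast enhancement, can be sketched as a linear contrast stretch that spreads a faint image's narrow intensity range over the full output range. This is a generic textbook technique shown for illustration, not a method from the article; the 2-D list representation of a grayscale image and the function name are assumptions of this sketch.

```python
def stretch_contrast(pixels, out_min=0, out_max=255):
    """Linearly map the darkest pixel to out_min and the brightest to out_max.

    pixels: 2-D list of grayscale intensities (one list of ints per row).
    """
    flat = [p for row in pixels for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # uniform image: there is no contrast to stretch
        return [row[:] for row in pixels]
    scale = (out_max - out_min) / (hi - lo)
    return [[round(out_min + (p - lo) * scale) for p in row] for row in pixels]
```

For example, a faded entry whose pixels span only 100 to 150 is remapped so that 100 becomes 0 and 150 becomes 255, which can make faint strokes easier to read. Real systems usually clip a small percentage of outliers before stretching.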
A two-sided academic landscape: snapshot of highly-cited documents in Google Scholar (1950-2013)

Directory of Open Access Journals (Sweden)

Alberto Martín-Martín

2016-12-01

Full Text Available The main objective of this paper is to identify and define the core characteristics of the set of highly-cited documents in Google Scholar (document types, language, free availability, sources, and number of versions), on the hypothesis that the wide coverage of this search engine may provide a different portrait of these documents with respect to that offered by traditional bibliographic databases. To do this, a query per year was carried out from 1950 to 2013, identifying the top 1,000 documents retrieved from Google Scholar and obtaining a final sample of 64,000 documents, of which 40% provided a free link to full-text. The results obtained show that the average highly-cited document is a journal or book article (62% of the top 1% most cited documents of the sample), written in English (92.5% of all documents) and available online in PDF format (86.0% of all documents). Yet the existence of errors should be noted, especially when detecting duplicates and linking citations properly. Nonetheless, the fact that the study focused on highly cited papers minimizes the effects of these limitations. Given the high presence of books and, to a lesser extent, of other document types (such as proceedings or reports), the present research concludes that the Google Scholar data offer an original and different vision of the most influential academic documents (measured from the perspective of their citation count), a set composed not only of strictly scientific material (journal articles) but also of academic material in its broadest sense.
Information Technology Act 2000 in India - Authentication of E-Documents

Directory of Open Access Journals (Sweden)

R. G. Pawar

2007-06-01

Full Text Available The Information Technology Act 2000 was enacted in India on 9 June 2000. The Act includes provisions for the authentication of electronic documents. Such provisions were urgently needed in the Indian legal system at that time, especially for electronic commerce and electronic governance. “Electronic commerce” involves the use of alternatives to paper-based methods of communicating and storing information, and electronic commerce requires that a given document can be authenticated. On the internet, documents travel as bits from one destination to another through various media, such as co-axial cable, optical fiber, and satellite. While a document is in transit, there is a high probability that a third party could alter it, and a document may also be changed by noise or disturbance in the communication medium. The Act provides legal recognition for transactions carried out by means of electronic data interchange and other means of electronic communication. In this paper, the researchers study technological aspects of the Information Technology Act 2000, such as hash functions, encryption, decryption, public keys, and private keys, and the associated process. The paper also describes the certifying authority in detail. There must be a mechanism ensuring that any document received is authentic and has not been changed in any manner or for any reason.
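The hash-function step of the authentication process described above can be sketched with Python's standard library. This sketch covers only integrity checking: the sender computes a digest of the document, and the recipient recomputes it to detect any alteration in transit. The choice of SHA-256 and the function names are assumptions of this sketch, not details taken from the Act or the paper.

```python
import hashlib
import hmac


def digest(document: bytes) -> str:
    """SHA-256 digest of the document as a hex string (64 characters)."""
    return hashlib.sha256(document).hexdigest()


def is_unaltered(received: bytes, expected_digest: str) -> bool:
    """True if the received bytes hash to the digest computed by the sender.

    hmac.compare_digest is used so the comparison time does not leak
    how many leading characters matched.
    """
    return hmac.compare_digest(digest(received), expected_digest)
```

Changing even one character (for example, "Pay Rs 100" to "Pay Rs 900") produces a completely different digest. In a digital-signature scheme, the sender would additionally sign this digest with a private key, and the recipient would verify it with the corresponding public key certified by a certifying authority; that step needs an asymmetric-cryptography library and is not shown here.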
Synthesis document on the long-term behavior of packages: operational document "bituminous" 2204

International Nuclear Information System (INIS)

Tiffreau, C.

2004-09-01

This document was prepared within the framework of the 1991 law on radioactive waste management. The 2004 synthesis document on the long-term behavior of bituminized sludge packages consists of two documents: the reference document and the operational document. This paper presents the operational model describing the alteration of the packages by water and the associated release of radioelements, as well as the gas source term and the swelling associated with self-irradiation and radiolysis of the bitumen. (A.L.B.)
The Janus Head Article - On Quality in the Documentation Process

Directory of Open Access Journals (Sweden)

Henrik Andersen

2006-03-01

Full Text Available The god Janus in Roman mythology was a two-faced god; each face had its own view of the world. Our idea behind the Janus Head article is to give you two different and perhaps even contradictory views on a certain topic. In this issue the topic is quality in the documentation process. In the first half of this issue’s Janus Head Article, translators from the international company Grundfos give us their view of quality and how quality is managed in the documentation process at Grundfos. In the second half of the Janus Head Article, scholars from the University of Southern Denmark describe and discuss quality in the documentation process at Grundfos from a researcher’s point of view.
The Janus Head Article - On Quality in the Documentation Process

Directory of Open Access Journals (Sweden)

Henrik Andersen

2012-08-01

Full Text Available The god Janus in Roman mythology was a two-faced god; each face had its own view of the world. Our idea behind the Janus Head article is to give you two different and perhaps even contradictory views on a certain topic. In this issue the topic is quality in the documentation process. In the first half of this issue’s Janus Head Article, translators from the international company Grundfos give us their view of quality and how quality is managed in the documentation process at Grundfos. In the second half of the Janus Head Article, scholars from the University of Southern Denmark describe and discuss quality in the documentation process at Grundfos from a researcher’s point of view.
A survey of text clustering techniques used for web mining

Directory of Open Access Journals (Sweden)

Dan MUNTEANU

2005-12-01

Full Text Available This paper contains an overview of basic formulations and approaches to clustering. Then it presents two important clustering paradigms: a bottom-up agglomerative technique, which collects similar documents into larger and larger groups, and a top-down partitioning technique, which divides a corpus into topic-oriented partitions.
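The bottom-up agglomerative paradigm described above can be sketched in a few lines: start with one cluster per document and repeatedly merge the two most similar clusters. Assumptions of this sketch: documents are bag-of-words vectors from whitespace tokenization, similarity is cosine, and clusters are compared with average linkage. The survey does not prescribe these choices, and the top-down partitioning paradigm is not shown.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def average_link(cluster_a, cluster_b, vectors):
    """Mean cosine similarity over all cross-cluster document pairs."""
    sims = [cosine(vectors[i], vectors[j]) for i in cluster_a for j in cluster_b]
    return sum(sims) / len(sims)


def agglomerative(docs, n_clusters):
    """Merge the most similar pair of clusters until n_clusters remain.

    Returns the clusters as sorted lists of document indices.
    """
    vectors = [Counter(d.lower().split()) for d in docs]
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > max(n_clusters, 1):
        a, b = max(combinations(range(len(clusters)), 2),
                   key=lambda p: average_link(clusters[p[0]], clusters[p[1]], vectors))
        clusters[a] = sorted(clusters[a] + clusters[b])  # a < b, so index a stays valid
        del clusters[b]
    return sorted(clusters)
```

With two finance snippets and two football snippets, asking for two clusters groups each topic together. This naive version recomputes all pairwise links on every merge, so it only suits small collections.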
Quality improvement in clinical documentation: does clinical governance work?

Directory of Open Access Journals (Sweden)

Dehghan M

2013-12-01

Full Text Available Mahlegha Dehghan,1 Dorsa Dehghan,2 Akbar Sheikhrabori,3 Masoume Sadeghi,4 Mehrdad Jalalian5 1Department of Medical Surgical Nursing, School of Nursing and Midwifery, Kerman University of Medical Sciences, Kerman, 2Department of Pediatric Nursing, School of Nursing and Midwifery, Islamic Azad University Kerman Branch, Kerman, 3Department of Medical Surgical Nursing, School of Nursing and Midwifery, Kerman University of Medical Sciences, Kerman, 4Research Center for Modeling in Health, Institute of Futures Studies in Health, Kerman University of Medical Sciences, Kerman, 5Electronic Physician Journal, Mashhad, Iran Introduction: The quality of nursing documentation is still a challenge in the nursing profession and, thus, in the health care industry. One major quality improvement program is clinical governance, whose mission is to continuously improve the quality of patient care and overcome service quality problems. The aim of this study was to identify whether clinical governance improves the quality of nursing documentation. Methods: A quasi-experimental method was used to assess nursing documentation quality after a 2-year clinical governance implementation. Two hundred twenty randomly selected nursing documents were assessed for structure and content using a valid and reliable researcher-made checklist. Results: There were no differences in the nurses' demographic data before and after 2 years (P>0.05), and the nursing documentation score did not improve after the 2-year clinical governance program. Conclusion: Although some efforts were made to improve nursing documentation through clinical governance, these were not sufficient and more attempts are needed. Keywords: nursing documentation, clinical governance, quality improvement, nursing record
Documenting costs and yield of crops of organic origin

Directory of Open Access Journals (Sweden)

J.P. Melnychuk

2016-06-01

Full Text Available The article studies the primary accounting of the costs and output of organic crop production. It also identifies the key issues in the primary accounting of organic crop production. For the survey we used general scientific methods such as induction and deduction, and dialectical, historical, and systematic methods, together with specific accounting methods including documentation, inventory, assessment, calculation, accounting records, double entry, balance sheet, and financial statements. As for the documentation of costs and yield of crops of organic origin, documentation is an important method of accounting, as it is the basis of initial observation of business operations and a prerequisite for reflecting them in the accounts. The article highlights the features of documenting the posting of production costs and crop output of organic origin, and also studies the procedure for registering land held under operating lease for the production of organic products. The author offers suggestions for improving the documentation of costs and yields of organic crop production in order to develop reliable information about production costs and the grown crop of organic origin for management decision-making.
A Sample Typology of Texts in Corporate Discourse

Directory of Open Access Journals (Sweden)

Jacek Kołata

2009-11-01

Full Text Available The subject matter of this article is a working typology of the different texts existing in corporate discourse. The data for the following analysis are drawn from various groups of documents existing in Nestle Corporation. The division into categories was possible after highlighting the most discriminative features of the texts under investigation. Moreover, it allows me to reveal how texts are shaped by the contexts in which they exist. Bearing this in mind, we must not forget that written utterances are always influenced by different but closely related parameters, such as a sender, a recipient, a particular incident, and the aim of the communication; to be more precise, they cannot exist independently. This paper attempts to point out the weaknesses and merits of the corporate discourse communication system in the described company and, by doing so, to facilitate the flow of information among all departments, employees, and factories.
Information Types in Nonmimetic Documents: A Review of Biddle's Wipe-Clean Slate (Understanding Documents).

Science.gov (United States)

Mosenthal, Peter B.; Kirsch, Irwin S.

1991-01-01

Describes how the 16 permanent lists used by a first-grade reading teacher (and mother of 6) to manage the household represent the whole range of documents covered in the 3 major types of documents: matrix documents, graphic documents, and locative documents. Suggests class activities to clarify students' understanding of the information in…
METHOD OF RARE TERM CONTRASTIVE EXTRACTION FROM NATURAL LANGUAGE TEXTS

Directory of Open Access Journals (Sweden)

I. A. Bessmertny

2017-01-01

Full Text Available The paper considers the problem of automatic domain term extraction from a document corpus by means of a contrast collection. Existing contrastive methods successfully extract frequently used terms but mishandle rare terms, which can impoverish the resulting thesaurus. Point-wise mutual information is one of the known statistical methods of term extraction, and it finds rare terms successfully; however, it also extracts many false terms. The proposed approach applies point-wise mutual information to extract rare terms and then filters the candidates by the criterion of joint occurrence with the other candidates. We build a “documents-by-terms” matrix that is subjected to singular value decomposition to eliminate noise and reveal strong interconnections. We then pass to the resulting “terms-by-terms” matrix, which reproduces the strength of interconnections between words. The approach was tested on a document collection from the “Geology” domain, using contrast documents on topics such as “Politics”, “Culture”, “Economics” and “Accidents” from several Internet resources. The experimental results demonstrate that the method works for rare term extraction.
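The two stages described above (contrastive point-wise mutual information, then filtering by joint occurrence with other candidates) can be sketched as follows. Loud assumptions: PMI is computed here between a term and the domain, log2(P(term | domain) / P(term)), with probabilities estimated from token counts over the domain plus contrast collections; "joint occurrence" is simplified to appearing in the same domain document as at least `min_links` other candidates. The thresholds and function names are hypothetical, and the SVD-based noise reduction step is omitted.

```python
import math
from collections import Counter


def tokens(doc):
    """Lowercased whitespace tokenization (a simplifying assumption)."""
    return doc.lower().split()


def contrastive_pmi(domain_docs, contrast_docs):
    """PMI between each domain term and the domain: log2(P(t | domain) / P(t))."""
    dom = Counter(t for d in domain_docs for t in tokens(d))
    con = Counter(t for d in contrast_docs for t in tokens(d))
    n_dom = sum(dom.values())
    n_all = n_dom + sum(con.values())
    return {t: math.log2((c / n_dom) / ((c + con[t]) / n_all)) for t, c in dom.items()}


def extract_terms(domain_docs, contrast_docs, min_pmi=0.5, min_links=1):
    """Keep high-PMI candidates that share a domain document with other candidates."""
    pmi = contrastive_pmi(domain_docs, contrast_docs)
    candidates = {t for t, s in pmi.items() if s >= min_pmi}
    links = Counter()
    for d in domain_docs:
        present = candidates & set(tokens(d))
        for t in present:
            links[t] += len(present) - 1  # other candidates in the same document
    return sorted(t for t in candidates if links[t] >= min_links)
```

A term that occurs only in the domain collection gets the maximum PMI however rare it is, which is why PMI finds rare terms but also admits noise; the co-occurrence filter then drops isolated candidates, such as a single stray word that appears in no document alongside other candidates.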
Technical approach document

International Nuclear Information System (INIS)

1988-04-01

This document describes the general technical approaches and design criteria adopted by the US Department of Energy (DOE) to implement Remedial Action Plans (RAPs) and final designs that comply with EPA standards. This document is a revision of the original document. Major revisions were made to the sections on riprap selection and sizing and on ground water; only minor revisions were made to the remainder of the document. The US Nuclear Regulatory Commission (NRC) has prepared a Standard Review Plan (NRC-SRP) that describes the factors the NRC considers in approving the RAP. Sections 3.0, 4.0, 5.0, and 7.0 of this document are arranged under the same headings as those used in the NRC-SRP to facilitate joint use of the two documents. Section 2.0 (not included in the NRC-SRP) discusses design considerations; Section 3.0 describes surface-water hydrology and erosion control; Section 4.0 describes geotechnical aspects of pile design; Section 5.0 discusses the Alternate Site Selection Process; Section 6.0 deals with radiological issues (in particular, the design of the radon barrier); Section 7.0 discusses protection of groundwater resources; and Section 8.0 discusses site design criteria for the RAC.

How Are Searching and Reading Intertwined during Retrieval from Hierarchically Structured Documents?

DEFF Research Database (Denmark)

Hertzum, Morten; Lalmas, M.; Frøkjær, Erik

2001-01-01

Effective use of information retrieval systems requires that users know when to – temporarily – cease searching to do some reading and where to start reading. In hierarchically structured documents, users can to some extent interchange searching and reading by entering the text at different levels...... information retrieval systems could exploit document structure to return the best points to support reading, rather than merely hits...
The Use of Speech Technology to Protect the Document Turnover

Directory of Open Access Journals (Sweden)

Alexandr M. Alyushin

2017-06-01

Full Text Available The wide use of paper documents in current practical workflows is shown. The basic aspects of document protection, which relate to protecting the content and legal components of documents, are highlighted. The content component is taken to be the semantic information of the document. The legal component covers the facts and conditions of the creation, approval and negotiation of the document by specific persons. The importance of document protection is shown in connection with possible terrorist threats. The time needed to detect fraud is shown to be an important factor in the efficiency of document protection, and the fraud-detection time requirements for documents of different kinds (financial, legal, management) are analyzed. Documents used for the operational management of dangerous facilities are identified as the most sensitive to falsification, since their deliberate falsification can lead to accidents, technogenic catastrophes and human casualties. A comparative analysis of the document protection methods currently in use, both biometric and non-biometric, is presented, and their shortcomings are analyzed. A conclusion is drawn about the prospects of document protection based on voice signature technology, and the basic steps of voice information processing in implementing this technology are analyzed. Software implementing a new technology for protecting documents against counterfeiting is proposed. The technology is based on an audio marker placed at the end of the document, which contains general information about it. It is applicable to a wide range of documents, such as financial and valuable papers, contracts, etc. One of its most important advantages is that no change can be made to the document without its author, because the audio marker holds the author's biometric data.
DOCUMENT IMAGE REGISTRATION FOR IMPOSED LAYER EXTRACTION

Directory of Open Access Journals (Sweden)

Surabhi Narayan

2017-02-01

Full Text Available Extracting filled-in information from document images in the presence of a template is challenging because of geometrical distortion. A filled-in document image consists of a null background, a general-information foreground and a vital-information imposed layer. A template document image consists of a null background and a general-information foreground layer. In this paper, a novel document image registration technique is proposed to extract the imposed layer from an input document image. A convex polygon is constructed around the content of the input and template images using the convex hull. The vertices of the input and template convex polygons are paired based on minimum Euclidean distance. Each vertex of the input convex polygon is transformed for every combination of the candidate rotation and scaling values; translation is handled by a tight crop. For every transformation of the input vertices, the Minimum Hausdorff Distance (MHD) is computed. The MHD identifies the rotation and scaling values by which the input image should be transformed to align it to the template. Since transformation is an estimation process, the components in the input image do not overlay exactly on the components in the template; therefore, a connected-component technique is applied to extract contour boxes at word level to identify partially overlapping components. Geometrical features such as density, area and degree of overlap are extracted and compared between partially overlapping components to identify and eliminate the components common to the input and template images. The residue constitutes the imposed layer. Experimental results indicate the efficacy of the proposed model with computational complexity. Experiments were conducted on a variety of filled-in forms, applications and bank cheques. Data sets were generated as test sets for comparative analysis.
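The rotation/scale search can be sketched as follows, under stated assumptions. The hull uses Andrew's monotone chain. Translation is removed by a "tight crop" (shifting the bounding box to the origin), and each rotation/scale pair is scored with the symmetric Hausdorff distance between the two hull vertex sets. This simplifies the paper's method: it skips the explicit nearest-vertex pairing step, and the angle and scale grids and the example point sets are hypothetical.

```python
import math

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def tight_crop(points):
    """Translate so the bounding box starts at (0, 0), mimicking a tight crop."""
    mx = min(x for x, _ in points)
    my = min(y for _, y in points)
    return [(x - mx, y - my) for x, y in points]

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two finite point sets."""
    def directed(p, q):
        return max(min(math.dist(u, v) for v in q) for u in p)
    return max(directed(a, b), directed(b, a))

def best_rotation_scale(input_hull, template_hull, angles_deg, scales):
    """Grid-search (angle, scale); return (distance, angle, scale) with the minimum distance."""
    target = tight_crop(template_hull)
    best = None
    for ang in angles_deg:
        t = math.radians(ang)
        c, s = math.cos(t), math.sin(t)
        for k in scales:
            moved = [(k * (c * x - s * y), k * (s * x + c * y)) for x, y in input_hull]
            d = hausdorff(tight_crop(moved), target)
            if best is None or d < best[0]:
                best = (d, ang, k)
    return best

# Hypothetical content points: the input is the template rotated 90 degrees and halved.
template = [(0, 0), (4, 0), (0, 2), (1, 1)]
inp = [(0, 0), (0, 2), (-1, 0), (-0.5, 0.5)]
d, angle, scale = best_rotation_scale(convex_hull(inp), convex_hull(template),
                                      [0, 90, 180, 270], [0.5, 1.0, 2.0])
print(angle, scale)  # 270 2.0
```

Working on hull vertices rather than on every content pixel keeps each Hausdorff evaluation cheap. The exhaustive grid is what makes the search tractable only for a modest set of candidate angles and scales.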
Large hydropower generating units

Energy Technology Data Exchange (ETDEWEB)

NONE

1997-07-01

This document presents the Brazilian experience with the design, fabrication, construction, commissioning and operation of large-scale, high-capacity generating units. The experience was acquired with the implementation of the Itumbiara, Paulo Afonso IV, Tucurui, Itaipu and Xingo power plants, which are among the largest in the world.
Distributed and Conditional Documents: Conceptualizing Bibliographical Alterities

Directory of Open Access Journals (Sweden)

Johanna Drucker

2014-11-01

Full Text Available To conceptualize a future history of the book we have to recognize that our understanding of the bibliographical object of the past is challenged by the ontologically unbound, distributed, digital, and networked conditions of the present. As we draw on rich intellectual traditions, we must keep in view the need to let go of the object-centered approach that is at the heart of book history. My argument begins, therefore, with a few assertions. First, that we have much to learn from the scholarship on Old and New World contact that touches on bibliography, document studies, and book history for formulating a non-object centered conception of what a book is. Second, that the insights from these studies can be usefully combined with a theory of the “conditional” document to develop the model of the kinds of distributed artifacts we encounter on a daily basis in the networked conditions of current practices. Finally, I would suggest that this model provides a different conception of artifacts (books, documents, works of textual or graphic art), one in which reception is production and therefore all materiality is subject to performative engagement within varied, and specific, conditions of encounter.
Sourcing in Professional Education: Do Text Factors Make Any Difference?

Science.gov (United States)

Bråten, Ivar; Strømsø, Helge I.; Andreassen, Rune

2016-01-01

The present study investigated the extent to which the text factors of source salience and emphasis on risk might influence readers' attention to and use of source information when reading single documents to make behavioral decisions on controversial health-related issues. Participants (n = 259), who were attending different bachelor-level…
The Effects of a Web-Based Vocabulary Development Tool on Student Reading Comprehension of Science Texts

Directory of Open Access Journals (Sweden)

Karen Thompson

2012-10-01

Full Text Available The complexities of reading comprehension have received increasing recognition in recent years. In this realm, the power of vocabulary in predicting cognitive challenges in phonological, orthographic, and semantic processes is well documented. In this study, we present a web-based vocabulary development tool that has a series of interactive displays, including a list of the 50 most frequent words in a particular text, Google image and video results for any combination of those words, definitions and synonyms for particular words from the text, and a list of sentences from the text in which particular words appear. Additionally, we report the results of an experiment performed in collaboration with middle school science teachers from a large urban district in the United States. While this experiment did not show a significant positive effect of the tool on reading comprehension in science, we did find that girls seemed to score worse on a reading comprehension assessment after using our web-based tool. This result could reflect prior research suggesting that some girls tend to have a negative attitude towards technology because of gender stereotypes that give them the impression that they are not as good as boys at working with computers.
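Two of the tool's text-derived displays, the most frequent words and the sentences containing a chosen word, can be sketched in a few lines. This is a minimal sketch, not the tool's actual code. The stop-word list, the tokenizing regular expression and the sentence splitter are all simplifying assumptions.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real tool would likely use a much longer one.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are", "it", "that"}

def top_words(text, n=50):
    """Return the n most frequent non-stop-words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [w for w, _ in counts.most_common(n)]

def sentences_with(text, word):
    """Return the sentences of the text that contain the word as a whole word."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]

text = "Cells divide. A cell has a nucleus. The nucleus holds DNA in the cell."
print(top_words(text, 2))              # ['cell', 'nucleus']
print(sentences_with(text, "nucleus"))
```

Whole-word matching means "cell" does not match "Cells". Whether the real tool lemmatizes words (treating "cells" and "cell" as one) is not stated in the abstract.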
A crowdsourcing workflow for extracting chemical-induced disease relations from free text.

Science.gov (United States)

Li, Tong Shu; Bravo, Àlex; Furlong, Laura I; Good, Benjamin M; Su, Andrew I

2016-01-01

Relations between chemicals and diseases are one of the most queried biomedical interactions. Although expert manual curation is the standard method for extracting these relations from the literature, it is expensive and impractical to apply to large numbers of documents, and therefore alternative methods are required. We describe here a crowdsourcing workflow for extracting chemical-induced disease relations from free text as part of the BioCreative V Chemical Disease Relation challenge. Five non-expert workers on the CrowdFlower platform were shown each potential chemical-induced disease relation highlighted in the original source text and asked to make binary judgments about whether the text supported the relation. Worker responses were aggregated through voting, and relations receiving four or more votes were predicted as true. On the official evaluation dataset of 500 PubMed abstracts, the crowd attained a 0.505 F-score (0.475 precision, 0.540 recall), with a maximum theoretical recall of 0.751 due to errors with named entity recognition. The total crowdsourcing cost was $1290.67 ($2.58 per abstract) and took a total of 7 h. A qualitative error analysis revealed that 46.66% of sampled errors were due to task limitations and gold standard errors, indicating that performance can still be improved. All code and results are publicly available at https://github.com/SuLab/crowd_cid_relex Database URL: https://github.com/SuLab/crowd_cid_relex © The Author(s) 2016. Published by Oxford University Press.
A crowdsourcing workflow for extracting chemical-induced disease relations from free text

Science.gov (United States)

Li, Tong Shu; Bravo, Àlex; Furlong, Laura I.; Good, Benjamin M.; Su, Andrew I.

2016-01-01

Relations between chemicals and diseases are one of the most queried biomedical interactions. Although expert manual curation is the standard method for extracting these relations from the literature, it is expensive and impractical to apply to large numbers of documents, and therefore alternative methods are required. We describe here a crowdsourcing workflow for extracting chemical-induced disease relations from free text as part of the BioCreative V Chemical Disease Relation challenge. Five non-expert workers on the CrowdFlower platform were shown each potential chemical-induced disease relation highlighted in the original source text and asked to make binary judgments about whether the text supported the relation. Worker responses were aggregated through voting, and relations receiving four or more votes were predicted as true. On the official evaluation dataset of 500 PubMed abstracts, the crowd attained a 0.505 F-score (0.475 precision, 0.540 recall), with a maximum theoretical recall of 0.751 due to errors with named entity recognition. The total crowdsourcing cost was $1290.67 ($2.58 per abstract) and took a total of 7 h. A qualitative error analysis revealed that 46.66% of sampled errors were due to task limitations and gold standard errors, indicating that performance can still be improved. All code and results are publicly available at https://github.com/SuLab/crowd_cid_relex Database URL: https://github.com/SuLab/crowd_cid_relex PMID:27087308
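The aggregation rule (five judgments per candidate relation, predicted true at four or more "yes" votes) and the evaluation arithmetic can be sketched as follows. The relation pairs below are hypothetical placeholders. The final lines only check that the reported F-score is consistent with the reported precision and recall via F = 2PR / (P + R).

```python
def aggregate_votes(judgments, threshold=4):
    """Predict a relation as true when at least `threshold` workers said yes."""
    return {rel for rel, votes in judgments.items() if sum(votes) >= threshold}

def precision_recall_f(predicted, gold):
    """Precision, recall and balanced F-score of predicted vs. gold relation sets."""
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Hypothetical (chemical, disease) pairs, five yes/no judgments each.
judgments = {
    ("drugA", "disease1"): [True, True, True, True, False],   # 4 votes -> true
    ("drugB", "disease2"): [True, True, True, False, False],  # 3 votes -> false
    ("drugC", "disease3"): [True] * 5,                        # 5 votes -> true
}
gold = {("drugA", "disease1"), ("drugB", "disease2")}

predicted = aggregate_votes(judgments)
print(precision_recall_f(predicted, gold))  # (0.5, 0.5, 0.5)

# Consistency check on the reported figures: F = 2PR / (P + R).
p, r = 0.475, 0.540
print(round(2 * p * r / (p + r), 3))  # 0.505
```

The same arithmetic confirms the per-abstract cost: $1290.67 / 500 ≈ $2.58.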
Enterprise Document Management

Data.gov (United States)

US Agency for International Development — The function of the operation is to provide e-Signature and document management support for Acquisition and Assistance (A&A) documents including vouchers in...
A review of technology and trends in document delivery services

Energy Technology Data Exchange (ETDEWEB)

Bourne, C P [DIALOG Information Services, Inc., Palo Alto, CA (United States)]

1990-05-01

This paper reviews the major lines of technical development being pursued to extend or replace traditional inter-library loan and photocopy service and to facilitate the delivery of source documents to individual end users. Examples of technical approaches discussed are: (1) the inclusion of full text and image data in central online systems; (2) image workstations such as the ADONIS and UMI systems; and (3) the use of electronic networks for document ordering and delivery. Some consideration is given to the policy implications for libraries and information systems. (author). 11 tabs.
A review of technology and trends in document delivery services

International Nuclear Information System (INIS)

Bourne, C.P.

1990-05-01

This paper reviews the major lines of technical development being pursued to extend or replace traditional inter-library loan and photocopy service and to facilitate the delivery of source documents to individual end users. Examples of technical approaches discussed are: 1) the inclusion of full text and image data in central online systems; 2) image workstations such as the ADONIS and UMI systems; and 3) the use of electronic networks for document ordering and delivery. Some consideration is given to the policy implications for libraries and information systems. (author). 11 tabs
A DOCUMENT PAUL AUSTER’S NEW YORK TRILOGY

Directory of Open Access Journals (Sweden)

Natalia N. Smirnova

2017-03-01

Full Text Available The article deals with the way a literary work “creates” a document out of itself, taking Paul Auster’s novels as an example. A document here is the report of a character, a private detective who is watching another character (a writer), but also the book of a fictional writer who is writing the story of the detective who is watching him, and eventually the book about this whole story. In this case, the search for the other, watching him, is inevitably associated with the search for oneself, self-observation. Biography becomes autobiography, i.e. a document rather than a narrative based on a document. This story is projected onto the story of Don Quixote (on which “some” Paul Auster, a fictional writer, is writing an essay). The Other is a landmark in the vast desert of fictional worlds where Paul Auster’s Don Quixote wanders alongside other characters of the trilogy. The author may not return from his endless journey through imaginary worlds; his life belongs to neither real life nor fiction. He gives life to his characters while remaining invisible himself. Paul Auster’s The New York Trilogy explores such an existential situation, where the only evidence of the author’s life is a document left by his character. The author leaves a documentary record of a kind about his own existence. In this sense, literature is a document of life and of the endless search for a reason for the existence of an individual who, not being equal to him- or herself, is always the other and never a type or a template.
CRITICISM, ADAPTATION AND ORGANIZATION IN THE COLLABORATIVE CONSTRUCTION OF DOCUMENTS IN THE CLOUDS

Directory of Open Access Journals (Sweden)

Raquel Franco Santos

2016-07-01

Full Text Available Working with text in the digital age brings several challenges, such as collaborative writing on the Web, for researchers in Computing, Education, and Linguistics. This article presents aspects of thinking and doing in this context, drawing on humanistic issues related to Adorno (Criticism) and Piaget (Constructivism), the view of the paragraph as a unit of text, and technologies for producing Web documents in collaborative learning environments (Service-Oriented Architecture in the Cloud). Based on research on collaborative writing on the Web, a model-driven service for the collaborative construction of documents in the cloud is proposed. The paper also presents a tool (CCDC-TEO) that implements the proposed model and an example of its application. The results demonstrate the validity of this model.
The Texts of the Agency's Co-operation Agreements with Regional Intergovernmental Organizations

International Nuclear Information System (INIS)

1969-01-01

The text of the Agency's agreement for co-operation with the Organization of African Unity (OAU) is reproduced in this document for the information of all Members. The agreement entered into force on 26 March 1969
Image segmentation evaluation for very-large datasets

Science.gov (United States)

Reeves, Anthony P.; Liu, Shuang; Xie, Yiting

2016-03-01

With the advent of modern machine learning methods and fully automated image analysis, there is a need for very large image datasets with documented segmentations for both training and evaluating computer algorithms. Current approaches based on visual inspection and manual marking do not scale well to big data. We present a new approach that relies on fully automated algorithm outcomes for segmentation documentation, requires no manual marking, and provides quantitative evaluation of computer algorithms. New image segmentations and new algorithm outcomes are documented by visual inspection. The burden of visual inspection on large datasets is minimized by (a) customized visualizations for rapid review and (b) reducing the number of cases to be reviewed through analysis of quantitative segmentation evaluation. This method has been applied to a dataset of 7,440 whole-lung CT images for 6 different segmentation algorithms designed to fully automatically facilitate the measurement of a number of important quantitative image biomarkers. The results indicate that we could achieve 93% to 99% successful segmentation for these algorithms on this relatively large image database. The presented evaluation method may be scaled to much larger image databases.
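One way to "reduce the number of cases to be reviewed through analysis of quantitative segmentation evaluation" is to send to visual review only those cases where a new algorithm outcome disagrees with the already documented segmentation. The sketch below is an assumption, not the authors' published procedure. It uses the Dice overlap as the quantitative measure with a hypothetical 0.9 threshold, and represents segmentations as sets of voxel coordinates.

```python
def dice(a, b):
    """Dice overlap between two segmentations given as sets of voxel coordinates."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

def cases_to_review(new_results, documented, threshold=0.9):
    """IDs of cases whose new segmentation disagrees with the documented one."""
    return sorted(case for case, seg in new_results.items()
                  if dice(seg, documented[case]) < threshold)

# Hypothetical tiny "segmentations" for two cases.
documented = {
    "case01": {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)},
    "case02": {(1, 1, 1), (1, 1, 2)},
}
new_results = {
    "case01": {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)},  # identical
    "case02": {(1, 1, 1), (5, 5, 5), (6, 6, 6)},              # mostly different
}
print(cases_to_review(new_results, documented))  # ['case02']
```

With thousands of scans, only the low-agreement cases reach a human reviewer. The threshold trades reviewer effort against the risk of missing subtle failures.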
Lessons learnt from recent citizen science initiatives to document floods in France, Argentina and New Zealand

Directory of Open Access Journals (Sweden)

Le Coz Jérôme

2016-01-01

Full Text Available New communication and digital image technologies have enabled the public to produce and share large quantities of flood observations. Valuable hydraulic data such as water levels, flow rates, inundated areas, etc., can be extracted from photos and movies taken by citizens and help improve the analysis and modelling of flood hazard. We introduce recent citizen science initiatives which have been launched independently by research organisations to document floods in some catchments and urban areas of France, Argentina and New Zealand. Key drivers for success appear to be: a clear and simple procedure, suitable tools for data collecting and processing, an efficient communication plan, the support of local stakeholders, and the public awareness of natural hazards.
INFORMATION SYSTEM OF AUTOMATION OF PREPARATION EDUCATIONAL PROCESS DOCUMENTS

Directory of Open Access Journals (Sweden)

V. A. Matyushenko

2016-01-01

Full Text Available Information technology is rapidly conquering the world, permeating all spheres of human activity, and education is no exception. An important direction in the informatization of education is the development of university management systems. Modern information systems improve and facilitate the management of all types of activities of an institution. The purpose of this paper is to develop a system that automates the preparation of accounting documents. The article describes the problem of preparing educational process documents. It was decided to design and build the information system in the Microsoft Access environment. The result is four types of reports obtained with the developed system. The system now automates the process and reduces the effort required to prepare accounting documents. All reports are produced in Microsoft Excel and can be used for further analysis and processing.
Tank Monitor and Control System (TMACS) As-Built Software Design Document

International Nuclear Information System (INIS)

GLASSCOCK, J.A.

2000-01-01

This document describes the software design for the Tank Monitor and Control System (TMACS). It captures the existing as-built design of TMACS as of November 1999 and will be used as a reference document by the system maintainers who will maintain and modify the TMACS functions as necessary. The heart of the TMACS system is the “point-processing” functionality, in which a sample value is received from the field sensors and is analyzed, logged, or alarmed as required. This Software Design Document focuses on the point-processing functions.
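The receive, analyze, log or alarm cycle described for point processing can be illustrated with a small sketch. The `Point` class, its alarm limits and the logging deadband are hypothetical constructs chosen for illustration. They are not taken from the TMACS design document.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Point:
    """One monitored sensor point (hypothetical structure, not the TMACS design)."""
    name: str
    low_alarm: float
    high_alarm: float
    deadband: float = 0.0                 # smallest change worth logging
    last_logged: Optional[float] = None
    log: List[float] = field(default_factory=list)
    alarms: List[Tuple[str, float]] = field(default_factory=list)

    def process(self, value: float) -> None:
        """Analyze a new sample, then alarm and/or log it as required."""
        if value < self.low_alarm or value > self.high_alarm:
            self.alarms.append((self.name, value))
        if self.last_logged is None or abs(value - self.last_logged) > self.deadband:
            self.log.append(value)
            self.last_logged = value

point = Point("tank-temp-1", low_alarm=10.0, high_alarm=60.0, deadband=0.5)
for sample in [20.0, 20.2, 21.0, 65.0]:
    point.process(sample)
print(point.log)     # [20.0, 21.0, 65.0]
print(point.alarms)  # [('tank-temp-1', 65.0)]
```

The deadband keeps small fluctuations (20.0 to 20.2) out of the log, while alarm checks run on every sample regardless of whether it is logged.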
A database for TMT interface control documents

Science.gov (United States)

Gillies, Kim; Roberts, Scott; Brighton, Allan; Rogers, John

2016-08-01

The TMT Software System consists of software components that interact with one another through a software infrastructure called TMT Common Software (CSW). CSW consists of software services and library code that is used by developers to create the subsystems and components that participate in the software system. CSW also defines the types of components that can be constructed and their roles. The use of common component types and shared middleware services allows standardized software interfaces for the components. A software system called the TMT Interface Database System was constructed to support the documentation of the interfaces for components based on CSW. The programmer describes a subsystem and each of its components using JSON-style text files. A command interface file describes each command a component can receive and any commands a component sends. The event interface files describe status, alarms, and events a component publishes and status and events subscribed to by a component. A web application was created to provide a user interface for the required features. Files are ingested into the software system's database. The user interface allows browsing subsystem interfaces, publishing versions of subsystem interfaces, and constructing and publishing interface control documents that consist of the intersection of two subsystem interfaces. All published subsystem interfaces and interface control documents are versioned for configuration control and follow the standard TMT change control processes. Subsystem interfaces and interface control documents can be visualized in the browser or exported as PDF files.
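The idea of an interface control document as the intersection of two subsystem interfaces can be sketched as follows. The JSON field names (`publishes`, `subscribes`, `sends`, `receives`) and the subsystem names `ALPHA` and `BETA` are hypothetical. The actual TMT Interface Database System schema is not given in the abstract.

```python
import json

def interface_control(a, b):
    """Items crossing the boundary between two subsystems, in each direction.

    Assumes a hypothetical schema with "publishes"/"subscribes" event lists
    and "sends"/"receives" command lists.
    """
    def crossing(src, dst):
        return {
            "events": sorted(set(src.get("publishes", [])) & set(dst.get("subscribes", []))),
            "commands": sorted(set(src.get("sends", [])) & set(dst.get("receives", []))),
        }
    return {
        f"{a['name']} -> {b['name']}": crossing(a, b),
        f"{b['name']} -> {a['name']}": crossing(b, a),
    }

alpha = json.loads('{"name": "ALPHA", "publishes": ["position", "heartbeat"],'
                   ' "subscribes": ["tilt"], "sends": ["setTarget"], "receives": ["follow"]}')
beta = json.loads('{"name": "BETA", "publishes": ["tilt"], "subscribes": ["position"],'
                  ' "sends": ["follow"], "receives": ["setTarget", "offload"]}')

print(json.dumps(interface_control(alpha, beta), indent=2))
```

Items that only one side declares (the unsubscribed "heartbeat" event, the never-sent "offload" command) drop out of the intersection. This is what lets an ICD be derived automatically from the two independently maintained interface descriptions.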

INTEGRATION OF COMPUTER TECHNOLOGIES SMK: AUTOMATION OF THE PRODUCTION CERTIFICATION PROCEDURE AND FORMING OF SHIPPING DOCUMENTS

Directory of Open Access Journals (Sweden)

S. A. Pavlenko

2009-01-01

Full Text Available The integration of information and computer technologies made it possible to reorganize and optimize several processes by reducing the circulation of documents, unifying documentation forms, and other means.
ERRORS AND DIFFICULTIES IN TRANSLATING LEGAL TEXTS

Directory of Open Access Journals (Sweden)

Camelia, CHIRILA

2014-11-01

Full Text Available Nowadays the accurate translation of legal texts has become highly important as the mistranslation of a passage in a contract, for example, could lead to lawsuits and loss of money. Consequently, the translation of legal texts to other languages faces many difficulties and only professional translators specialised in legal translation should deal with the translation of legal documents and scholarly writings. The purpose of this paper is to analyze translation from three perspectives: translation quality, errors and difficulties encountered in translating legal texts and consequences of such errors in professional translation. First of all, the paper points out the importance of performing a good and correct translation, which is one of the most important elements to be considered when discussing translation. Furthermore, the paper presents an overview of the errors and difficulties in translating texts and of the consequences of errors in professional translation, with applications to the field of law. The paper is also an approach to the differences between languages (English and Romanian) that can hinder comprehension for those who have embarked upon the difficult task of translation. The research method that I have used to achieve the objectives of the paper was the content analysis of various Romanian and foreign authors' works.
The IAEA library and documentation services

International Nuclear Information System (INIS)

1963-01-01

The library was established in 1958 and has since acquired a large and steadily increasing collection of nuclear science literature. The collection can be divided into three broad categories: (a) books, (b) periodicals, and (c) technical reports and official documents. There are at present approximately 19 000 books in the library. The books cover all aspects of nuclear science and technology as well as some relevant branches of economics, law and other subjects. In addition, there are numerous reference books, both on science and on other subjects related to the work of the Agency. The library receives 700 periodicals, mostly of a scientific or technical nature. Sixty-three per cent of the periodicals are obtained by subscription and some 30 per cent as presentation copies. The rest of the periodicals are received under an exchange of publications. Of particular interest to research workers is a comprehensive collection of abstracting journals. The technical reports and documents are usually obtained from Member Governments and other international organizations, including the United Nations and the specialized agencies. There are nearly 50 000 reports, some 30 000 of which are on microcards. As international organizations and Member States supply the library with their documents and other publications on atomic energy on a routine basis, the collection in the library now represents much of the world's unclassified material on different branches of nuclear science and technology. The library also possesses about 5000 reprints and translations of technical reports.
Management of technical documents: a projection at the University of Zulia

Directory of Open Access Journals (Sweden)

Ana Judith Paredes Chacin

2015-11-01

Full Text Available Objective. This paper comprehensively and systematically analyzes the principles of organization and technical procedure that support document management based on information technologies. Method. We conducted a study using the documentary descriptive method in the context of the Dirección de Infraestructura (Dinfra) of the Universidad del Zulia. Results. We found evidence of efficiency in the processes that support the management of the technical documents generated by Dinfra: plans, metrics and memories. Conclusion. The conceptual basis of organizational and technical archives contributes to the systematization, safekeeping and preservation of documents, and ensures the timely retrieval of technical information for the management of the University of Zulia.
The Texts of the Agency's Relationship Agreements with Specialized Agencies

International Nuclear Information System (INIS)

1988-03-01

The text of the relationship agreement that the Agency has concluded with the United Nations Industrial Development Organization, together with the protocol regarding its entry into force, is reproduced in this document for the information of all Members of the Agency. The agreement entered into force on 9 October 1987 pursuant to Article 10.
Undergraduates' Text Messaging Language and Literacy Skills

Science.gov (United States)

Grace, Abbie; Kemp, Nenagh; Martin, Frances Heritage; Parrila, Rauno

2014-01-01

Research investigating whether people's literacy skill is being affected by the use of text messaging language has produced largely positive results for children, but mixed results for adults. We asked 150 undergraduate university students in Western Canada and 86 in South Eastern Australia to supply naturalistic text messages and to complete…
Development of an event-driven parser for active document and web-based nuclear design system

Energy Technology Data Exchange (ETDEWEB)

Park, Yong Soo

2005-02-15

Nuclear design work consists of extensive unit job modules in which many computer codes are used. Each unit module requires time-consuming and error-prone input preparation, code runs, output analysis and quality assurance. Reload core safety evaluation is the most manpower-intensive and time-consuming task because of the large amount of calculation and data exchange it involves. The purpose of this study is to develop a new nuclear design system called the Innovative Design Processor (IDP) in order to minimize human effort, maximize design quality and productivity, and ultimately achieve an optimized core loading pattern. The two basic new principles of IDP are document-oriented design and web-based design. Unlike conventional code-oriented or procedure-oriented design, document-oriented design is human-oriented: the designer writes a design document, called an active document, and feeds it to a parser, which automatically prepares the final document with complete analysis, tables and plots. This study defined a number of active components and developed an event-driven parser for active documents in HTML (Hypertext Markup Language) or XML (Extensible Markup Language). The active documents can be created on the web, which is another framework of IDP. Using an appropriate mix of server-side and client-side programming in the HAMP (HP-UX/Apache/MySQL/PHP) environment, the document-oriented design process on the web is modeled as a design wizard for the designer's convenience and platform independence. This IDP automation was tested on the reload safety evaluation of Korea Standard Nuclear Power Plant (KSNP) type PWRs. Substantial time savings were confirmed: IDP can complete several months of work in a few days. A more optimized core loading pattern can therefore be obtained, since the reload safety evaluation tasks take little time even with several candidate core loading patterns. Since the technology is also applicable to
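An event-driven parser for an active document can be sketched with Python's standard SAX interface. The parser reacts to start-tag, text and end-tag events and replaces each active component with its computed result. The `<sum>` component is a hypothetical stand-in for IDP's active components, which the abstract does not specify. For brevity, attributes on ordinary elements are not reproduced.

```python
import xml.sax
from xml.sax.saxutils import escape

class ActiveDocumentHandler(xml.sax.ContentHandler):
    """Event-driven pass over an active document (hypothetical <sum> component)."""

    def __init__(self):
        super().__init__()
        self.out = []          # rendered output fragments
        self.in_sum = False    # currently inside an active <sum> element?
        self.buffer = []       # text collected inside <sum>

    def startElement(self, name, attrs):
        if name == "sum":
            self.in_sum, self.buffer = True, []
        else:
            self.out.append(f"<{name}>")   # attributes dropped in this sketch

    def characters(self, content):
        if self.in_sum:
            self.buffer.append(content)
        else:
            self.out.append(escape(content))

    def endElement(self, name):
        if name == "sum":
            # Evaluate the active component and emit only its result.
            numbers = "".join(self.buffer).split()
            self.out.append(str(sum(float(x) for x in numbers)))
            self.in_sum = False
        else:
            self.out.append(f"</{name}>")

def render(xml_text):
    """Parse an active document and return the rendered (passive) document."""
    handler = ActiveDocumentHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return "".join(handler.out)

print(render("<report><p>Total: <sum>1.5 2.5 3</sum></p></report>"))
```

An event-driven (SAX-style) design processes the document in a single streaming pass without building a full tree. This suits documents in which most content is copied through unchanged and only the active components need computation.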
Development of an event-driven parser for active document and web-based nuclear design system

International Nuclear Information System (INIS)

Park, Yong Soo

2005-02-01

Nuclear design work consists of extensive unit job modules in which many computer codes are used. Each unit module requires a time-consuming and error-prone process of input preparation, code execution, output analysis and quality assurance. Safety evaluation of the reload core is the most manpower-intensive and time-consuming task because of the large amount of calculation and data exchange involved. The purpose of this study is to develop a new nuclear design system, the Innovative Design Processor (IDP), in order to minimize human effort, maximize design quality and productivity, and ultimately achieve an optimized core loading pattern. The two basic principles of IDP are document-oriented design and web-based design. In contrast to conventional code-oriented or procedure-oriented design, document-oriented design is human-oriented: the designer writes a design document, called an active document, and feeds it to a parser, which automatically prepares the final document with complete analyses, tables and plots. This study defined a number of active components and developed an event-driven parser for active documents written in HTML (Hypertext Markup Language) or XML (Extensible Markup Language). Active documents can be created on the web, the other framework of IDP. Using an appropriate mix of server-side and client-side programming in the HAMP (HP-UX/Apache/MySQL/PHP) environment, the web-based document-oriented design process is modeled as a design wizard, for the designer's convenience and platform independence. This IDP-based automation was tested on the reload safety evaluation of Korea Standard Nuclear Power Plant (KSNP) type PWRs. Substantial time savings were confirmed: IDP can complete jobs that took several months in a few days. A more optimized core loading pattern can therefore be obtained, since the reload safety evaluation can be carried out quickly for several candidate core loading patterns. Since the technology is also applicable to the
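The abstract does not publish IDP's component set or parser, so the following is only a minimal sketch of what an event-driven parser for an active document can look like, assuming an XML active document and Python's standard `xml.sax` module. The element names `run` and `table`, their attributes, and the helper `process_active_document` are hypothetical. Each start-element event is dispatched to a handler that, in a real system, would launch a design code or render a table into the final document; here it only records the requested action.

```python
import xml.sax

# Hypothetical active-document markup. The <run> and <table> components and their
# attributes are illustrative assumptions, not IDP's actual component set.
ACTIVE_DOC = """<active-doc>
  <section title="Reload safety evaluation">
    <run code="core_sim" case="cycle5"/>
    <table source="core_sim" quantity="peaking_factor"/>
  </section>
</active-doc>"""


class ActiveDocHandler(xml.sax.ContentHandler):
    """Dispatches an action each time the parser reports an active component."""

    def __init__(self):
        super().__init__()
        self.actions = []  # ordered log of actions triggered by parse events
        # Map element names to handler methods; other elements are passed through.
        self.dispatch = {"run": self.on_run, "table": self.on_table}

    def startElement(self, name, attrs):
        handler = self.dispatch.get(name)
        if handler is not None:
            handler(attrs)

    def on_run(self, attrs):
        # A real system would launch the design code here; we only record the request.
        self.actions.append(("run", attrs.get("code"), attrs.get("case")))

    def on_table(self, attrs):
        # A real system would read code output and render a table; we only record it.
        self.actions.append(("table", attrs.get("source"), attrs.get("quantity")))


def process_active_document(text):
    """Parse an active document and return the actions its components requested."""
    handler = ActiveDocHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.actions


if __name__ == "__main__":
    for action in process_active_document(ACTIVE_DOC):
        print(action)
```

Because handlers are looked up in a dictionary keyed by element name, new active components can be added without touching the parsing loop, and the document is processed in a single streaming pass.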
The interaction region of the large detector concept

Indian Academy of Sciences (India)

design to study alternatives, e.g. the surface assembly of the detector. Pramana – J. Phys.
Document management in engineering construction

International Nuclear Information System (INIS)

Liao Bing

2008-01-01

Document management is an important part of systematic quality management, which is one of the key factors in ensuring construction quality. In engineering construction, quality management and document management must work together at all times to ensure construction quality. Quality management ensures that documents are correctly generated and adopted, so that their completeness, accuracy and systematic organization satisfy filing requirements. Document management ensures that documents are correctly transferred during construction, and that evidence such as files and records is kept for the construction work and its quality management. This paper addresses document management in engineering construction based on the interaction between quality management and document management. (author)
Review of the Collection of Documents “Krasnoyarsk Region during the Great Patriotic War. 1941-1945 (Based on Documents of the Archive Agency of Krasnoyarsk Region)”, 2010. 497 p.

Directory of Open Access Journals (Sweden)

Dmitrii A. Malyutin

2013-09-01

The paper reviews a collection of documents covering the social and economic situation in Krasnoyarsk Region during the Great Patriotic War, the activities of the party and Soviet authorities, the deeds of Krasnoyarsk natives at the front, and labor achievements in the rear. The collection contains documents describing daily life in wartime, the public mood, living conditions, social security and the status of disabled veterans. The material on the patriotic activity of the Orthodox Church, the camps of the People's Commissariat for Internal Affairs, and instances of desertion, speculation and crime in the region demonstrates the balanced and objective approach the compilers took in selecting the documents.
Document Level Assessment of Document Retrieval Systems in a Pairwise System Evaluation

Science.gov (United States)

Rajagopal, Prabha; Ravana, Sri Devi

2017-01-01

Introduction: The use of averaged topic-level scores can result in the loss of valuable data and can cause misinterpretation of the effectiveness of system performance. This study aims to use the scores of each document to evaluate document retrieval systems in a pairwise system evaluation. Method: The chosen evaluation metrics are document-level…
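The abstract is cut off before the authors' document-level metrics are named, so this sketch does not reproduce them. It only illustrates the motivating claim, assuming binary relevance and two hypothetical runs: two systems can have identical mean average precision (an averaged topic-level score) while disagreeing on the relevance of the documents they place at each rank. The data and the helper `document_level_differences` are hypothetical, not the authors' method.

```python
from statistics import mean

# Hypothetical binary relevance judgments (qrels) and ranked runs for two topics.
QRELS = {"t1": {"d1", "d2"}, "t2": {"d4", "d5"}}
RUN_A = {"t1": ["d1", "d2", "d3"], "t2": ["d6", "d4", "d5"]}
RUN_B = {"t1": ["d3", "d1", "d2"], "t2": ["d4", "d5", "d6"]}


def average_precision(ranking, relevant):
    """Non-interpolated average precision for a single topic."""
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(run, qrels):
    """Topic-level scores averaged over topics: per-document detail is discarded."""
    return mean(average_precision(run[t], qrels[t]) for t in sorted(qrels))


def document_level_differences(run_a, run_b, qrels):
    """List (topic, rank, gain_a, gain_b) wherever the runs' documents differ in gain."""
    diffs = []
    for topic in sorted(qrels):
        for rank, (doc_a, doc_b) in enumerate(zip(run_a[topic], run_b[topic]), start=1):
            gain_a = int(doc_a in qrels[topic])
            gain_b = int(doc_b in qrels[topic])
            if gain_a != gain_b:
                diffs.append((topic, rank, gain_a, gain_b))
    return diffs


if __name__ == "__main__":
    print(mean_average_precision(RUN_A, QRELS), mean_average_precision(RUN_B, QRELS))
    print(document_level_differences(RUN_A, RUN_B, QRELS))
```

Both runs score the same mean average precision (about 0.79), yet the rank-by-rank comparison reports four positions where they retrieve documents of different relevance, which is the kind of information an averaged score hides.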
Indexing It All: The Subject in the Age of Documentation, Information, and Data

CERN Document Server

Day, Ronald E

2014-01-01

In this book, Ronald Day offers a critical history of the modern tradition of documentation. Focusing on the documentary index (understood as a mode of social positioning), and drawing on the work of the French documentalist Suzanne Briet, Day explores the understanding and uses of indexicality. He examines how indexes went from being explicit professional structures that mediated users and documents to being implicit infrastructural devices used in everyday information and communication acts. In doing so, he also traces three epistemic eras in the representation of individuals and groups, first in the forms of documents, then information, then data. Day investigates five cases from the modern tradition of documentation. He considers the socio-technical instrumentalism of Paul Otlet, "the father of European documentation" (contrasting it with the hermeneutic perspective of Martin Heidegger); the shift from documentation to information science and the accompanying transformation of persons and texts i...
Shoulder dystocia documentation: an evaluation of a documentation training intervention.

Science.gov (United States)

LeRiche, Tammy; Oppenheimer, Lawrence; Caughey, Sharon; Fell, Deshayne; Walker, Mark

2015-03-01

To evaluate the quality and content of nurse and physician shoulder dystocia delivery documentation before and after MORE training in shoulder dystocia management skills and documentation. Approximately 384 charts at the Ottawa Hospital General Campus with a diagnosis of shoulder dystocia between 2000 and 2006, excluding the training year of 2003, were identified. The charts were evaluated for 14 key components derived from a validated instrument. To further quantify delivery record quality, the delivery notes were then scored on these components by 2 separate investigators who were blinded to the note's author, date and patient identification. Approximately 346 charts were reviewed for physician and nurse delivery documentation. The average score for physician notes was 6 (maximum possible score 14) both before and after the training intervention; the nurses' average score was 5 before and after. Negligible improvement in the content and quality of shoulder dystocia documentation was observed after nurse and physician training.
Documentary Archives in Advertising: Centro Documental para la Conservación del Patrimonio Publicitario Español (Publidocnet)

Directory of Open Access Journals (Sweden)

Marcos Recio, Juan Carlos

2015-06-01

A documentary model for advertising must promote and preserve advertising heritage with every tool at its disposal. Publidocnet (Centro Documental para la Conservación del Patrimonio Publicitario Español) is an active documentation centre that offers students, researchers and anyone interested in advertising an analytical view of campaigns and related information: users can watch the television commercial, listen to the radio spots, view the campaign images and study the technical data sheet. In short, it is a way of understanding and learning about advertising through the work of advertising professionals. Its ultimate aim is the preservation of heritage, under a tacit agreement with advertising agencies, which donate material for students to study and for preservation. Publidocnet is thus a multimedia, graphic and textual documentation centre for Spanish advertising.
SGML-Based Markup for Literary Texts: Two Problems and Some Solutions.

Science.gov (United States)

Barnard, David; And Others

1988-01-01

Identifies the Standard Generalized Markup Language (SGML) as the best basis for a markup standard for encoding literary texts. Outlines solutions to problems in using SGML and discusses the problem of maintaining multiple views of a document. Examines several ways of reducing the burden of markup. (GEA)
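The abstract does not spell out the authors' solutions. As a hedged illustration of the "multiple views" problem only, the sketch below uses a workaround that later became common in TEI encoding (an assumption here, not necessarily one of this paper's proposals): verse lines form the primary hierarchy, and page boundaries, which may fall mid-line, are recorded as empty `<pb/>` milestone elements. It uses XML rather than full SGML because Python's standard library ships an XML parser; the sample text and the functions `line_view` and `page_view` are illustrative.

```python
import xml.etree.ElementTree as ET

# Primary hierarchy: verse lines (<l>). Secondary hierarchy: pages, marked only by
# empty <pb n="..."/> milestones, so a page break may fall inside a line.
TEXT = """<poem>
  <l n="1">Of man's first disobedience, and the fruit</l>
  <l n="2">Of that forbidden tree, whose mortal <pb n="2"/>taste</l>
  <l n="3">Brought death into the world, and all our woe</l>
</poem>"""


def line_view(xml_text):
    """Return {line number: line text}, ignoring page milestones."""
    root = ET.fromstring(xml_text)
    return {line.get("n"): "".join(line.itertext()).strip() for line in root.iter("l")}


def page_view(xml_text, first_page="1"):
    """Rebuild the page hierarchy by walking the text and switching pages at each <pb/>."""
    root = ET.fromstring(xml_text)
    pages = {first_page: []}
    current = first_page
    for line in root.iter("l"):
        if line.text:
            pages[current].append(line.text.strip())
        for child in line:
            if child.tag == "pb":
                current = child.get("n")
                pages.setdefault(current, [])
            if child.tail:
                pages[current].append(child.tail.strip())
    return {n: " ".join(part for part in parts if part) for n, parts in pages.items()}


if __name__ == "__main__":
    print(line_view(TEXT))
    print(page_view(TEXT))
```

The markup stores only one hierarchy as nested elements; the second view is recomputed from milestones, which keeps the document well-formed at the cost of extra processing.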
Connecting Knowledge for Text Construction through the Use of Graphic Organizers

OpenAIRE

Reyes, Elsy Camila

2011-01-01

This study analyzed how basic-level students comprehend short descriptive texts and rewrite them through the use of graphic organizers (GOs). The research followed the qualitative paradigm, incorporating descriptive and introspective approaches. The study was carried out at a prestigious private school in Bogotá, Colombia, with sixth graders at basic English level II. Data were gathered through focus groups, GOs, and students' documents. The results of the study demon...
Towards a Pattern Language Approach to Document Description

Directory of Open Access Journals (Sweden)

Robert Waller

2012-07-01

Pattern libraries, originating in architecture, are a common way to share design solutions in interaction design and software engineering. Our aim in this paper is to consider patterns as a way of describing commonly occurring document design solutions to particular problems, from two points of view. First, we are interested in their use as exemplars for designers to follow; second, we suggest them as a means of understanding linguistic and graphical data and organizing it into corpora that will facilitate descriptive work. We discuss the use of patterns across a range of disciplines before suggesting the need to place patterns in the context of genres, with each pattern potentially belonging to a “home genre” in which it originates and to which it makes an implicit intertextual reference intended to produce a particular reader response in the form of a reading strategy or interpretative stance. We consider some conceptual and technical issues involved in the descriptive study of patterns in naturally occurring documents, including the challenges involved in building a document corpus.
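To make the idea of a pattern with a "home genre" concrete, the following is a minimal data-structure sketch; the field names, the `borrowed_uses` helper and the two sample patterns are hypothetical and not taken from the paper. It records each pattern's problem, solution and home genre, and flags observations of a pattern outside its home genre, which in the paper's terms are the cases that carry an implicit intertextual reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentPattern:
    """One entry in a hypothetical pattern library for document design."""
    name: str
    problem: str
    solution: str
    home_genre: str      # genre in which the pattern originates
    observed_in: tuple   # genres of corpus documents where it was found


def borrowed_uses(patterns):
    """Return (pattern name, genre) pairs where a pattern appears outside its home genre."""
    return [
        (p.name, genre)
        for p in patterns
        for genre in p.observed_in
        if genre != p.home_genre
    ]


LIBRARY = [
    DocumentPattern(
        name="Question-and-answer headings",
        problem="Readers arrive with specific questions",
        solution="Phrase section headings as the reader's questions",
        home_genre="FAQ",
        observed_in=("FAQ", "public information leaflet"),
    ),
    DocumentPattern(
        name="Pull quote",
        problem="Long pages give browsing readers no entry point",
        solution="Enlarge a short extract and set it apart from the body text",
        home_genre="magazine",
        observed_in=("magazine",),
    ),
]

if __name__ == "__main__":
    print(borrowed_uses(LIBRARY))
```

Only the question-and-answer pattern is reported here, because it was observed in a leaflet rather than in its home genre; a corpus tagged this way could then be queried for how often each genre borrows from another.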
Text-Mining Applications for Creation of Biofilm Literature Database

Directory of Open Access Journals (Sweden)

Kanika Gupta

2017-10-01

In the present research, a published corpus of 34,306 biofilm documents was collected from the PubMed database, along with non-indexed resources such as books, conference proceedings and newspaper articles, and divided into five categories: classification, growth and development, physiology, drug effects and radiation effects. Each category was further divided into three parts (Journal Title, Abstract Title and Abstract Text) to make indexing highly specific. Text processing was done using the software Rapid Miner_v5.3, which tokenizes the entire text into words and provides the frequency of each word within the document. The words were normalized using the Remove Stop and Stem Word command of Rapid Miner_v5.3, which removes stop words and reduces words to their stems. The words were then stored in MS-Excel 2007, sorted in decreasing order of frequency using its Sort & Filter command, and visualized as networks using Cytoscape_v2.7.0. The resulting words were highly specific to biofilms, yielding a controlled biofilm vocabulary that could be used to index biofilm articles (similar to the MeSH database, which indexes articles for PubMed). The keyword information was stored in a relational database hosted locally on a WAMP_v2.4 (Windows, Apache, MySQL, PHP) server. This biofilm vocabulary should be useful to researchers studying the biofilm literature, making their searches easier and more efficient.
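The pipeline described above (tokenize, remove stop words, stem, count, sort by frequency, store in a relational database) can be sketched with Python's standard library. This is an assumption-laden stand-in, not a reproduction of the Rapid Miner, MS-Excel and MySQL workflow: the tiny stop-word list and the crude suffix-stripping `naive_stem` function are hypothetical simplifications of real stop-word removal and stemming, and an in-memory SQLite database stands in for the WAMP-hosted MySQL server.

```python
import re
import sqlite3
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "of", "and", "in", "a", "to", "is", "by", "on", "for", "with", "was", "were"}


def naive_stem(word):
    """Strip one common English suffix: a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least four characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def term_frequencies(texts):
    """Tokenize, drop stop words, stem, and count term frequencies across texts."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(naive_stem(t) for t in tokens if t not in STOP_WORDS)
    return counts


def store_vocabulary(counts, conn):
    """Store term counts in a relational table and return them by decreasing frequency."""
    conn.execute("CREATE TABLE IF NOT EXISTS vocabulary (term TEXT PRIMARY KEY, freq INTEGER)")
    conn.executemany("INSERT OR REPLACE INTO vocabulary (term, freq) VALUES (?, ?)", counts.items())
    conn.commit()
    return conn.execute("SELECT term, freq FROM vocabulary ORDER BY freq DESC, term").fetchall()


ABSTRACTS = [
    "Biofilms formed by bacteria resist drug effects.",
    "Drug effects on biofilm formation and growth.",
]

if __name__ == "__main__":
    with sqlite3.connect(":memory:") as conn:
        for term, freq in store_vocabulary(term_frequencies(ABSTRACTS), conn):
            print(term, freq)
```

With the two sample sentences, `biofilm`, `drug` and `effect` each occur twice and head the vocabulary table; a real controlled vocabulary would need a proper stemmer (for example a Porter-style algorithm) and a much larger stop list.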
Tank Monitor and Control System (TMACS) As Built Software Design Document

Energy Technology Data Exchange (ETDEWEB)

GLASSCOCK, J.A.

2000-01-27

This document describes the software design for the Tank Monitor and Control System (TMACS). It captures the as-built design of TMACS as of November 1999 and will serve as a reference for the system maintainers who maintain and modify TMACS functions as necessary. The heart of the TMACS system is its "point-processing" functionality, in which a sample value is received from the field sensors and then analyzed, logged, or alarmed as required. This Software Design Document focuses on the point-processing functions.
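The point-processing cycle described above (receive a sample, then analyze, log or alarm it as required) might look roughly like the following sketch. TMACS's actual limits, logging rules and data structures are not given here, so the `Point` fields, the deadband logging rule and the sample tank-level point are hypothetical.

```python
import logging
from dataclasses import dataclass
from typing import Optional

log = logging.getLogger("point_processing")


@dataclass
class Point:
    """A monitored field point with hypothetical alarm limits and logging deadband."""
    name: str
    low_limit: float
    high_limit: float
    deadband: float                      # minimum change from last logged value worth logging
    last_logged: Optional[float] = None


def process_sample(point, value):
    """Analyze one sensor sample, then log and/or alarm it; returns the alarm state."""
    # Analyze: compare the sample against the point's alarm limits.
    if value > point.high_limit:
        state = "HIGH"
    elif value < point.low_limit:
        state = "LOW"
    else:
        state = "NORMAL"

    # Log: record the value if it moved past the deadband or is in alarm.
    moved = point.last_logged is None or abs(value - point.last_logged) >= point.deadband
    if moved or state != "NORMAL":
        log.info("%s = %.2f (%s)", point.name, value, state)
        point.last_logged = value

    # Alarm: raise an operator-visible warning for out-of-limit values.
    if state != "NORMAL":
        log.warning("ALARM %s: value %.2f is %s", point.name, value, state)
    return state


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    level = Point("tank_level_in", low_limit=10.0, high_limit=300.0, deadband=0.5)
    for sample in (150.0, 150.2, 320.0):
        print(process_sample(level, sample))
```

In the example, 150.2 is not logged because it differs from the last logged value by less than the 0.5 deadband, while 320.0 is both logged and alarmed as HIGH.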