WorldWideScience

Sample records for ascii text documents

  1. ASCII Art Synthesis from Natural Photographs.

    Science.gov (United States)

    Xu, Xuemiao; Zhong, Linyuan; Xie, Minshan; Liu, Xueting; Qin, Jing; Wong, Tien-Tsin

    2017-08-01

    While ASCII art is a popular art form worldwide, automatically generating structure-based ASCII art from natural photographs remains challenging. The major challenge lies in extracting the perception-sensitive structure from a natural photograph so that a concise ASCII art reproduction can be produced from that structure. However, due to the excessive amount of texture in natural photos, extracting perception-sensitive structure is not easy, especially when the structure is weak and lies within a textured region. Moreover, to fit different target text resolutions, the amount of extracted structure must be controllable. To tackle these challenges, we introduce a visual perception mechanism, non-classical receptive field modulation (non-CRF modulation), from physiological findings to this ASCII art application, and propose a new model of non-CRF modulation that better separates weak structure from crowded texture and better controls the scale of texture suppression. Thanks to our non-CRF model, a more sensible ASCII art reproduction can be obtained. In addition, to produce more visually appealing ASCII art, we propose a novel optimization scheme to obtain the optimal placement of proportional-font characters. We apply our method to a rich variety of images, and visually appealing ASCII art is obtained in all cases.
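
    A minimal contrast to the paper's structure-based approach is classic tone-based ASCII art, which only matches average block brightness to character density. The Python sketch below illustrates that baseline idea (the file name and character ramp are arbitrary choices); the paper's method additionally extracts structure and optimizes proportional-font placement, which is far more involved.

        # Tone-based ASCII art baseline: map mean block brightness to
        # characters of increasing visual density. Requires Pillow and NumPy.
        from PIL import Image
        import numpy as np

        RAMP = " .:-=+*#%@"  # characters ordered light to dark

        def ascii_art(path, cols=80):
            img = Image.open(path).convert("L")      # grayscale
            w, h = img.size
            cw = w / cols                            # block width in pixels
            ch = cw * 2                              # glyphs are ~2x taller than wide
            pix = np.asarray(img, dtype=float)
            lines = []
            for r in range(int(h / ch)):
                row = ""
                for c in range(cols):
                    block = pix[int(r * ch):int((r + 1) * ch),
                                int(c * cw):int((c + 1) * cw)]
                    level = int(block.mean() / 256 * len(RAMP))
                    row += RAMP[min(level, len(RAMP) - 1)]
                lines.append(row)
            return "\n".join(lines)

        print(ascii_art("photo.jpg"))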

  2. SEGY to ASCII: Conversion and Plotting Program

    Science.gov (United States)

    Goldman, Mark R.

    1999-01-01

    This report documents a computer program that converts standard 4-byte, IBM floating-point SEGY files to ASCII xyz format. The program then optionally plots the seismic data using the GMT plotting package. The material for this publication is contained in a standard tar file (of99-126.tar) that is uncompressed and 726 K in size. It can be downloaded to any Unix machine. Move the tar file to the directory you wish to use it in, then type 'tar xvf of99-126.tar'. The archive files (and diskette) contain a NOTE file, a README file, a version-history file, source code, a makefile for easy compilation, and an ASCII version of the documentation. The archive files (and diskette) also contain example test files, including a typical SEGY file along with the resulting ASCII xyz and postscript files. Compiling the source code into an executable requires a C++ compiler. The program has been successfully compiled using GNU's g++ version 2.8.1; use of other compilers may require modifications to the existing source code. The g++ compiler is a free, high-quality C++ compiler and may be downloaded from the ftp site: ftp://ftp.gnu.org/gnu. Plotting the seismic data requires the GMT plotting package, which may be downloaded from the web site: http://www.soest.hawaii.edu/gmt/
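
    The heart of such a conversion is decoding IBM hexadecimal floating point, which differs from IEEE 754: a sign bit, a 7-bit base-16 exponent biased by 64, and a 24-bit fraction. The Python sketch below shows just that decoding step, assuming big-endian 4-byte samples as the SEGY standard specifies; header parsing and trace iteration are omitted.

        import struct

        def ibm_to_float(b):
            """Convert 4 big-endian bytes in IBM hex-float format to a float."""
            (word,) = struct.unpack(">I", b)
            sign = -1.0 if word >> 31 else 1.0
            exponent = (word >> 24) & 0x7F           # base-16 exponent, excess-64
            fraction = (word & 0x00FFFFFF) / float(1 << 24)
            return sign * fraction * 16.0 ** (exponent - 64)

        # Example: 0x42640000 encodes 100.0
        assert abs(ibm_to_float(b"\x42\x64\x00\x00") - 100.0) < 1e-6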

  3. Raw Data (ASCII format) - PLACE | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available Data name: Raw Data (ASCII format). DOI: 10.18908/lsd… Raw data of the PLACE database, distributed in ASCII format through the LSDB Archive.

  4. Rosetta: Ensuring the Preservation and Usability of ASCII-based Data into the Future

    Science.gov (United States)

    Ramamurthy, M. K.; Arms, S. C.

    2015-12-01

    Field data obtained from dataloggers often take the form of comma separated value (CSV) ASCII text files. While ASCII-based data formats have positive aspects, such as the ease of accessing the data from disk and the wide variety of tools available for data analysis, there are some drawbacks, especially when viewing the situation through the lens of data interoperability and stewardship. The Unidata data translation tool, Rosetta, is a web-based service that provides an easy, wizard-based interface for data collectors to transform their datalogger-generated ASCII output into Climate and Forecast (CF) compliant netCDF files following the CF-1.6 discrete sampling geometries. These files are complete with metadata describing what data are contained in the file, the instruments used to collect the data, and other critical information that otherwise may be lost in one of many README files. The choice of the machine-readable netCDF data format and data model, coupled with the CF conventions, ensures long-term preservation and interoperability, and that future users will have enough information to responsibly use the data. However, with the understanding that the observational community appreciates the ease of use of ASCII files, methods for transforming the netCDF back into a CSV or spreadsheet format are also built in. One benefit of translating ASCII data into a machine-readable format that follows open community-driven standards is that the data can instantly take advantage of data services provided by the many open-source data server tools, such as the THREDDS Data Server (TDS). While Rosetta is currently a stand-alone service, this talk will also highlight efforts to couple Rosetta with the TDS, thus allowing self-publishing of thoroughly documented datasets by the data producers themselves.
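
    As a rough illustration of the kind of translation Rosetta performs, the sketch below turns a two-column datalogger CSV into a CF-1.6 time-series netCDF file with the netCDF4 Python package. The file name, column names and variable choices are placeholders, not Rosetta's actual output.

        import csv
        from netCDF4 import Dataset

        rows = list(csv.DictReader(open("logger.csv")))  # assumed columns: time_s, temp_c

        nc = Dataset("logger.nc", "w")
        nc.Conventions = "CF-1.6"
        nc.featureType = "timeSeries"                    # discrete sampling geometry
        nc.createDimension("time", len(rows))

        t = nc.createVariable("time", "f8", ("time",))
        t.units = "seconds since 1970-01-01 00:00:00"
        t.standard_name = "time"
        t[:] = [float(r["time_s"]) for r in rows]

        temp = nc.createVariable("air_temperature", "f4", ("time",))
        temp.units = "degree_Celsius"
        temp.standard_name = "air_temperature"
        temp[:] = [float(r["temp_c"]) for r in rows]

        nc.close()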

  5. Transliterating non-ASCII characters with Python

    Directory of Open Access Journals (Sweden)

    Seth Bernstein

    2013-10-01

    Full Text Available This lesson shows how to use Python to transliterate automatically a list of words from a language with a non-Latin alphabet into a standardized format using American Standard Code for Information Interchange (ASCII) characters. It builds on readers’ understanding of Python from the lessons “Viewing HTML Files,” “Working with Web Pages,” “From HTML to List of Words (part 1),” and “Intro to Beautiful Soup.” At the end of the lesson, we will use a transliteration dictionary to convert names from a database of the Russian organization Memorial from Cyrillic into Latin characters. Although the example uses Cyrillic characters, the technique can be reproduced with other alphabets using Unicode.
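
    The core of such a lesson is a plain dictionary lookup. A minimal sketch, with a deliberately truncated Cyrillic-to-Latin table (a real transliteration standard covers every letter):

        translit = {"а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
                    "ж": "zh", "з": "z", "и": "i", "й": "i", "н": "n", "о": "o"}

        def transliterate(word):
            # Characters missing from the table (digits, punctuation) pass through.
            return "".join(translit.get(ch, ch) for ch in word.lower())

        print(transliterate("Иванов"))  # -> ivanov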

  6. State Of The Art In Digital Steganography Focusing ASCII Text Documents

    OpenAIRE

    Rafat, Khan Farhan; Sher, Muhammad

    2010-01-01

    Digitization of analogue signals has opened up new avenues for information hiding, and recent advancements in the telecommunication field have taken this desire even further. From copper wire to fiber optics, technology has evolved, and so have the ways of covert channel communication. By "covert" we mean "anything not meant for the purpose for which it is being used". Investigation and detection of the existence of such covert channel communication has always remained a serious concern of informat...

  7. SEGY to ASCII Conversion and Plotting Program 2.0

    Science.gov (United States)

    Goldman, Mark R.

    2005-01-01

    INTRODUCTION: SEGY has long been a standard format for storing seismic data and header information. Almost every seismic processing package can read and write seismic data in SEGY format. In the data-processing world, however, ASCII is the 'universal' standard format, and very few general-purpose plotting or computation programs will accept data in SEGY format. The software presented in this report, referred to as SEGY to ASCII (SAC), converts seismic data written in SEGY format (Barry et al., 1975) to an ASCII data file, and then creates a postscript file of the seismic data using a general plotting package (GMT; Wessel and Smith, 1995). The resulting postscript file may be plotted by any standard postscript plotting program. There are two versions of SAC: one for plotting a SEGY file that contains a single gather, such as a stacked CDP or migrated section, and a second for plotting multiple gathers from a SEGY file containing more than one gather, such as a collection of shot gathers. Note that if a SEGY file has multiple gathers, each gather must have the same number of traces, and each trace must have the same sample interval and number of samples. SAC will read several common variants of SEGY data, including SEGY files with sample values written in either IBM or IEEE floating-point format. In addition, utility programs are provided to convert non-standard Seismic Unix (.sux) SEGY files and PASSCAL (.rsy) SEGY files to standard SEGY files. SAC allows complete user control over all plotting parameters, including label size and font, tick-mark intervals, trace scaling, and the inclusion of a title and descriptive text. SAC shell scripts create a postscript image of the seismic data in vector rather than bitmap format, using GMT's pswiggle command. Although this can produce a very large postscript file, the image quality is generally superior to that of a bitmap image, and commercial programs such as Adobe Illustrator

  8. Documents and legal texts

    International Nuclear Information System (INIS)

    2017-01-01

    This section treats of the following documents and legal texts: 1 - Belgium 29 June 2014 - Act amending the Act of 22 July 1985 on Third-Party Liability in the Field of Nuclear Energy; 2 - Belgium, 7 December 2016. - Act amending the Act of 22 July 1985 on Third-Party Liability in the Field of Nuclear Energy

  9. Script-independent text line segmentation in freestyle handwritten documents.

    Science.gov (United States)

    Li, Yi; Zheng, Yefeng; Doermann, David; Jaeger, Stefan

    2008-08-01

    Text line segmentation in freestyle handwritten documents remains an open document analysis problem. Curvilinear text lines and small gaps between neighboring text lines present a challenge to algorithms developed for machine printed or hand-printed documents. In this paper, we propose a novel approach based on density estimation and a state-of-the-art image segmentation technique, the level set method. From an input document image, we estimate a probability map, where each element represents the probability that the underlying pixel belongs to a text line. The level set method is then exploited to determine the boundary of neighboring text lines by evolving an initial estimate. Unlike connected-component-based methods ([1], [2] for example), the proposed algorithm does not use any script-specific knowledge. Extensive quantitative experiments on freestyle handwritten documents with diverse scripts, such as Arabic, Chinese, Korean, and Hindi, demonstrate that our algorithm consistently outperforms previous methods [1]-[3]. Further experiments show the proposed algorithm is robust to scale change, rotation, and noise.

  10. Text-interpreter language for flexible generation of patient notes and instructions.

    Science.gov (United States)

    Forker, T S

    1992-01-01

    An interpreted computer language has been developed along with a windowed user interface and multi-printer-support formatter to allow preparation of documentation of patient visits, including progress notes, prescriptions, excuses for work/school, outpatient laboratory requisitions, and patient instructions. Input is by trackball or mouse with little or no keyboard skill required. For clinical problems with specific protocols, the clinician can be prompted with problem-specific items of history, exam, and lab data to be gathered and documented. The language implements a number of text-related commands as well as branching logic and arithmetic commands. In addition to generating text, it is simple to implement arithmetic calculations such as weight-specific drug dosages; multiple branching decision-support protocols for paramedical personnel (or physicians); and calculation of clinical scores (e.g., coma or trauma scores) while simultaneously documenting the status of each component of the score. ASCII text files produced by the interpreter are available for computerized quality audit. Interpreter instructions are contained in text files users can customize with any text editor.
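
    As a hypothetical illustration (not the paper's actual command language), a text template with embedded arithmetic might look like the Python below, computing a weight-specific dose while generating the note; the drug, rate and schedule are invented for the example.

        template = ("Patient weight: {weight} kg.\n"
                    "Amoxicillin {dose} mg PO q8h ({per_kg} mg/kg/day divided TID).")

        def render(weight_kg, per_kg=40):
            daily = per_kg * weight_kg               # weight-specific arithmetic
            return template.format(weight=weight_kg,
                                   dose=round(daily / 3), per_kg=per_kg)

        print(render(18))  # 18 kg child -> 240 mg per dose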

  11. Documents and legal texts

    International Nuclear Information System (INIS)

    2016-01-01

    This section treats of the following documents and legal texts: 1 - Brazil: Law No. 13,260 of 16 March 2016 (To regulate the provisions of item XLIII of Article 5 of the Federal Constitution on terrorism, dealing with investigative and procedural provisions and redefining the concept of a terrorist organisation; and amends Laws No. 7,960 of 21 December 1989 and No. 12,850 of 2 August 2013); 2 - India: The Atomic Energy (Amendment) Act, 2015; Department Of Atomic Energy Notification (Civil Liability for Nuclear Damage); 3 - Japan: Act on Subsidisation, etc. for Nuclear Damage Compensation Funds following the implementation of the Convention on Supplementary Compensation for Nuclear Damage

  12. Comparison of Document Index Graph Using TextRank and HITS Weighting Method in Automatic Text Summarization

    Science.gov (United States)

    Hadyan, Fadhlil; Shaufiah; Arif Bijaksana, Moch.

    2017-01-01

    Automatic summarization is a system that can help someone grasp the core information of a long text instantly by summarizing the text automatically. Many summarization systems have already been developed, but many problems remain. This final project proposes a summarization method using a document index graph. The method adapts the PageRank and HITS formulas, originally used to rank web pages, to score the words in the sentences of a text document. The expected outcome is a system that can summarize a single document by utilizing a document index graph with TextRank and HITS to automatically improve the quality of the summary.
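
    The ranking idea being adapted can be shown in a few lines. A simplified sketch of TextRank-style scoring over a word co-occurrence graph (the paper applies the PageRank and HITS updates to a document index graph instead):

        import itertools

        def textrank(sentences, d=0.85, iters=30):
            words = {w for s in sentences for w in s.split()}
            nbrs = {w: set() for w in words}
            for s in sentences:                  # link words co-occurring in a sentence
                for a, b in itertools.combinations(set(s.split()), 2):
                    nbrs[a].add(b)
                    nbrs[b].add(a)
            score = {w: 1.0 for w in words}
            for _ in range(iters):               # PageRank-style update
                score = {w: (1 - d) + d * sum(score[v] / len(nbrs[v])
                                              for v in nbrs[w])
                         for w in words}
            return sorted(score, key=score.get, reverse=True)

        print(textrank(["the cat sat on the mat",
                        "the dog sat on the log"])[:3])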

  13. Classification process in a text document recommender system

    Directory of Open Access Journals (Sweden)

    Dan MUNTEANU

    2005-12-01

    Full Text Available This paper presents the classification process in a recommender system used for textual documents taken especially from the web. In the classification process, the system uses a combination of content filters, event filters and collaborative filters, and it uses implicit and explicit feedback for evaluating documents.

  14. Survey Report – State Of The Art In Digital Steganography Focusing ASCII Text Documents

    OpenAIRE

    Khan Farhan Rafat; Muhammad Sher

    2010-01-01

    Digitization of analogue signals has opened up new avenues for information hiding, and recent advancements in the telecommunication field have taken this desire even further. From copper wire to fiber optics, technology has evolved, and so have the ways of covert channel communication. By “covert” we mean “anything not meant for the purpose for which it is being used”. Investigation and detection of the existence of such covert channel communication has always remained a serious concern of informat...

  15. Text Mining in Biomedical Domain with Emphasis on Document Clustering.

    Science.gov (United States)

    Renganathan, Vinaitheerthan

    2017-07-01

    With the exponential increase in the number of articles published every year in the biomedical domain, there is a need to build automated systems to extract unknown information from the articles published. Text mining techniques enable the extraction of unknown knowledge from unstructured documents. This paper reviews text mining processes in detail and the software tools available to carry out text mining. It also reviews the roles and applications of text mining in the biomedical domain. Text mining processes, such as search and retrieval of documents, pre-processing of documents, natural language processing, methods for text clustering, and methods for text classification are described in detail. Text mining techniques can facilitate the mining of vast amounts of knowledge on a given topic from published biomedical research articles and draw meaningful conclusions that are not possible otherwise.

  16. Documents and legal texts

    International Nuclear Information System (INIS)

    2013-01-01

    This section reprints a selection of recently published legislative texts and documents: - Russian Federation: Federal Law No.170 of 21 November 1995 on the use of atomic energy, Adopted by the State Duma on 20 October 1995; - Uruguay: Law No.19.056 On the Radiological Protection and Safety of Persons, Property and the Environment (4 January 2013); - Japan: Third Supplement to Interim Guidelines on Determination of the Scope of Nuclear Damage resulting from the Accident at the Tokyo Electric Power Company Fukushima Daiichi and Daini Nuclear Power Plants (concerning Damages related to Rumour-Related Damage in the Agriculture, Forestry, Fishery and Food Industries), 30 January 2013; - France and the United States: Joint Statement on Liability for Nuclear Damage (Aug 2013); - Franco-Russian Nuclear Power Declaration (1 November 2013)

  17. CRED 20m Gridded bathymetry of Necker Island, Hawaii, USA (Arc ASCII format)

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — Gridded bathymetry of the shelf and slope environments of Necker Island, Northwestern Hawaiian Islands, Hawaii, USA. This ASCII includes multibeam bathymetry from...

  18. Semantic Document Image Classification Based on Valuable Text Pattern

    Directory of Open Access Journals (Sweden)

    Hossein Pourghassem

    2011-01-01

    Full Text Available Knowledge extraction from detected document images is a complex problem in the field of information technology, and it becomes more intricate given that only a negligible percentage of the detected document images are valuable. In this paper, a segmentation-based classification algorithm is used to analyze the document image. In this algorithm, using a two-stage segmentation approach, regions of the image are detected and then classified as document or non-document (pure) regions in a hierarchical classification. A novel definition of value is proposed to classify document images into valuable or invaluable categories. The proposed algorithm is evaluated on a database consisting of document and non-document images obtained from the Internet. Experimental results show the efficiency of the proposed algorithm for semantic document image classification. The proposed algorithm achieves an accuracy rate of 98.8% for the valuable versus invaluable document image classification problem.

  19. CRED 20m Gridded bathymetry of Nihoa Island, Hawaii, USA (Arc ASCII format)

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — Gridded bathymetry (20m) of the shelf and slope environments of Nihoa Island, Hawaii, USA. The ASCII includes multibeam bathymetry from the Simrad EM120, Simrad...

  20. The Implementation of Cosine Similarity to Calculate Text Relevance between Two Documents

    Science.gov (United States)

    Gunawan, D.; Sembiring, C. A.; Budiman, M. A.

    2018-03-01

    The rapidly increasing number of web pages and documents calls for topic-specific filtering in order to find web pages or documents efficiently. This preliminary research uses cosine similarity to compute text relevance in order to find topic-specific documents. The work is divided into three parts. The first part is text preprocessing: punctuation is removed from a document, the document is converted to lower case, stop words are removed, and root words are extracted using the Porter stemming algorithm. The second part is keyword weighting, whose output is used by the third part, the text relevance calculation. The text relevance calculation yields a value between 0 and 1: the closer the value is to 1, the more related the two documents are, and vice versa.
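
    A compact Python sketch of this pipeline (Porter stemming is left out to keep it dependency-free; the paper applies it before weighting, and the stop-word list here is a tiny stand-in):

        import math
        import re
        from collections import Counter

        STOP = {"the", "a", "an", "of", "and", "to", "is", "in"}

        def vectorize(text):
            words = re.findall(r"[a-z]+", text.lower())   # also strips punctuation
            return Counter(w for w in words if w not in STOP)

        def cosine(d1, d2):
            v1, v2 = vectorize(d1), vectorize(d2)
            dot = sum(v1[t] * v2[t] for t in v1)
            norm = (math.sqrt(sum(c * c for c in v1.values()))
                    * math.sqrt(sum(c * c for c in v2.values())))
            return dot / norm if norm else 0.0

        print(cosine("The cat sat on the mat.", "A cat is on a mat."))  # ~0.87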

  21. Machine printed text and handwriting identification in noisy document images.

    Science.gov (United States)

    Zheng, Yefeng; Li, Huiping; Doermann, David

    2004-03-01

    In this paper, we address the problem of the identification of text in noisy document images. We focus especially on segmenting and distinguishing between handwriting and machine printed text because: 1) handwriting in a document often indicates corrections, additions, or other supplemental information that should be treated differently from the main content; and 2) the segmentation and recognition techniques required for machine printed and handwritten text are significantly different. A novel aspect of our approach is that we treat noise as a separate class and model noise based on selected features. Trained Fisher classifiers are used to identify machine printed text and handwriting from noise, and we further exploit context to refine the classification. A Markov Random Field (MRF) based approach is used to model the geometrical structure of the printed text, handwriting, and noise to rectify misclassifications. Experimental results show that our approach is robust and can significantly improve page segmentation in noisy document collections.

  22. Transfer of numeric ASCII data files between Apple and IBM personal computers.

    Science.gov (United States)

    Allan, R W; Bermejo, R; Houben, D

    1986-01-01

    Listings for programs designed to transfer numeric ASCII data files between Apple and IBM personal computers are provided with accompanying descriptions of how the software operates. Details of the hardware used are also given. The programs may be easily adapted for transferring data between other microcomputers.

  23. MeSH: a window into full text for document summarization.

    Science.gov (United States)

    Bhattacharya, Sanmitra; Ha-Thuc, Viet; Srinivasan, Padmini

    2011-07-01

    Previous research in the biomedical text-mining domain has historically been limited to titles, abstracts and metadata available in MEDLINE records. Recent research initiatives such as TREC Genomics and BioCreAtIvE strongly point to the merits of moving beyond abstracts and into the realm of full texts. Full texts are, however, more expensive to process not only in terms of resources needed but also in terms of accuracy. Since full texts contain embellishments that elaborate, contextualize, contrast, supplement, etc., there is greater risk for false positives. Motivated by this, we explore an approach that offers a compromise between the extremes of abstracts and full texts. Specifically, we create reduced versions of full text documents that contain only important portions. In the long-term, our goal is to explore the use of such summaries for functions such as document retrieval and information extraction. Here, we focus on designing summarization strategies. In particular, we explore the use of MeSH terms, manually assigned to documents by trained annotators, as clues to select important text segments from the full text documents. Our experiments confirm the ability of our approach to pick the important text portions. Using the ROUGE measures for evaluation, we were able to achieve maximum ROUGE-1, ROUGE-2 and ROUGE-SU4 F-scores of 0.4150, 0.1435 and 0.1782, respectively, for our MeSH term-based method versus the maximum baseline scores of 0.3815, 0.1353 and 0.1428, respectively. Using a MeSH profile-based strategy, we were able to achieve maximum ROUGE F-scores of 0.4320, 0.1497 and 0.1887, respectively. Human evaluation of the baselines and our proposed strategies further corroborates the ability of our method to select important sentences from the full texts. sanmitra-bhattacharya@uiowa.edu; padmini-srinivasan@uiowa.edu.
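
    For reference, the reported ROUGE-1 numbers are unigram-overlap F-scores of the kind computed below (ROUGE-2 and ROUGE-SU4 do the same over bigrams and skip-bigrams):

        from collections import Counter

        def rouge1_f(system, reference):
            sys_c, ref_c = Counter(system.split()), Counter(reference.split())
            overlap = sum(min(sys_c[w], ref_c[w]) for w in sys_c)
            p = overlap / sum(sys_c.values())      # precision
            r = overlap / sum(ref_c.values())      # recall
            return 2 * p * r / (p + r) if p + r else 0.0

        print(rouge1_f("mesh terms select important text",
                       "mesh terms pick important sentences"))  # 0.6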

  24. PHYSICAL MODELLING OF TERRAIN DIRECTLY FROM SURFER GRID AND ARC/INFO ASCII DATA FORMATS

    Directory of Open Access Journals (Sweden)

    Y.K. Modi

    2012-01-01

    Full Text Available Additive manufacturing technology is used to make physical models of terrain from GIS surface data. Attempts have been made to understand several other GIS file formats, such as the Surfer grid and the ARC/INFO ASCII grid. The terrain surface in these file formats is converted into the STL file format that is suitable for additive manufacturing, and the STL surface is then converted into a 3D model by adding the walls and the base. In this paper, the terrain modelling work has been extended to several other widely-used GIS file formats. Terrain models can be created in less time and at less cost, and intricate geometries of terrain can be reproduced with ease and great accuracy.

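    A sketch of the first half of such a pipeline: parsing an ARC/INFO ASCII grid and emitting STL facets for the surface (two triangles per cell). It assumes the optional NODATA_value header line is present, and leaves out the walls and base that the paper adds to close the solid; normals are written as placeholders, which most tools recompute.

        import numpy as np

        def read_arc_ascii(path):
            with open(path) as f:
                hdr = {k.lower(): float(v) for k, v in
                       (f.readline().split() for _ in range(6))}  # ncols..NODATA_value
                z = np.loadtxt(f)
            return hdr, z

        def write_stl(hdr, z, out):
            cs = hdr["cellsize"]
            with open(out, "w") as f:
                f.write("solid terrain\n")
                for r in range(z.shape[0] - 1):
                    for c in range(z.shape[1] - 1):
                        v = [(c * cs, r * cs, z[r, c]),
                             ((c + 1) * cs, r * cs, z[r, c + 1]),
                             (c * cs, (r + 1) * cs, z[r + 1, c]),
                             ((c + 1) * cs, (r + 1) * cs, z[r + 1, c + 1])]
                        for tri in ((v[0], v[1], v[2]), (v[1], v[3], v[2])):
                            f.write("facet normal 0 0 1\n outer loop\n")
                            for x, y, zz in tri:
                                f.write(f"  vertex {x} {y} {zz}\n")
                            f.write(" endloop\nendfacet\n")
                f.write("endsolid terrain\n")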

  25. A Typed Text Retrieval Query Language for XML Documents.

    Science.gov (United States)

    Colazzo, Dario; Sartiani, Carlo; Albano, Antonio; Manghi, Paolo; Ghelli, Giorgio; Lini, Luca; Paoli, Michele

    2002-01-01

    Discussion of XML focuses on a description of Tequyla-TX, a typed text retrieval query language for XML documents that can search on both content and structures. Highlights include motivations; numerous examples; word-based and char-based searches; tag-dependent full-text searches; text normalization; query algebra; data models and term language;…

  26. Text extraction method for historical Tibetan document images based on block projections

    Science.gov (United States)

    Duan, Li-juan; Zhang, Xi-qun; Ma, Long-long; Wu, Jian

    2017-11-01

    Text extraction is an important initial step in digitizing historical documents. In this paper, we present a text extraction method for historical Tibetan document images based on block projections. The task of text extraction is treated as a text area detection and location problem. The image is divided equally into blocks, and the blocks are filtered using the categories of their connected components and their corner-point density. By analyzing the projections of the filtered blocks, the approximate text areas can be located and the text regions extracted. Experiments on a dataset of historical Tibetan documents demonstrate the effectiveness of the proposed method.
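
    A sketch of the block-filtering idea on a binarized page (thresholds are invented for illustration; the paper combines projections with connected-component categories and corner-point density):

        import numpy as np

        def candidate_text_blocks(binary, rows=8, cols=8,
                                  min_ink=0.02, max_ink=0.5):
            h, w = binary.shape
            bh, bw = h // rows, w // cols
            hits = []
            for r in range(rows):
                for c in range(cols):
                    blk = binary[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
                    horiz = blk.sum(axis=1) / bw     # horizontal projection profile
                    above = horiz > horiz.mean()
                    stripes = (above[1:] != above[:-1]).sum()
                    # text blocks: moderate ink plus alternating line/gap stripes
                    if min_ink < blk.mean() < max_ink and stripes >= 2:
                        hits.append((r, c))
            return hits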

  27. Effective ASCII-HEX steganography for secure cloud

    International Nuclear Information System (INIS)

    Afghan, S.

    2015-01-01

    There are many reasons for the popularity of cloud computing, some of the most important being backup and rescue, cost effectiveness, nearly limitless storage, automatic software amalgamation, and easy access to information. A pay-as-you-go model is followed to provide everything as a service. Data are secured using the standard security policies available at the cloud end. In spite of its many benefits, as mentioned above, cloud computing also has security issues. The provider as well as the customer has to provide and collect data in a secure manner. Both of these issues, plus the efficient transmission of data over the cloud, are critical and need to be resolved. Sensitive data need security during their travel time over the network before being processed or stored by the customer. Security for the customer's data at the provider end can be provided by using current security algorithms, which are not known by the customer. There is also a reliability problem due to the existence of multiple boundaries in cloud resource access. ASCII and HEX security combined with steganography are used to propose an algorithm that stores the encrypted data/cipher text in an image file, which is then sent to the cloud end. This is done using the CDM (Common Deployment Model). In future work, an algorithm should be proposed and implemented for the security of virtual images in cloud computing. (author)

  28. Classification of protein-protein interaction full-text documents using text and citation network features.

    Science.gov (United States)

    Kolchinsky, Artemy; Abi-Haidar, Alaa; Kaur, Jasleen; Hamed, Ahmed Abdeen; Rocha, Luis M

    2010-01-01

    We participated (as Team 9) in the Article Classification Task of the Biocreative II.5 Challenge: binary classification of full-text documents relevant for protein-protein interaction. We used two distinct classifiers for the online and offline challenges: 1) the lightweight Variable Trigonometric Threshold (VTT) linear classifier we successfully introduced in BioCreative 2 for binary classification of abstracts and 2) a novel Naive Bayes classifier using features from the citation network of the relevant literature. We supplemented the supplied training data with full-text documents from the MIPS database. The lightweight VTT classifier was very competitive in this new full-text scenario: it was a top-performing submission in this task, taking into account the rank product of the Area Under the interpolated precision and recall Curve, Accuracy, Balanced F-Score, and Matthew's Correlation Coefficient performance measures. The novel citation network classifier for the biomedical text mining domain, while not a top performing classifier in the challenge, performed above the central tendency of all submissions, and therefore indicates a promising new avenue to investigate further in bibliome informatics.

  29. A Text Steganographic System Based on Word Length Entropy Rate

    Directory of Open Access Journals (Sweden)

    Francis Xavier Kofi Akotoye

    2017-10-01

    Full Text Available The widespread adoption of electronic distribution of material is accompanied by illicit copying and distribution. This is why individuals, businesses and governments have come to think about how to protect their work, prevent such illicit activities and trace the distribution of a document. It is in this context that a lot of attention is being focused on steganography. Implementing steganography in a text document is not an easy undertaking, considering the fact that a text document has very few places in which to embed hidden data. Any minute change introduced to text objects can easily be noticed, thus attracting attention from possible hackers. This study investigates the possibility of embedding data in a text document by employing the entropy rate of the constituent characters of words not less than four characters long. The scheme embeds bits in the text according to the alphabetic structure of the words: each character is compared with its neighbouring character, and if the first character is alphabetically lower than the succeeding character according to their ASCII codes, a zero bit is embedded; otherwise a 1 is embedded after the characters have been transposed. Before embedding, the secret message is encrypted with a secret key to add a layer of security, and a pseudorandom number is generated from the word counts of the text to determine the starting point of the embedding process. The embedding capacity of the scheme is relatively high compared with space-encoding and semantic methods.
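
    A sketch of the embedding rule as described (key-based encryption and the pseudorandom start offset are omitted): within a word of four or more letters, an adjacent pair in ascending ASCII order encodes a 0, and the pair is transposed to encode a 1. Note that transposing letters visibly alters the cover words, so the scheme depends on where it is applied.

        def embed(words, bits):
            out, i = [], 0
            for w in words:
                if len(w) >= 4 and i < len(bits) and w[0] != w[1]:
                    lo, hi = sorted(w[:2])           # ascending pair encodes 0
                    w = (lo + hi if bits[i] == 0 else hi + lo) + w[2:]
                    i += 1
                out.append(w)
            return out

        def extract(words, n):
            bits = []
            for w in words:
                if len(w) >= 4 and len(bits) < n and w[0] != w[1]:
                    bits.append(0 if ord(w[0]) < ord(w[1]) else 1)
            return bits

        stego = embed(["plain", "text", "steganography", "works"], [1, 0, 1, 1])
        print(stego, extract(stego, 4))              # bits recovered: [1, 0, 1, 1]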

  30. Text mining in the classification of digital documents

    Directory of Open Access Journals (Sweden)

    Marcial Contreras Barrera

    2016-11-01

    Full Text Available Objective: To develop an automated classifier for bibliographic material by means of text mining. Methodology: Text mining is used to develop the classifier, based on a supervised method comprising two phases, learning and recognition. In the learning phase, the classifier learns patterns by analyzing bibliographic records of classification Z (library science, information sciences and information resources) retrieved from the LIBRUNAM database; this phase yields a classifier capable of recognizing the different subclasses (LC). In the recognition phase the classifier is validated and evaluated through classification tests: bibliographic records of classification Z are taken at random, classified by a cataloguer and processed by the automated classifier in order to measure its precision. Results: The application of text mining achieved the development of the automated classifier through a supervised document classification method. The precision of the classifier, calculated by comparing manually and automatically assigned topics, was 75.70%. Conclusions: The application of text mining facilitated the creation of an automated classifier, yielding useful technology for classifying bibliographic material with the aim of improving and speeding up the process of organizing digital documents.
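
    A generic sketch of this learn-then-recognize setup with scikit-learn (the LIBRUNAM records and LC class-Z labels are not available here, so the records and subclass labels below are placeholders):

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.naive_bayes import MultinomialNB
        from sklearn.pipeline import make_pipeline

        records = ["cataloging and classification of information resources",
                   "library science reference services handbook",
                   "information retrieval systems and databases"]
        labels = ["Z695", "Z711", "Z699"]            # hypothetical LC subclasses

        clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
        clf.fit(records, labels)                     # learning phase
        print(clf.predict(["subject cataloging manual"]))  # recognition phase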

  31. Text document classification

    Czech Academy of Sciences Publication Activity Database

    Novovičová, Jana

    No. 62 (2005), pp. 53-54. ISSN 0926-4981. R&D Projects: GA AV ČR IAA2075302; GA AV ČR KSK1019101; GA MŠk 1M0572. Institutional research plan: CEZ:AV0Z10750506. Keywords: document representation; categorization; classification. Subject RIV: BD - Theory of Information

  32. RF model of the distribution system as a communication channel, phase 2. Volume 4: Software source program and illustrative ASCII database listings

    Science.gov (United States)

    Rustay, R. C.; Gajjar, J. T.; Rankin, R. W.; Wentz, R. C.; Wooding, R.

    1982-01-01

    Listings of source programs and some illustrative examples of various ASCII data base files are presented. The listings are grouped into the following categories: main programs, subroutine programs, illustrative ASCII data base files. Within each category files are listed alphabetically.

  33. Text segmentation in degraded historical document images

    Directory of Open Access Journals (Sweden)

    A.S. Kavitha

    2016-07-01

    Full Text Available Text segmentation from degraded historical Indus script images helps an Optical Character Recognizer (OCR) achieve good recognition rates for Indus scripts; however, it is challenging due to the complex backgrounds of such images. In this paper, we present a new method for segmenting text and non-text in Indus documents based on the fact that text components are less cursive than non-text ones. To achieve this, we propose a new combination of Sobel and Laplacian filtering for enhancing degraded low-contrast pixels. The proposed method then generates skeletons of the text components in the enhanced images, which reduces the computational burden and in turn helps in studying component structures efficiently. We propose to study the cursiveness of components based on branch information in order to remove false text components. The proposed method introduces a nearest-neighbour criterion for grouping components in the same line, which results in clusters. Furthermore, the proposed method classifies these clusters into text and non-text clusters based on the characteristics of text components. We evaluate the proposed method on a large dataset containing a variety of images. The results are compared with those of existing methods to show that the proposed method is effective in terms of recall and precision.
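
    A sketch of one plausible Sobel-plus-Laplacian enhancement with SciPy (the additive combination is an assumption; the paper's actual enhancement does not reduce to these few lines):

        import numpy as np
        from scipy import ndimage

        def enhance(img):
            # img: float grayscale array
            gx = ndimage.sobel(img, axis=1)
            gy = ndimage.sobel(img, axis=0)
            gradient = np.hypot(gx, gy)              # first-derivative edge strength
            detail = np.abs(ndimage.laplace(img))    # second-derivative detail
            out = gradient + detail                  # assumed additive combination
            return out / out.max()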

  34. Fast words boundaries localization in text fields for low quality document images

    Science.gov (United States)

    Ilin, Dmitry; Novikov, Dmitriy; Polevoy, Dmitry; Nikolaev, Dmitry

    2018-04-01

    The paper examines the problem of precise localization of word boundaries in document text zones. Document processing on a mobile device consists of document localization, perspective correction, localization of individual fields, finding words in separate zones, segmentation and recognition. While capturing an image with a mobile digital camera under uncontrolled conditions, digital noise, perspective distortions or glares may occur. Further document processing is complicated by its specifics: layout elements, complex backgrounds, static text, document security elements, and a variety of text fonts. Moreover, the problem of word boundary localization has to be solved at runtime on a mobile CPU with limited computing capabilities under the specified restrictions. At the moment, there are several groups of methods optimized for different conditions. Methods for scanned printed text are quick but limited to images of high quality. Methods for text in the wild have excessively high computational complexity and are thus hardly suitable for running on mobile devices as part of a mobile document recognition system. The method presented in this paper solves a more specialized problem than finding text in natural images. It uses local features, a sliding window and a lightweight neural network in order to achieve an optimal speed-precision ratio. The algorithm takes 12 ms per field running on an ARM processor of a mobile device. The error rate for boundary localization on a test sample of 8000 fields is 0.3

  35. Use of speech-to-text technology for documentation by healthcare providers.

    Science.gov (United States)

    Ajami, Sima

    2016-01-01

    Medical records are a critical component of a patient's treatment. However, documentation of patient-related information is considered a secondary activity in the provision of healthcare services, often leading to incomplete medical records and patient data of low quality. Advances in information technology (IT) in the health system and the registration of information in electronic health records (EHR) using speech-to-text conversion software have facilitated service delivery. This narrative review is based on a literature search of libraries, books, conference proceedings, the databases of Science Direct, PubMed, Proquest, Springer and SID (Scientific Information Database), and search engines such as Yahoo and Google. The following keywords and their combinations were used: speech recognition, automatic report documentation, voice to text software, healthcare, information, and voice recognition. Due to lack of knowledge of other languages, only texts in English or Persian were searched, with no time limits. Of a total of 70 articles, only 42 were selected. Speech-to-text conversion technology offers opportunities to improve the documentation process of medical records, reduce the cost and time of recording information, enhance the quality of documentation, improve the quality of services provided to patients, and support healthcare providers in legal matters. Healthcare providers should recognize the impact of this technology on service delivery.

  36. Documents and legal texts

    International Nuclear Information System (INIS)

    2015-01-01

    This section treats of the following Documents and legal texts: 1 - Canada: Nuclear Liability and Compensation Act (An Act respecting civil liability and compensation for damage in case of a nuclear incident, repealing the Nuclear Liability Act and making consequential amendments to other acts); 2 - Japan: Act on Compensation for Nuclear Damage (The purpose of this act is to protect persons suffering from nuclear damage and to contribute to the sound development of the nuclear industry by establishing a basic system regarding compensation in case of nuclear damage caused by reactor operation etc.); Act on Indemnity Agreements for Compensation of Nuclear Damage; 3 - Slovak Republic: Act on Civil Liability for Nuclear Damage and on its Financial Coverage and on Changes and Amendments to Certain Laws (This Act regulates: a) The civil liability for nuclear damage incurred in the causation of a nuclear incident, b) The scope of powers of the Nuclear Regulatory Authority (hereinafter only as the 'Authority') in relation to the application of this Act, c) The competence of the National Bank of Slovakia in relation to the supervised financial market entities in the financial coverage of liability for nuclear damage; and d) The penalties for violation of this Act)

  37. Means of storage and automated monitoring of versions of text technical documentation

    Science.gov (United States)

    Leonovets, S. A.; Shukalov, A. V.; Zharinov, I. O.

    2018-03-01

    The paper considers automating the preparation, storage and version control of textual design and program documentation by means of specialized software. Automation of documentation preparation is based on processing the engineering data contained in the specifications and technical documentation. Data handling assumes the existence of strictly structured electronic documents, prepared in widespread formats from templates based on industry standards, from which the program or design text document is generated automatically. The subsequent life cycle of the document, and of the engineering data it contains, is controlled, with archival data storage at each stage of the life cycle. Performance studies of the use of different widespread document formats under automated monitoring and storage are given. The newly developed software and the workbenches available to developers of instrumentation equipment are described.

  18. "What is relevant in a text document?": An interpretable machine learning approach.

    Directory of Open Access Journals (Sweden)

    Leila Arras

    Full Text Available Text documents can be described by a number of abstract concepts such as semantic category, writing style, or sentiment. Machine learning (ML) models have been trained to automatically map documents to these abstract concepts, allowing very large text collections to be annotated, more than could be processed by a human in a lifetime. Besides predicting the text's category very accurately, it is also highly desirable to understand how and why the categorization process takes place. In this paper, we demonstrate that such understanding can be achieved by tracing the classification decision back to individual words using layer-wise relevance propagation (LRP), a recently developed technique for explaining predictions of complex non-linear classifiers. We train two word-based ML models, a convolutional neural network (CNN) and a bag-of-words SVM classifier, on a topic categorization task and adapt the LRP method to decompose the predictions of these models onto words. The resulting scores indicate how much individual words contribute to the overall classification decision. This enables one to distill relevant information from text documents without an explicit semantic information extraction step. We further use the word-wise relevance scores to generate novel vector-based document representations which capture semantic information. Based on these document vectors, we introduce a measure of model explanatory power and show that, although the SVM and CNN models perform similarly in terms of classification accuracy, the latter exhibits a higher level of explainability, which makes it more comprehensible for humans and potentially more useful for other applications.
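
    For the paper's linear bag-of-words SVM the decomposition is exact and simple: a word's relevance is its weight times its count (LRP proper is needed for the non-linear CNN). A sketch of that linear special case, with toy documents and labels:

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.svm import LinearSVC

        docs = ["stocks fall on weak earnings", "team wins the championship game",
                "market rallies after earnings", "player scores in final game"]
        y = ["finance", "sports", "finance", "sports"]

        vec = CountVectorizer()
        svm = LinearSVC().fit(vec.fit_transform(docs), y)

        x = vec.transform(["weak earnings in the final game"]).toarray()[0]
        words = vec.get_feature_names_out()
        relevance = {words[i]: svm.coef_[0][i] * x[i] for i in x.nonzero()[0]}
        print(sorted(relevance.items(), key=lambda kv: kv[1]))  # per-word relevance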

  39. CERCLIS (Superfund) ASCII Text Format - CPAD Database

    Data.gov (United States)

    U.S. Environmental Protection Agency — The Comprehensive Environmental Response, Compensation and Liability Information System (CERCLIS) (Superfund) Public Access Database (CPAD) contains a selected set...

  40. Spotting Separator Points at Line Terminals in Compressed Document Images for Text-line Segmentation

    OpenAIRE

    R, Amarnath; Nagabhushan, P.

    2017-01-01

    Line separators are used to segregate text-lines from one another in document image analysis. Finding the separator points at every line terminal in a document image would enable text-line segmentation. In particular, identifying the separators in handwritten text could be a thrilling exercise. Obviously it would be challenging to perform this in the compressed version of a document image and that is the proposed objective in this research. Such an effort would prevent the computational burde...

  41. Finding Text Information in the Ocean of Electronic Documents

    Energy Technology Data Exchange (ETDEWEB)

    Medvick, Patricia A.; Calapristi, Augustin J.

    2003-02-05

    Information management in natural resources has become an overwhelming task. A massive amount of electronic documents and data is now available for creating informed decisions. The problem is finding the relevant information to support the decision-making process. Determining gaps in knowledge in order to propose new studies, or deciding which proposals to fund for maximum potential, is a time-consuming and difficult task. Additionally, available data stores are increasing in complexity; they now may include not only text and numerical data, but also images, sounds, and video recordings. Information visualization specialists at Pacific Northwest National Laboratory (PNNL) have software tools for exploring electronic data stores and for discovering and exploiting relationships within data sets. These provide capabilities for unstructured text exploration, the use of data signatures (a compact format for the essence of a set of scientific data) for visualization (Wong et al. 2000), visualizations for multiple query results (Havre et al. 2001), and others (http://www.pnl.gov/infoviz). We will focus on IN-SPIRE, an MS Windows version of PNNL’s SPIRE (Spatial Paradigm for Information Retrieval and Exploration). IN-SPIRE was developed to assist information analysts in finding and discovering information in huge masses of text documents.

  42. Documents and legal texts

    International Nuclear Information System (INIS)

    2014-01-01

    This section of the Bulletin presents the recently published documents and legal texts sorted by country: - Brazil: Resolution No. 169 of 30 April 2014. - Japan: Act Concerning Exceptions to Interruption of Prescription Pertaining to Use of Settlement Mediation Procedures by the Dispute Reconciliation Committee for Nuclear Damage Compensation in relation to Nuclear Damage Compensation Disputes Pertaining to the Great East Japan Earthquake (Act No. 32 of 5 June 2013); Act Concerning Measures to Achieve Prompt and Assured Compensation for Nuclear Damage Arising from the Nuclear Plant Accident following the Great East Japan Earthquake and Exceptions to the Extinctive Prescription, etc. of the Right to Claim Compensation for Nuclear Damage (Act No. 97 of 11 December 2013); Fourth Supplement to Interim Guidelines on Determination of the Scope of Nuclear Damage Resulting from the Accident at the Tokyo Electric Power Company Fukushima Daiichi and Daini Nuclear Power Plants (Concerning Damages Associated with the Prolongation of Evacuation Orders, etc.); Outline of 'Fourth Supplement to Interim Guidelines (Concerning Damages Associated with the Prolongation of Evacuation Orders, etc.)'. - OECD Nuclear Energy Agency: Decision and Recommendation of the Steering Committee Concerning the Application of the Paris Convention to Nuclear Installations in the Process of Being Decommissioned; Joint Declaration on the Security of Supply of Medical Radioisotopes. - United Arab Emirates: Federal Decree No. (51) of 2014 Ratifying the Convention on Supplementary Compensation for Nuclear Damage; Ratification of the Federal Supreme Council of Federal Decree No. (51) of 2014 Ratifying the Convention on Supplementary Compensation for Nuclear Damage

  43. CRED 20 m Gridded bathymetry of Brooks Banks and St. Rogatien Bank, Hawaii, USA (Arc ASCII format)

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — Gridded bathymetry (20m) of the shelf and slope environments of Brooks Banks and St. Rogatien, Hawaii, USA. The ASCII includes multibeam bathymetry from the Simrad...

  44. 36 CFR Appendix to Part 1193 - Advisory Guidance

    Science.gov (United States)

    2010-07-01

    ... electronic form. Electronic text must be provided in ASCII or a properly formatted word processor file. Using... text. When converting a document into ASCII or word processor formats, it is important to utilize the... access, use 18 point type; anything larger could make text too choppy to read comfortably. Use a good...

  45. School Survey on Crime and Safety (SSOCS) 2000 Public-Use Data Files, User's Manual, and Detailed Data Documentation. [CD-ROM].

    Science.gov (United States)

    National Center for Education Statistics (ED), Washington, DC.

    This CD-ROM contains the raw, public-use data from the 2000 School Survey on Crime and Safety (SSOCS) along with a User's Manual and Detailed Data Documentation. The data are provided in SAS, SPSS, STATA, and ASCII formats. The User's Manual and the Detailed Data Documentation are provided as .pdf files. (Author)

  46. Invariant practical tasks for work with text documents at the secondary school

    Directory of Open Access Journals (Sweden)

    Л И Карташова

    2013-12-01

    Full Text Available The article gives examples of practical tasks on the creation, editing and formatting of text documents, aimed at pupils of the secondary school. The tasks are invariant in character and do not depend on specific software.

  47. Segmentation of Arabic Handwritten Documents into Text Lines using Watershed Transform

    Directory of Open Access Journals (Sweden)

    Abdelghani Souhar

    2017-12-01

    Full Text Available A crucial task in character recognition systems is the segmentation of the document into text lines, especially when it is handwritten. When dealing with a non-Latin document such as Arabic, the challenge becomes greater, since in addition to the variability of writing, the presence of diacritical points and the high number of ascender and descender characters further complicates the segmentation process. To cope with this complexity, and even turn the difficulty into an advantage given that the focus is on the Arabic language, which is semi-cursive in nature, a method based on the watershed transform is proposed. Tested on the «Handwritten Arabic Proximity Datasets», a segmentation rate of 93% at a matching score of 95% is achieved.

  48. The «PlagiarismControl» System as a Tool for the Expert Examination of Text Documents

    Directory of Open Access Journals (Sweden)

    Yu. B. Krapivin

    2018-01-01

    Full Text Available The implemented instrumental software system «PlagiarismControl» is described and its operability analyzed. The system automates the identification of adopted (borrowed) fragments in a given text document, drawing both on a local full-text user database and on the Internet. It handles explicit as well as implicit adoptions, with precision up to the paradigms of lexical units and both lexical and grammatical synonymy relations, according to the structural-functional schematic diagram of the system for automatic recognition of reproduced fragments of text documents. «PlagiarismControl» is able to work in different modes, automating the work of the expert and significantly speeding up the procedure of analyzing documents with the purpose of recognizing adoptions (plagiarism) from other text documents.

  49. Human Rights Texts: Converting Human Rights Primary Source Documents into Data.

    Science.gov (United States)

    Fariss, Christopher J; Linder, Fridolin J; Jones, Zachary M; Crabtree, Charles D; Biek, Megan A; Ross, Ana-Sophia M; Kaur, Taranamol; Tsai, Michael

    2015-01-01

    We introduce and make publicly available a large corpus of digitized primary source human rights documents which are published annually by monitoring agencies that include Amnesty International, Human Rights Watch, the Lawyers Committee for Human Rights, and the United States Department of State. In addition to the digitized text, we also make available and describe document-term matrices, which are datasets that systematically organize the word counts from each unique document by each unique term within the corpus of human rights documents. To contextualize the importance of this corpus, we describe the development of coding procedures in the human rights community and several existing categorical indicators that have been created by human coding of the human rights documents contained in the corpus. We then discuss how the new human rights corpus and the existing human rights datasets can be used with a variety of statistical analyses and machine learning algorithms to help scholars understand how human rights practices and reporting have evolved over time. We close with a discussion of our plans for dataset maintenance, updating, and availability.
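
    A sketch of how such a document-term matrix is built (the two stub strings stand in for full country reports):

        from sklearn.feature_extraction.text import CountVectorizer

        reports = ["arbitrary detention and torture reported",
                   "freedom of assembly restricted detention continued"]
        vec = CountVectorizer()
        dtm = vec.fit_transform(reports)             # sparse document-term matrix
        print(vec.get_feature_names_out())           # unique terms (columns)
        print(dtm.toarray())                         # word counts per document (rows)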

  50. Documenting the Earliest Chinese Journals

    Directory of Open Access Journals (Sweden)

    Jian-zhong (Joe) Zhou

    2001-10-01

    Full Text Available

    Pages: 19-24

    According to various authoritative sources, the English word "journal" was first used in the 16th century, but the existence of the journal in its original meaning as a daily record can be traced back to the Acta Diurna (Daily Events) in ancient Roman cities as early as 59 B.C. This article documents the first appearance of Chinese daily records, which were much earlier than 59 B.C.

    The evidence for the earlier Chinese daily records comes from important archaeological discoveries in the 1970's, but they were also documented by Sima Qian (145 B.C. - 85 B.C.), the grand historian of the Han Dynasty imperial court. Sima's lifetime contribution was the publication of Shi Ji (史記, The Grand Scribe's Records; hereafter the Records). The Records is a book of history of grand scope. It encompasses all Chinese history from the 30th century B.C. through the end of the second century B.C. in 130 chapters and over 525,000 Chinese

  51. Command Center Library Model Document. Comprehensive Approach to Reusable Defense Software (CARDS)

    Science.gov (United States)

    1992-05-31

    ...system, and functionality for specifying the layout of the document. 3.7.16.1 FrameMaker. FrameMaker is a Commercial Off The Shelf (COTS) component... facilitating WYSIWYG creation of formatted reports with embedded graphics. FrameMaker is an advanced publishing tool that integrates word processing... available for the component FrameMaker: • Product evaluation reports in ASCII and postscript formats • Product assessment on line in model • Product...

  52. Leveraging Text Content for Management of Construction Project Documents

    Science.gov (United States)

    Alqady, Mohammed

    2012-01-01

    The construction industry is a knowledge intensive industry. Thousands of documents are generated by construction projects. Documents, as information carriers, must be managed effectively to ensure successful project management. The fact that a single project can produce thousands of documents and that a lot of the documents are generated in a…

  53. LOG2MARKUP: Stata module to transform a Stata text log into a markup document

    DEFF Research Database (Denmark)

    2016-01-01

    log2markup extracts parts of the text version of the Stata log from the log command and transforms the logfile into a markup-based document with the same name, but with the extension markup (or as otherwise specified in the option extension) instead of log. The author usually uses markdown for writing documents. However...

  54. BoB, a best-of-breed automated text de-identification system for VHA clinical documents.

    Science.gov (United States)

    Ferrández, Oscar; South, Brett R; Shen, Shuying; Friedlin, F Jeffrey; Samore, Matthew H; Meystre, Stéphane M

    2013-01-01

    De-identification allows faster and more collaborative clinical research while protecting patient confidentiality. Clinical narrative de-identification is a tedious process that can be alleviated by automated natural language processing methods. The goal of this research is the development of an automated text de-identification system for Veterans Health Administration (VHA) clinical documents. We devised a novel stepwise hybrid approach designed to improve the current strategies used for text de-identification. The proposed system is based on a previous study on the best de-identification methods for VHA documents. This best-of-breed automated clinical text de-identification system (aka BoB) tackles the problem as two separate tasks: (1) maximize patient confidentiality by redacting as much protected health information (PHI) as possible; and (2) leave de-identified documents in a usable state preserving as much clinical information as possible. We evaluated BoB with a manually annotated corpus of a variety of VHA clinical notes, as well as with the 2006 i2b2 de-identification challenge corpus. We present evaluations at the instance- and token-level, with detailed results for BoB's main components. Moreover, an existing text de-identification system was also included in our evaluation. BoB's design efficiently takes advantage of the methods implemented in its pipeline, resulting in high sensitivity values (especially for sensitive PHI categories) and a limited number of false positives. Our system successfully addressed VHA clinical document de-identification, and its hybrid stepwise design demonstrates robustness and efficiency, prioritizing patient confidentiality while leaving most clinical information intact.

  15. Documentation is Documentation and Theory is Theory: A Reply to Daniel Avorgbedor's Commentary "Documenting Spoken and Sung Texts of the Dagaaba of West Africa"

    Directory of Open Access Journals (Sweden)

    Manolete Mora

    2007-11-01

    Full Text Available In a response to an article that appeared in Empirical Musicology Review (Bodomo and Mora 2007), Avorgbedor (2007) takes issue with aspects of the paper. In our reply to Avorgbedor's response we will firstly clarify some issues raised therein and secondly address the issue of the relationship between theory, description and documentation within linguistics and musicology.

  16. A methodology for semiautomatic taxonomy of concepts extraction from nuclear scientific documents using text mining techniques

    International Nuclear Information System (INIS)

    Braga, Fabiane dos Reis

    2013-01-01

    This thesis presents a text mining method for the semi-automatic extraction of a taxonomy of concepts from a textual corpus composed of scientific papers related to the nuclear area. Text classification is a natural human practice and a crucial task for working with large repositories. Document clustering techniques provide a logical and understandable framework that facilitates organization, browsing and searching. Most clustering algorithms use the bag-of-words model to represent the content of a document. This model generates high-dimensional data, ignores the fact that different words can have the same meaning, and does not consider relationships between words, assuming that words are independent of each other. The methodology combines a concept-based model of document representation with a hierarchical document clustering method based on the frequency of co-occurring concepts, and a technique for labeling clusters with their most representative concepts, with the objective of producing a taxonomy of concepts that reflects the structure of the knowledge domain. It is hoped that this work will contribute to the conceptual mapping of the scientific production of the nuclear area and thus support the management of research activities in this area. (author)
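
    As a rough illustration of the clustering step described in this record (concept-based document vectors, hierarchical clustering driven by concept co-occurrence, and labeling clusters by their most representative concept), here is a minimal sketch on invented data; the concept names, counts, and the choice of cosine distance with average linkage are assumptions, not details taken from the thesis.

        # Minimal sketch: cluster documents represented by concept frequencies
        # and label each cluster with its most frequent concept. Toy data only.
        import numpy as np
        from scipy.cluster.hierarchy import linkage, fcluster
        from scipy.spatial.distance import pdist

        concepts = ["reactor", "dosimetry", "waste management", "fuel cycle"]
        doc_concept = np.array([          # rows = documents, cols = concepts
            [4, 0, 1, 3],
            [3, 0, 0, 4],
            [0, 5, 1, 0],
            [0, 4, 2, 0],
        ])

        dist = pdist(doc_concept, metric="cosine")    # concept-profile distance
        tree = linkage(dist, method="average")        # hierarchical clustering
        labels = fcluster(tree, t=2, criterion="maxclust")

        for c in sorted(set(labels)):                 # representative labels
            totals = doc_concept[labels == c].sum(axis=0)
            print("cluster", c, "->", concepts[int(totals.argmax())])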

  17. ParaText : scalable solutions for processing and searching very large document collections : final LDRD report.

    Energy Technology Data Exchange (ETDEWEB)

    Crossno, Patricia Joyce; Dunlavy, Daniel M.; Stanton, Eric T.; Shead, Timothy M.

    2010-09-01

    This report is a summary of the accomplishments of the 'Scalable Solutions for Processing and Searching Very Large Document Collections' LDRD, which ran from FY08 through FY10. Our goal was to investigate scalable text analysis; specifically, methods for information retrieval and visualization that could scale to extremely large document collections. Towards that end, we designed, implemented, and demonstrated a scalable framework for text analysis - ParaText - as a major project deliverable. Further, we demonstrated the benefits of using visual analysis in text analysis algorithm development, improved performance of heterogeneous ensemble models in data classification problems, and the advantages of information theoretic methods in user analysis and interpretation in cross language information retrieval. The project involved 5 members of the technical staff and 3 summer interns (including one who worked two summers). It resulted in a total of 14 publications, 3 new software libraries (2 open source and 1 internal to Sandia), several new end-user software applications, and over 20 presentations. Several follow-on projects have already begun or will start in FY11, with additional projects currently in proposal.

  18. The Analysis of Heterogeneous Text Documents with the Help of the Computer Program NUD*IST

    Directory of Open Access Journals (Sweden)

    Christine Plaß

    2000-12-01

    Full Text Available On the basis of a current research project we discuss the use of the computer program NUD*IST for the analysis and archiving of qualitative documents. Our project examines the social evaluation of spectacular criminal offenses and we identify, digitize and analyze documents from the entire 20th century. Since public and scientific discourses are examined, the data of the project are extraordinarily heterogeneous: scientific publications, court records, newspaper reports, and administrative documents. We want to show how to transfer general questions into a systematic categorization with the assistance of NUD*IST. Apart from the functions, possibilities and limitations of the application of NUD*IST, concrete work procedures and difficulties encountered are described. URN: urn:nbn:de:0114-fqs0003211

  19. Implementation of Winnowing Algorithm Based K-Gram to Identify Plagiarism on File Text-Based Document

    Directory of Open Access Journals (Sweden)

    Nurdiansyah Yanuar

    2018-01-01

    Full Text Available Plagiarism often occurs when students face tasks with looming deadlines; plagiarism is then seen as the fastest way to accomplish the task. This motivated the author to build a plagiarism detection system using the Winnowing algorithm as the document similarity search algorithm. The documents tested are Indonesian journals with the extensions .doc, .docx, and/or .txt. The similarity calculation proceeds in two stages: the first builds a document fingerprint using the Winnowing algorithm, and the second computes similarity with the Jaccard coefficient. The system was developed following an iterative waterfall model. The main objective of this project is to determine the level of plagiarism. By displaying the percentage of similarity between journals, it is expected to prevent plagiarism, whether intentional or unintentional, before a journal is published.
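
    Since this abstract spells out the two stages (a Winnowing fingerprint over hashed k-grams, then Jaccard similarity between fingerprint sets), a compact sketch is possible. This is a simplified reading, assuming MD5-hashed character k-grams and keeping only the minimum hash of each sliding window; the paper's exact k, window size, and hash function are not given here.

        # Minimal sketch of Winnowing fingerprints plus Jaccard similarity.
        import hashlib

        def kgram_hashes(text, k=5):
            text = "".join(text.lower().split())      # strip case and spaces
            return [int(hashlib.md5(text[i:i + k].encode()).hexdigest(), 16) % 10**8
                    for i in range(len(text) - k + 1)]

        def winnow(hashes, w=4):
            # Simplified winnowing: keep the minimum hash of every window.
            return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}

        def jaccard(a, b):
            return len(a & b) / len(a | b) if a | b else 0.0

        fp1 = winnow(kgram_hashes("Plagiarism detection with the winnowing algorithm"))
        fp2 = winnow(kgram_hashes("Detection of plagiarism with the winnowing algorithm"))
        print(f"similarity: {jaccard(fp1, fp2):.0%}")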

  20. 76 FR 10405 - Federal Copyright Protection of Sound Recordings Fixed Before February 15, 1972

    Science.gov (United States)

    2011-02-24

    ... file in either the Adobe Portable Document File (PDF) format that contains searchable, accessible text (not an image); Microsoft Word; WordPerfect; Rich Text Format (RTF); or ASCII text file format (not a..., comments may be delivered in hard copy. If hand delivered by a private party, an original...

  1. On the Creation of Hypertext Links in Full-Text Documents: Measurement of Inter-Linker Consistency.

    Science.gov (United States)

    Ellis, David; And Others

    1994-01-01

    Describes a study in which several different sets of hypertext links are inserted by different people in full-text documents. The degree of similarity between the sets is measured using coefficients and topological indices. As in comparable studies of inter-indexer consistency, the sets of links used by different people showed little similarity.…

  2. Text document classification based on mixture models

    Czech Academy of Sciences Publication Activity Database

    Novovičová, Jana; Malík, Antonín

    2004-01-01

    Roč. 40, č. 3 (2004), s. 293-304 ISSN 0023-5954 R&D Projects: GA AV ČR IAA2075302; GA ČR GA102/03/0049; GA AV ČR KSK1019101 Institutional research plan: CEZ:AV0Z1075907 Keywords : text classification * text categorization * multinomial mixture model Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.224, year: 2004

  3. Chyawanprash: A review of therapeutic benefits as in authoritative texts and documented clinical literature.

    Science.gov (United States)

    Narayana, D B Anantha; Durg, Sharanbasappa; Manohar, P Ram; Mahapatra, Anita; Aramya, A R

    2017-02-02

    Chyawanprash (CP), a traditional immune booster recipe, has a long history of ethnic origin, development, household preparation and usage. There are even mythological stories about the origin of this recipe, including its nomenclature. In the last six decades, CP, because of the entrepreneurial actions of some research Vaidyas (traditional doctors), has grown to industrial production and marketing in packed forms to a large number of consumers/patients, like any food or health care product. Currently, CP has acquired a large accepted user base in India and in a few countries outside India. Authoritative texts, recognized by the Drugs and Cosmetics Act of India, describe CP as an immunity enhancer and strength giver meant for improving lung functions in diseases with compromised immunity. This review focuses on published clinical efficacy and safety studies of CP for correlation with the health benefits documented in the authoritative texts, and also briefs on its recipes and processes. Authoritative texts were searched for recipes, processes, and other technical details of CP. Labels of marketed Indian CP products were studied for health claims. Electronic searches for studies of CP efficacy and safety data were performed in PubMed/MEDLINE and DHARA (Digital Helpline for Ayurveda Research Articles), and Ayurvedic books were also searched for clinical studies. The documented clinical studies from electronic databases and Ayurvedic books showed that individuals who consumed CP regularly for a definite period of time had improvements in overall health status and immunity. However, most of the clinical studies in this review are of small sample size and short duration. Further, limited access to significant data on traditional products like CP in electronic databases was noted. Randomized controlled trials of high quality with larger sample sizes and longer follow-up are needed to provide significant evidence on the clinical use of CP as immunity…

  4. Mining the Text: 34 Text Features that Can Ease or Obstruct Text Comprehension and Use

    Science.gov (United States)

    White, Sheida

    2012-01-01

    This article presents 34 characteristics of texts and tasks ("text features") that can make continuous (prose), noncontinuous (document), and quantitative texts easier or more difficult for adolescents and adults to comprehend and use. The text features were identified by examining the assessment tasks and associated texts in the national…

  5. Getting more out of biomedical documents with GATE's full lifecycle open source text analytics.

    Directory of Open Access Journals (Sweden)

    Hamish Cunningham

    Full Text Available This software article describes the GATE family of open source text analysis tools and processes. GATE is one of the most widely used systems of its type with yearly download rates of tens of thousands and many active users in both academic and industrial contexts. In this paper we report three examples of GATE-based systems operating in the life sciences and in medicine. First, in genome-wide association studies which have contributed to discovery of a head and neck cancer mutation association. Second, medical records analysis which has significantly increased the statistical power of treatment/outcome models in the UK's largest psychiatric patient cohort. Third, richer constructs in drug-related searching. We also explore the ways in which the GATE family supports the various stages of the lifecycle present in our examples. We conclude that the deployment of text mining for document abstraction or rich search and navigation is best thought of as a process, and that with the right computational tools and data collection strategies this process can be made defined and repeatable. The GATE research programme is now 20 years old and has grown from its roots as a specialist development tool for text processing to become a rather comprehensive ecosystem, bringing together software developers, language engineers and research staff from diverse fields. GATE now has a strong claim to cover a uniquely wide range of the lifecycle of text analysis systems. It forms a focal point for the integration and reuse of advances that have been made by many people (the majority outside of the authors' own group) who work in text processing for biomedicine and other areas. GATE is available online under GNU open source licences and runs on all major operating systems. Support is available from an active user and developer community and also on a commercial basis.

  6. Computing symmetrical strength of N-grams: a two pass filtering approach in automatic classification of text documents.

    Science.gov (United States)

    Agnihotri, Deepak; Verma, Kesari; Tripathi, Priyanka

    2016-01-01

    The contiguous sequences of terms (N-Grams) in documents are symmetrically distributed among different classes. This symmetrical distribution raises uncertainty about the class to which an N-Gram belongs. In this paper, we focus on selecting the most discriminating N-Grams by reducing the effects of symmetrical distribution. In this context, a new text feature selection method, named the symmetrical strength of the N-Grams (SSNG), is proposed using a two pass filtering based feature selection (TPF) approach. Initially, in the first pass of the TPF, the SSNG method chooses various informative N-Grams from the entire set of N-Grams extracted from the corpus. Subsequently, in the second pass, the well-known Chi Square (χ²) method is used to select the few most informative N-Grams. Further, to classify the documents, two standard classifiers, Multinomial Naive Bayes and Linear Support Vector Machine, were applied to ten standard text data sets. In most of the datasets, the experimental results show that the performance and success rate of the SSNG method using the TPF approach are superior to state-of-the-art methods, viz. Mutual Information, Information Gain, Odds Ratio, Discriminating Feature Selection and χ².
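
    The SSNG score itself is defined in the paper and is not reproduced here, but the second TPF pass, chi-square selection over N-gram features, can be sketched with standard tooling; the toy corpus, labels, and k below are all invented.

        # Minimal sketch of the second pass: chi-square N-gram selection.
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.feature_selection import SelectKBest, chi2

        docs = ["rates rise on inflation fears", "team wins the championship final",
                "markets fall as rates climb", "coach praises the winning team"]
        labels = ["finance", "sport", "finance", "sport"]

        vec = CountVectorizer(ngram_range=(1, 2))     # unigrams and bigrams
        X = vec.fit_transform(docs)

        selector = SelectKBest(chi2, k=5).fit(X, labels)
        names = vec.get_feature_names_out()
        print([names[i] for i in selector.get_support(indices=True)])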

  7. Getting more out of biomedical documents with GATE's full lifecycle open source text analytics.

    Science.gov (United States)

    Cunningham, Hamish; Tablan, Valentin; Roberts, Angus; Bontcheva, Kalina

    2013-01-01

    This software article describes the GATE family of open source text analysis tools and processes. GATE is one of the most widely used systems of its type with yearly download rates of tens of thousands and many active users in both academic and industrial contexts. In this paper we report three examples of GATE-based systems operating in the life sciences and in medicine. First, in genome-wide association studies which have contributed to discovery of a head and neck cancer mutation association. Second, medical records analysis which has significantly increased the statistical power of treatment/outcome models in the UK's largest psychiatric patient cohort. Third, richer constructs in drug-related searching. We also explore the ways in which the GATE family supports the various stages of the lifecycle present in our examples. We conclude that the deployment of text mining for document abstraction or rich search and navigation is best thought of as a process, and that with the right computational tools and data collection strategies this process can be made defined and repeatable. The GATE research programme is now 20 years old and has grown from its roots as a specialist development tool for text processing to become a rather comprehensive ecosystem, bringing together software developers, language engineers and research staff from diverse fields. GATE now has a strong claim to cover a uniquely wide range of the lifecycle of text analysis systems. It forms a focal point for the integration and reuse of advances that have been made by many people (the majority outside of the authors' own group) who work in text processing for biomedicine and other areas. GATE is available online under GNU open source licences and runs on all major operating systems. Support is available from an active user and developer community and also on a commercial basis.

  8. VisualUrText: A Text Analytics Tool for Unstructured Textual Data

    Science.gov (United States)

    Zainol, Zuraini; Jaymes, Mohd T. H.; Nohuddin, Puteri N. E.

    2018-05-01

    The growing amount of unstructured text on the Internet is tremendous. Text repositories come from Web 2.0, business intelligence and social networking applications. It is also believed that 80-90% of future data growth will be in the form of unstructured text databases that may potentially contain interesting patterns and trends. Text Mining is a well-known technique for discovering interesting patterns and trends, which constitute non-trivial knowledge, from massive unstructured text data. Text Mining covers multidisciplinary fields involving information retrieval (IR), text analysis, natural language processing (NLP), data mining, machine learning, statistics and computational linguistics. This paper discusses the development of a text analytics tool that is proficient in extracting, processing and analyzing unstructured text data and visualizing the cleaned text in multiple forms such as a Document Term Matrix (DTM), frequency graph, network analysis graph, word cloud and dendrogram. This tool, VisualUrText, is developed to assist students and researchers in extracting interesting patterns and trends in document analyses.
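
    Two of the outputs this record names, the Document Term Matrix and the term frequencies behind a frequency graph or word cloud, are easy to make concrete. A minimal sketch, with an invented two-document corpus:

        # Minimal sketch: build a DTM and corpus-wide term frequencies.
        from sklearn.feature_extraction.text import CountVectorizer

        docs = ["text mining finds patterns in text",
                "unstructured text needs cleaning before mining"]

        vec = CountVectorizer(stop_words="english")
        dtm = vec.fit_transform(docs)          # rows = documents, cols = terms
        print(vec.get_feature_names_out())
        print(dtm.toarray())

        freqs = dict(zip(vec.get_feature_names_out(), dtm.sum(axis=0).A1))
        print(sorted(freqs.items(), key=lambda kv: -kv[1]))   # frequency graph input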

  9. RECOVERY OF DOCUMENT TEXT FROM TORN FRAGMENTS USING IMAGE PROCESSING

    OpenAIRE

    C.Prasad; Dr.Mahesh; Dr.S.A.K. Jilani

    2016-01-01

    Recovery of documents from their torn or damaged fragments plays an important role in the fields of forensics and archival study. Manual reconstruction of torn papers with the help of glue, tape, etc. is tedious, time consuming and unsatisfactory. For the reconstruction of torn images we use image mosaicing, where the image is reconstructed using features (corners) and RANSAC with homography. But for torn fragments there is no such overlapping similarity between fragments. Hence we propose a ...

  10. Content analysis to detect high stress in oral interviews and text documents

    Science.gov (United States)

    Thirumalainambi, Rajkumar (Inventor); Jorgensen, Charles C. (Inventor)

    2012-01-01

    A system of interrogation to estimate whether a subject of interrogation is likely experiencing high stress, emotional volatility and/or internal conflict in the subject's responses to an interviewer's questions. The system applies one or more of four procedures, a first statistical analysis, a second statistical analysis, a third analysis and a heat map analysis, to identify one or more documents containing the subject's responses for which further examination is recommended. Words in the documents are characterized in terms of dimensions representing different classes of emotions and states of mind, in which the subject's responses that manifest high stress, emotional volatility and/or internal conflict are identified. A heat map visually displays the dimensions manifested by the subject's responses in different colors, textures, geometric shapes or other visually distinguishable indicia.

  11. Ultrasound-guided nerve blocks--is documentation and education feasible using only text and pictures?

    Directory of Open Access Journals (Sweden)

    Bjarne Skjødt Worm

    Full Text Available PURPOSE: With the advancement of ultrasound-guidance for peripheral nerve blocks, still pictures from representative ultrasonograms are increasingly used for clinical documentation of the procedure and for educational purposes in textbook materials. However, little is actually known about the clinical and educational usefulness of these still pictures, in particular how well nerve structures can be identified compared to real-time ultrasound examination. We aimed to quantify gross visibility or ultrastructure identification using still-picture sonograms compared to real-time ultrasound for trainees and experts and for large or small nerves, and to discuss the clinical or educational relevance of these findings. MATERIALS AND METHODS: We undertook a clinical study to quantify the maximal gross visibility or ultrastructure of seven peripheral nerves identified by either real-time ultrasound (clinical cohort, n = 635) or still-picture ultrasonograms (clinical cohort, n = 112). In addition, we undertook a study on test subjects (n = 4) to quantify interobserver variation and potential bias among expert and trainee observers. RESULTS: When comparing real-time ultrasound and interpretation of still-picture sonograms, gross identification of large nerves was reduced by 15% and 40% for expert and trainee observers, respectively, while gross identification of small nerves was reduced by 29% and 66%. Identification of within-nerve ultrastructure was even lower. For all nerve sizes, trainees were unable to identify any anatomical structure in 24 to 34% of cases, while experts were unable to identify anything in 9 to 10%. CONCLUSION: Extensive ultrasonography experience and real-time ultrasound measurement appear to be the keystones of optimal nerve identification. In contrast, the use of still pictures appears insufficient for documentation as well as educational purposes. Alternatives such as video clips or enhanced picture technology are encouraged.

  12. Information Gain Based Dimensionality Selection for Classifying Text Documents

    Energy Technology Data Exchange (ETDEWEB)

    Dumidu Wijayasekara; Milos Manic; Miles McQueen

    2013-06-01

    Selecting the optimal dimensions for various knowledge extraction applications is an essential component of data mining. Dimensionality selection techniques are utilized in classification applications to increase the classification accuracy and reduce the computational complexity. In text classification, where the dimensionality of the dataset is extremely high, dimensionality selection is even more important. This paper presents a novel genetic algorithm based methodology for dimensionality selection in text mining applications that utilizes information gain. The presented methodology uses the information gain of each dimension to change the mutation probability of chromosomes dynamically. Since the information gain is calculated a priori, the computational complexity is not affected. The presented method was tested on a specific text classification problem and compared with conventional genetic algorithm based dimensionality selection. The results show an improvement of 3% in true positives and 1.6% in true negatives over conventional dimensionality selection methods.
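
    The genetic algorithm itself is not reproduced here, but the quantity that drives it, the per-dimension information gain, can be estimated with standard tooling. A minimal sketch on an invented corpus; how the paper maps gain onto mutation probabilities is not specified in the abstract, so the final comment is only indicative.

        # Minimal sketch: estimate information gain of each term dimension.
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.feature_selection import mutual_info_classif

        docs = ["power grid attack detected", "routine maintenance completed",
                "intrusion attempt on grid", "scheduled maintenance report"]
        labels = [1, 0, 1, 0]

        vec = CountVectorizer()
        X = vec.fit_transform(docs)
        gain = mutual_info_classif(X, labels, discrete_features=True, random_state=0)
        ranked = sorted(zip(vec.get_feature_names_out(), gain), key=lambda t: -t[1])
        print(ranked)   # these scores would set per-dimension mutation rates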

  13. The Balinese Unicode Text Processing

    Directory of Open Access Journals (Sweden)

    Imam Habibi

    2009-06-01

    Full Text Available In principle, the computer only recognizes numbers as the representation of a character. Therefore, there are many encoding systems to allocate these numbers, although not all characters are covered. In Europe, a single language may even need more than one encoding system. Hence, a new encoding system known as Unicode has been established to overcome this problem. Unicode provides a unique id for each different character which does not depend on platform, program, or language. The Unicode standard has been applied in a number of industries, such as Apple, HP, IBM, JustSystem, Microsoft, Oracle, SAP, Sun, Sybase, and Unisys. In addition, language standards and modern information exchanges such as XML, Java, ECMAScript (JavaScript), LDAP, CORBA 3.0, and WML make use of Unicode as an official tool for implementing ISO/IEC 10646. There are four things to do for Balinese script: the algorithms of transliteration, searching, sorting, and word boundary analysis (spell checking). To verify the correctness of the algorithms, some applications were made. These applications run on Linux/Windows platforms using J2SDK 1.5 and the J2ME WTK2 library. The input and output of the algorithms/applications are character sequences obtained from keyboard input or external files. This research produces a module or library which is able to process Balinese text based on the Unicode standard. The output of this research is the ability, skill, and mastery of: 1. the Unicode standard (21-bit) as a substitute for ASCII (7-bit) and ISO 8859-1 (8-bit), the former default character sets in many applications; 2. the Balinese Unicode text processing algorithms; 3. the experience of working with and learning from an international team that consists of foremost experts in the area: Michael Everson (Ireland), Peter Constable (Microsoft, US), I Made Suatjana, and Ida Bagus Adi Sudewa.
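
    As a small illustration of what "a unique id for each character" means in practice for Balinese script (whose Unicode block is U+1B00-U+1B7F), the sketch below inspects code points and shows why naive code-point sorting is not linguistically correct; the specific sample characters are arbitrary picks from the block.

        # Minimal sketch: inspect Balinese code points with the std library.
        import unicodedata

        BALINESE = range(0x1B00, 0x1B80)          # Balinese Unicode block

        def describe(text):
            for ch in text:
                cp = ord(ch)
                tag = "(Balinese)" if cp in BALINESE else ""
                print(f"U+{cp:04X} {unicodedata.name(ch, '<unnamed>')} {tag}")

        describe("\u1b05\u1b13")                  # two arbitrary block members

        # Code-point order is not collation order: real sorting and word
        # boundary analysis need the kind of algorithms the paper develops.
        print(sorted(["\u1b13", "\u1b05"]))       # naive ordinal sort only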

  14. Multilingual access to full text databases; Acces multilingue aux bases de donnees en texte integral

    Energy Technology Data Exchange (ETDEWEB)

    Fluhr, C; Radwan, K [Institut National des Sciences et Techniques Nucleaires (INSTN), Centre d` Etudes de Saclay, 91 - Gif-sur-Yvette (France)

    1990-05-01

    Many full text databases are available in only one language; moreover, they may contain documents in different languages. Even if the user is able to understand the language of the documents in the database, it may be easier for him to express his need in his own language. For databases containing documents in different languages, it is simpler to formulate the query in one language only and to retrieve documents in different languages. This paper presents the developments and the first experiments of multilingual search, applied to the French-English pair, for text data in the nuclear field, based on the SPIRIT system. After reviewing the general problems of searching full text databases with queries formulated in natural language, we present the methods used to reformulate the queries and show how they can be expanded for multilingual search. The first results on data in the nuclear field are presented (AFCEN norms and INIS abstracts). 4 refs.

  15. Guidelines on Active Content and Mobile Code: Recommendations of the National Institute of Standards and Technology

    National Research Council Canada - National Science Library

    Jansen, Wayne

    2001-01-01

    .... One such category of technologies is active content. Broadly speaking, active content refers to electronic documents that, unlike past character documents based on the American Standard Code for Information Interchange (ASCII...

  16. The LawsAndFamilies questionnaire on legal family formats for same-sex and/or different-sex couples : Text of the questions and of the accompanying guidance document.

    NARCIS (Netherlands)

    Waaldijk, C.; Lorenzo, Villaverde J.M.; Nikolina, N.; Zago, G.

    2016-01-01

    This Working Paper of the research project FamiliesAndSocieties contains the text of the LawsAndFamilies questionnaire, plus the text of the guidance document provided to legal experts answering this questionnaire. These texts are preceded by a brief introduction to the background, aims and

  17. Integration of HTML documents into an XML-based knowledge repository.

    Science.gov (United States)

    Roemer, Lorrie K; Rocha, Roberto A; Del Fiol, Guilherme

    2005-01-01

    The Emergency Patient Instruction Generator (EPIG) is an electronic content compiler/viewer/editor developed by Intermountain Health Care. The content is vendor-licensed HTML patient discharge instructions. This work describes the process by which discharge instructions were converted from ASCII-encoded HTML to XML, then loaded into a database for use by EPIG.
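
    The conversion step this record describes, parsing loosely structured HTML and re-serializing it as well-formed XML, can be sketched as below; the sample instruction text is invented, and EPIG's actual pipeline, schema, and database loading are not public here.

        # Minimal sketch: tolerant HTML parsing, then XML re-serialization.
        from lxml import etree, html

        raw = "<html><body><h1>Wound care</h1><p>Keep the area clean<br>and dry</body>"

        tree = html.fromstring(raw)        # lxml repairs missing end tags
        xml_bytes = etree.tostring(tree, method="xml", pretty_print=True)
        print(xml_bytes.decode())          # every element is now closed

        etree.fromstring(xml_bytes)        # raises if output were not well-formed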

  18. Imaginary Documentary: reflecting upon contemporary documental photography Documentário Imaginário: reflexões sobre a fotografia documental contemporânea

    Directory of Open Access Journals (Sweden)

    Kátia Hallak Lombardi

    2008-01-01

    Full Text Available This article pursues the idea of an Imaginary Documentary – a possible new inflexion in the practices of contemporary documental photography. The text establishes its theoretical foundations by bringing discussions of documental photography into contact with the concept of the imaginary in Gilbert Durand and the notion of the Imaginary Museum of André Malraux. Photographers who are part of the history of documental photography are the objects chosen for testing the potentialities of the Imaginary Documentary.

  19. Visualizing the semantic content of large text databases using text maps

    Science.gov (United States)

    Combs, Nathan

    1993-01-01

    A methodology for generating text map representations of the semantic content of text databases is presented. Text maps provide a graphical metaphor for conceptualizing and visualizing the contents and data interrelationships of large text databases. Described are a set of experiments conducted against the TIPSTER corpora of Wall Street Journal articles. These experiments provide an introduction to current work in the representation and visualization of documents by way of their semantic content.

  20. Investigation into Text Classification With Kernel Based Schemes

    Science.gov (United States)

    2010-03-01

    ... documents are represented as a term-document matrix, common evaluation metrics, and the software package Text to Matrix Generator (TMG). The classifier... This chapter introduces the indexing capabilities of the Text to Matrix Generator (TMG) Toolbox. Specific attention is placed on the...

  1. “Dreamers Often Lie”: On “Compromise”, the subversive documentation of an Israeli- Palestinian political adaptation of Shakespeare’s Romeo and Juliet

    Directory of Open Access Journals (Sweden)

    Yael Munk

    2010-03-01

    Full Text Available Is Romeo and Juliet relevant to a description of the Middle-East conflict? This is the question raised in Compromise, an Israeli documentary that follows the…

  2. Modeling statistical properties of written text.

    Directory of Open Access Journals (Sweden)

    M Angeles Serrano

    Full Text Available Written text is one of the fundamental manifestations of human language, and the study of its universal regularities can give clues about how our brains process information and how we, as a society, organize and share it. Among these regularities, only Zipf's law has been explored in depth. Other basic properties, such as the existence of bursts of rare words in specific documents, have only been studied independently of each other and mainly by descriptive models. As a consequence, there is a lack of understanding of linguistic processes as complex emergent phenomena. Beyond Zipf's law for word frequencies, here we focus on burstiness, Heaps' law describing the sublinear growth of vocabulary size with the length of a document, and the topicality of document collections, which encode correlations within and across documents absent in random null models. We introduce and validate a generative model that explains the simultaneous emergence of all these patterns from simple rules. As a result, we find a connection between the bursty nature of rare words and the topical organization of texts and identify dynamic word ranking and memory across documents as key mechanisms explaining the non-trivial organization of written text. Our research can have broad implications and practical applications in computer science, cognitive science and linguistics.
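
    The two regularities named alongside burstiness, Zipf's rank-frequency law and Heaps' sublinear vocabulary growth, can be measured directly; a minimal sketch on a synthetic word stream (the corpus here is a toy stand-in, not the paper's data):

        # Minimal sketch: measure Zipf (rank-frequency) and Heaps (vocab growth).
        from collections import Counter

        text = ("the quick brown fox jumps over the lazy dog " * 50 +
                "some rarer words appear only here once").split()

        for rank, (word, f) in enumerate(Counter(text).most_common()[:5], start=1):
            print(rank, word, f)            # frequency falls steeply with rank

        seen, growth = set(), []
        for n, word in enumerate(text, start=1):
            seen.add(word)
            if n % 100 == 0:
                growth.append((n, len(seen)))
        print(growth)                       # V(n) flattens as n grows (Heaps)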

  3. Basic Test Framework for the Evaluation of Text Line Segmentation and Text Parameter Extraction

    Directory of Open Access Journals (Sweden)

    Darko Brodić

    2010-05-01

    Full Text Available Text line segmentation is an essential stage in off-line optical character recognition (OCR) systems. It is key because inaccurately segmented text lines will lead to OCR failure. Text line segmentation of handwritten documents is a complex and diverse problem, complicated by the nature of handwriting. Hence, text line segmentation is a leading challenge in handwritten document image processing. Due to inconsistencies in the measurement and evaluation of text segmentation algorithm quality, a basic set of measurement methods is required. Currently, there is no commonly accepted one, and all algorithm evaluation is custom-oriented. In this paper, a basic test framework for the evaluation of text feature extraction algorithms is proposed. This test framework consists of a few experiments primarily linked to text line segmentation, skew rate and reference text line evaluation. Although they are mutually independent, the results obtained are strongly cross-linked. In the end, its suitability for different types of letters and languages as well as its adaptability are its main advantages. Thus, the paper presents an efficient evaluation method for text analysis algorithms.
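
    As context for what such a framework evaluates, the simplest text line segmentation primitive is a horizontal projection profile on a binarized page; a minimal sketch on a synthetic two-line image (this baseline is an illustration, not one of the paper's experiments):

        # Minimal sketch: segment text lines via a horizontal projection profile.
        import numpy as np

        binary = np.zeros((12, 20), dtype=np.uint8)
        binary[2:4, 2:18] = 1               # toy text line 1
        binary[7:9, 2:18] = 1               # toy text line 2

        profile = binary.sum(axis=1)        # ink pixels per image row
        rows = np.where(profile > 0)[0]
        lines = np.split(rows, np.where(np.diff(rows) > 1)[0] + 1)
        print([(int(l[0]), int(l[-1])) for l in lines])   # [(2, 3), (7, 8)]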

  4. Multilingual access to full text databases

    International Nuclear Information System (INIS)

    Fluhr, C.; Radwan, K.

    1990-05-01

    Many full text databases are available in only one language; moreover, they may contain documents in different languages. Even if the user is able to understand the language of the documents in the database, it may be easier for him to express his need in his own language. For databases containing documents in different languages, it is simpler to formulate the query in one language only and to retrieve documents in different languages. This paper presents the developments and the first experiments of multilingual search, applied to the French-English pair, for text data in the nuclear field, based on the SPIRIT system. After reviewing the general problems of searching full text databases with queries formulated in natural language, we present the methods used to reformulate the queries and show how they can be expanded for multilingual search. The first results on data in the nuclear field are presented (AFCEN norms and INIS abstracts). 4 refs

  5. Text mining for the biocuration workflow.

    Science.gov (United States)

    Hirschman, Lynette; Burns, Gully A P C; Krallinger, Martin; Arighi, Cecilia; Cohen, K Bretonnel; Valencia, Alfonso; Wu, Cathy H; Chatr-Aryamontri, Andrew; Dowell, Karen G; Huala, Eva; Lourenço, Anália; Nash, Robert; Veuthey, Anne-Lise; Wiegers, Thomas; Winter, Andrew G

    2012-01-01

    Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on 'Text Mining for the BioCuration Workflow' at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed many variabilities. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provide a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community.

  6. Text mining for the biocuration workflow

    Science.gov (United States)

    Hirschman, Lynette; Burns, Gully A. P. C; Krallinger, Martin; Arighi, Cecilia; Cohen, K. Bretonnel; Valencia, Alfonso; Wu, Cathy H.; Chatr-Aryamontri, Andrew; Dowell, Karen G.; Huala, Eva; Lourenço, Anália; Nash, Robert; Veuthey, Anne-Lise; Wiegers, Thomas; Winter, Andrew G.

    2012-01-01

    Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on ‘Text Mining for the BioCuration Workflow’ at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed many variabilities. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provide a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community. PMID:22513129

  7. Segmentation of complex document

    Directory of Open Access Journals (Sweden)

    Souad Oudjemia

    2014-06-01

    Full Text Available In this paper we present a method for the segmentation of document images with complex structure. The technique, based on the GLCM (Grey Level Co-occurrence Matrix), is used to segment this type of document into three regions, namely 'graphics', 'background' and 'text'. Very briefly, the method divides the document image into blocks of a size chosen after a series of tests, and then applies the co-occurrence matrix to each block in order to extract five textural parameters: energy, entropy, sum entropy, difference entropy and standard deviation. These parameters are then used to classify the image into three regions using the k-means algorithm; the final segmentation is obtained by grouping connected pixels. Two performance measurements were performed for both graphics and text zones; we obtained a classification rate of 98.3% and a misclassification rate of 1.79%.
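
    The per-block features named in this record can be computed with scikit-image; a minimal sketch for one toy block, assuming scikit-image >= 0.19 (entropy is derived from the normalized GLCM by hand, since graycoprops does not expose it):

        # Minimal sketch: GLCM energy/entropy/std for one image block.
        import numpy as np
        from skimage.feature import graycomatrix, graycoprops

        block = np.random.default_rng(0).integers(0, 8, size=(32, 32), dtype=np.uint8)

        glcm = graycomatrix(block, distances=[1], angles=[0], levels=8, normed=True)
        p = glcm[:, :, 0, 0]                               # normalized GLCM

        energy = graycoprops(glcm, "energy")[0, 0]
        entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
        std = block.std()
        print(energy, entropy, std)
        # One such feature vector per block, fed to k-means, separates the
        # 'text', 'graphics' and 'background' regions described above.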

  8. Text Skimming: The Process and Effectiveness of Foraging through Text under Time Pressure

    Science.gov (United States)

    Duggan, Geoffrey B.; Payne, Stephen J.

    2009-01-01

    Is skim reading effective? How do readers allocate their attention selectively? The authors report 3 experiments that use expository texts and allow readers only enough time to read half of each document. Experiment 1 found that, relative to reading half the text, skimming improved memory for important ideas from a text but did not improve memory…

  9. Indian Language Document Analysis and Understanding

    Indian Academy of Sciences (India)

    documents would contain text of more than one script (for example, English, Hindi and the ... O'Gorman and Govindaraju provide a good overview of document image ... word level in bilingual documents containing Roman and Tamil scripts.

  10. The Pelindaba text and its previous

    International Nuclear Information System (INIS)

    Adeniji, O.

    1996-01-01

    The main body of the Treaty, the preamble, articles 1-22, and the map are reproduced in this issue in the section ''Documentation Relating to Disarmament and International Security''. The complete text, including annexes and protocols, is contained in document A/50/426

  11. Testing System Encryption-Decryption Method to RSA Security Documents

    International Nuclear Information System (INIS)

    Supriyono

    2008-01-01

    A model of document protection was tested, particularly for text documents. The principle of document protection is how the system protects the document during storage and transfer. First, the text document is encrypted so that it cannot be read, the text having been transformed into random letters. The letter-randomized text is then decrypted so that the document owner can read it again. In this research, the method adopted was the RSA method, which uses complicated mathematical calculations and is equipped with initial protection keys (either a private key or a public key); thus, it is more difficult for hackers to attack. The system was developed using Borland Delphi 7. The results indicated that the system was capable of saving and transferring the document, both via internet and intranet, in encrypted form, and of returning it to its initial form by decryption. The research also tested the encryption and decryption processes for documents of various sizes. (author)
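
    The encrypt/decrypt round trip being tested can be sketched with a modern RSA library; the original system was written in Borland Delphi 7, so this Python sketch only illustrates the scheme, with OAEP padding chosen as a present-day default rather than a detail from the paper.

        # Minimal sketch: RSA round trip with the 'cryptography' package.
        from cryptography.hazmat.primitives.asymmetric import rsa, padding
        from cryptography.hazmat.primitives import hashes

        private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
        oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                            algorithm=hashes.SHA256(), label=None)

        plaintext = b"Confidential document text"
        ciphertext = private_key.public_key().encrypt(plaintext, oaep)
        assert private_key.decrypt(ciphertext, oaep) == plaintext
        # RSA alone handles only short payloads; production systems encrypt
        # the document with a symmetric key and protect that key with RSA.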

  12. Combining position weight matrices and document-term matrix for efficient extraction of associations of methylated genes and diseases from free text.

    Directory of Open Access Journals (Sweden)

    Arwa Bin Raies

    Full Text Available BACKGROUND: In a number of diseases, certain genes are reported to be strongly methylated and thus can serve as diagnostic markers in many cases. Scientific literature in digital form is an important source of information about methylated genes implicated in particular diseases. The large volume of the electronic text makes it difficult and impractical to search for this information manually. METHODOLOGY: We developed a novel text mining methodology based on a new concept of position weight matrices (PWMs for text representation and feature generation. We applied PWMs in conjunction with the document-term matrix to extract with high accuracy associations between methylated genes and diseases from free text. The performance results are based on large manually-classified data. Additionally, we developed a web-tool, DEMGD, which automates extraction of these associations from free text. DEMGD presents the extracted associations in summary tables and full reports in addition to evidence tagging of text with respect to genes, diseases and methylation words. The methodology we developed in this study can be applied to similar association extraction problems from free text. CONCLUSION: The new methodology developed in this study allows for efficient identification of associations between concepts. Our method applied to methylated genes in different diseases is implemented as a Web-tool, DEMGD, which is freely available at http://www.cbrc.kaust.edu.sa/demgd/. The data is available for online browsing and download.
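
    The core idea of a position weight matrix carried over to text, scoring a fixed-length word window by how often each word appears at each position in positive examples, can be sketched as follows; the vocabulary, windows, and add-one smoothing are invented for illustration and are not the paper's actual features.

        # Minimal sketch: a position weight matrix over 3-word windows.
        import numpy as np

        windows = [["gene", "is", "methylated"], ["gene", "gets", "methylated"],
                   ["promoter", "is", "methylated"]]
        vocab = sorted({w for win in windows for w in win})
        idx = {w: i for i, w in enumerate(vocab)}

        counts = np.ones((3, len(vocab)))          # add-one smoothing
        for win in windows:
            for pos, w in enumerate(win):
                counts[pos, idx[w]] += 1
        pwm = counts / counts.sum(axis=1, keepdims=True)

        def score(win):                            # toy: OOV words map to index 0
            return float(np.prod([pwm[p, idx.get(w, 0)] for p, w in enumerate(win)]))

        print(score(["gene", "is", "methylated"]) > score(["is", "gene", "methylated"]))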

  13. Securing XML Documents

    Directory of Open Access Journals (Sweden)

    Charles Shoniregun

    2004-11-01

    Full Text Available XML (extensible markup language) is becoming the current standard for establishing interoperability on the Web. XML data are self-descriptive and syntax-extensible; this makes XML very suitable for the representation and exchange of semi-structured data, and allows users to define new elements for their specific applications. As a result, the number of documents incorporating this standard is continuously increasing over the Web. The processing of XML documents may require a traversal of the whole document structure, and therefore the cost can be very high. A strong demand for a means of efficient and effective XML processing has posed a new challenge for the database world. This paper discusses a fast and efficient indexing technique for XML documents, and introduces the XML graph numbering scheme. It can be used for indexing and securing the graph structure of XML documents. This technique provides an efficient method to speed up XML data processing. Furthermore, the paper explores the classification of existing methods, their impact on query processing, and indexing.
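
    The record does not spell out its graph numbering scheme, but a classical XML numbering with the same purpose assigns pre/post-order numbers so that ancestor tests need no traversal; a minimal sketch:

        # Minimal sketch: pre/post-order numbering for structural XML queries.
        import xml.etree.ElementTree as ET
        from itertools import count

        doc = ET.fromstring("<a><b><c/></b><d/></a>")
        pre, post, npre, npost = {}, {}, count(1), count(1)

        def number(node):
            pre[node] = next(npre)
            for child in node:
                number(child)
            post[node] = next(npost)

        number(doc)

        def is_ancestor(x, y):
            # x is an ancestor of y iff pre(x) < pre(y) and post(x) > post(y).
            return pre[x] < pre[y] and post[x] > post[y]

        b, c = doc.find("b"), doc.find(".//c")
        print(is_ancestor(doc, c), is_ancestor(b, c), is_ancestor(c, b))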

  14. A document processing pipeline for annotating chemical entities in scientific documents.

    Science.gov (United States)

    Campos, David; Matos, Sérgio; Oliveira, José L

    2015-01-01

    The recognition of drugs and chemical entities in text is a very important task within the field of biomedical information extraction, given the rapid growth in the amount of published texts (scientific papers, patents, patient records) and the relevance of these and other related concepts. If done effectively, this could allow exploiting such textual resources to automatically extract or infer relevant information, such as drug profiles, relations and similarities between drugs, or associations between drugs and potential drug targets. The objective of this work was to develop and validate a document processing and information extraction pipeline for the identification of chemical entity mentions in text. We used the BioCreative IV CHEMDNER task data to train and evaluate a machine-learning based entity recognition system. Using a combination of two conditional random field models, a selected set of features, and a post-processing stage, we achieved F-measure results of 87.48% in the chemical entity mention recognition task and 87.75% in the chemical document indexing task. We present a machine learning-based solution for automatic recognition of chemical and drug names in scientific documents. The proposed approach applies a rich feature set, including linguistic, orthographic, morphological, dictionary matching and local context features. Post-processing modules are also integrated, performing parentheses correction, abbreviation resolution and filtering erroneous mentions using an exclusion list derived from the training data. The developed methods were implemented as a document annotation tool and web service, freely available at http://bioinformatics.ua.pt/becas-chemicals/.
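
    In the same spirit as the conditional random field models described (though with a far poorer feature set and a toy corpus), a mention recognizer can be sketched with the third-party sklearn-crfsuite package; the tags, sentences, and features below are all invented.

        # Minimal sketch: a tiny CRF chemical-mention tagger.
        import sklearn_crfsuite

        def feats(tokens, i):
            w = tokens[i]
            return {"lower": w.lower(), "is_title": w.istitle(),
                    "has_digit": any(c.isdigit() for c in w), "suffix3": w[-3:]}

        sents = [["Aspirin", "inhibits", "COX"],
                 ["Patients", "received", "ibuprofen"]]
        tags = [["B-CHEM", "O", "O"], ["O", "O", "B-CHEM"]]

        X = [[feats(s, i) for i in range(len(s))] for s in sents]
        crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
        crf.fit(X, tags)
        test = ["took", "paracetamol"]
        print(crf.predict([[feats(test, i) for i in range(len(test))]]))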

  15. Combining Position Weight Matrices and Document-Term Matrix for Efficient Extraction of Associations of Methylated Genes and Diseases from Free Text

    KAUST Repository

    Bin Raies, Arwa

    2013-10-16

    Background: In a number of diseases, certain genes are reported to be strongly methylated and thus can serve as diagnostic markers in many cases. Scientific literature in digital form is an important source of information about methylated genes implicated in particular diseases. The large volume of the electronic text makes it difficult and impractical to search for this information manually. Methodology: We developed a novel text mining methodology based on a new concept of position weight matrices (PWMs) for text representation and feature generation. We applied PWMs in conjunction with the document-term matrix to extract with high accuracy associations between methylated genes and diseases from free text. The performance results are based on large manually-classified data. Additionally, we developed a web-tool, DEMGD, which automates extraction of these associations from free text. DEMGD presents the extracted associations in summary tables and full reports in addition to evidence tagging of text with respect to genes, diseases and methylation words. The methodology we developed in this study can be applied to similar association extraction problems from free text. Conclusion: The new methodology developed in this study allows for efficient identification of associations between concepts. Our method applied to methylated genes in different diseases is implemented as a Web-tool, DEMGD, which is freely available at http://www.cbrc.kaust.edu.sa/demgd/. The data is available for online browsing and download. © 2013 Bin Raies et al.

  16. Let Documents Talk to Each Other: A Computer Model for Connection of Short Documents.

    Science.gov (United States)

    Chen, Z.

    1993-01-01

    Discusses the integration of scientific texts through the connection of documents and describes a computer model that can connect short documents. Information retrieval and artificial intelligence are discussed; a prototype system of the model is explained; and the model is compared to other computer models. (17 references) (LRW)

  17. Proxima: a presentation-oriented editor for structured documents

    OpenAIRE

    Schrage, M.M.

    2004-01-01

    A typical computer user deals with a large variety of documents, such as text files, spreadsheets, and web pages. The applications for constructing and modifying these documents are called editors (e.g. text editors, spreadsheet applications, and HTML editors). Despite the apparent differences between editors, the core editing behavior, whether performed in a word-processor or a spreadsheet, is largely similar: document fragments may be copied and pasted, and new parts of the document may be ...

  18. Cultural diversity: blind spot in medical curriculum documents, a document analysis.

    Science.gov (United States)

    Paternotte, Emma; Fokkema, Joanne P I; van Loon, Karsten A; van Dulmen, Sandra; Scheele, Fedde

    2014-08-22

    Cultural diversity among patients presents specific challenges to physicians. Therefore, cultural diversity training is needed in medical education. In cases where strategic curriculum documents form the basis of medical training it is expected that the topic of cultural diversity is included in these documents, especially if these have been recently updated. The aim of this study was to assess the current formal status of cultural diversity training in the Netherlands, which is a multi-ethnic country with recently updated medical curriculum documents. In February and March 2013, a document analysis was performed of strategic curriculum documents for undergraduate and postgraduate medical education in the Netherlands. All text phrases that referred to cultural diversity were extracted from these documents. Subsequently, these phrases were sorted into objectives, training methods or evaluation tools to assess how they contributed to adequate curriculum design. Of a total of 52 documents, 33 documents contained phrases with information about cultural diversity training. Cultural diversity aspects were more prominently described in the curriculum documents for undergraduate education than in those for postgraduate education. The most specific information about cultural diversity was found in the blueprint for undergraduate medical education. In the postgraduate curriculum documents, attention to cultural diversity differed among specialties and was mainly superficial. Cultural diversity is an underrepresented topic in the Dutch documents that form the basis for actual medical training, although the documents have been updated recently. Attention to the topic is thus unwarranted. This situation does not fit the demand of a multi-ethnic society for doctors with cultural diversity competences. Multi-ethnic countries should be critical on the content of the bases for their medical educational curricula.

  19. Working with text tools, techniques and approaches for text mining

    CERN Document Server

    Tourte, Gregory J L

    2016-01-01

    Text mining tools and technologies have long been a part of the repository world, where they have been applied to a variety of purposes, from pragmatic aims to support tools. Research areas as diverse as biology, chemistry, sociology and criminology have seen effective use made of text mining technologies. Working With Text collects a subset of the best contributions from the 'Working with text: Tools, techniques and approaches for text mining' workshop, alongside contributions from experts in the area. Text mining tools and technologies in support of academic research include supporting research on the basis of a large body of documents, facilitating access to and reuse of extant work, and bridging between the formal academic world and areas such as traditional and social media. Jisc have funded a number of projects, including NaCTem (the National Centre for Text Mining) and the ResDis programme. Contents are developed from workshop submissions and invited contributions, including: Legal considerations in te...

  20. Are PDF Documents Accessible?

    Directory of Open Access Journals (Sweden)

    Mireia Ribera Turró

    2008-09-01

    Full Text Available Adobe PDF is one of the most widely used formats in scientific communications and in administrative documents. In its latest versions it has incorporated structural tags and improvements that increase its level of accessibility. This article reviews the concept of accessibility in the reading of digital documents and evaluates the accessibility of PDF according to the most widely established standards.

  1. A Chinese text classification system based on Naive Bayes algorithm

    Directory of Open Access Journals (Sweden)

    Cui Wei

    2016-01-01

    Full Text Available In this paper, aiming at the characteristics of Chinese text classification, we use ICTCLAS (the Chinese lexical analysis system of the Chinese Academy of Sciences) for document segmentation, clean the data and filter stop words, and apply the information gain and document frequency feature selection algorithms for document feature selection. On this basis, a text classifier was implemented based on the Naive Bayes algorithm, and experiments and analysis were carried out on the system using the Chinese corpus of Fudan University.
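
    The pipeline this record outlines (segmentation, stop-word filtering, frequency-based feature selection, multinomial Naive Bayes) maps onto a few lines of standard tooling; a minimal sketch in which English tokens stand in for ICTCLAS segmentation output and the corpus is invented:

        # Minimal sketch: Naive Bayes text classification pipeline.
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.naive_bayes import MultinomialNB
        from sklearn.pipeline import make_pipeline

        docs = ["stock market rises", "team wins match",
                "shares fall sharply", "player scores goal"]
        labels = ["finance", "sport", "finance", "sport"]

        # min_df stands in for document-frequency feature selection.
        clf = make_pipeline(CountVectorizer(min_df=1, stop_words="english"),
                            MultinomialNB())
        clf.fit(docs, labels)
        print(clf.predict(["market shares rise"]))   # -> ['finance']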

  2. Helios: Understanding Solar Evolution Through Text Analytics

    Energy Technology Data Exchange (ETDEWEB)

    Randazzese, Lucien [SRI International, Menlo Park, CA (United States)

    2016-12-02

    This proof-of-concept project focused on developing, testing, and validating a range of bibliometric, text analytic, and machine-learning based methods to explore the evolution of three photovoltaic (PV) technologies: Cadmium Telluride (CdTe), Dye-Sensitized solar cells (DSSC), and Multi-junction solar cells. The analytical approach to the work was inspired by previous work by the same team to measure and predict the scientific prominence of terms and entities within specific research domains. The goal was to create tools that could assist domain-knowledgeable analysts in investigating the history and path of technological developments in general, with a focus on analyzing step-function changes in performance, or “breakthroughs,” in particular. The text-analytics platform developed during this project was dubbed Helios. The project relied on computational methods for analyzing large corpora of technical documents. For this project we ingested technical documents from the following sources into Helios: Thomson Scientific Web of Science (papers), the U.S. Patent & Trademark Office (patents), the U.S. Department of Energy (technical documents), the U.S. National Science Foundation (project funding summaries), and a hand curated set of full-text documents from Thomson Scientific and other sources.

  3. Empirical Studies On Machine Learning Based Text Classification Algorithms

    OpenAIRE

    Shweta C. Dharmadhikari; Maya Ingle; Parag Kulkarni

    2011-01-01

    Automatic classification of text documents has become an important research issue now days. Properclassification of text documents requires information retrieval, machine learning and Natural languageprocessing (NLP) techniques. Our aim is to focus on important approaches to automatic textclassification based on machine learning techniques viz. supervised, unsupervised and semi supervised.In this paper we present a review of various text classification approaches under machine learningparadig...

  4. Text Mining the History of Medicine.

    Science.gov (United States)

    Thompson, Paul; Batista-Navarro, Riza Theresa; Kontonatsios, Georgios; Carter, Jacob; Toon, Elizabeth; McNaught, John; Timmermann, Carsten; Worboys, Michael; Ananiadou, Sophia

    2016-01-01

    Historical text archives constitute a rich and diverse source of information, which is becoming increasingly readily accessible, due to large-scale digitisation efforts. However, it can be difficult for researchers to explore and search such large volumes of data in an efficient manner. Text mining (TM) methods can help, through their ability to recognise various types of semantic information automatically, e.g., instances of concepts (places, medical conditions, drugs, etc.), synonyms/variant forms of concepts, and relationships holding between concepts (which drugs are used to treat which medical conditions, etc.). TM analysis allows search systems to incorporate functionality such as automatic suggestions of synonyms of user-entered query terms, exploration of different concepts mentioned within search results or isolation of documents in which concepts are related in specific ways. However, applying TM methods to historical text can be challenging, owing to differences and evolution in vocabulary, terminology, language structure and style compared to more modern text. In this article, we present our efforts to overcome the various challenges faced in the semantic analysis of published historical medical text dating back to the mid 19th century. Firstly, we used evidence from diverse historical medical documents from different periods to develop new resources that provide accounts of the multiple, evolving ways in which concepts, their variants and relationships amongst them may be expressed. These resources were employed to support the development of a modular processing pipeline of TM tools for the robust detection of semantic information in historical medical documents with varying characteristics. We applied the pipeline to two large-scale medical document archives covering wide temporal ranges as the basis for the development of a publicly accessible semantically-oriented search system. The novel resources are available for research purposes, while

  5. Proxima: a presentation-oriented editor for structured documents

    NARCIS (Netherlands)

    Schrage, M.M.

    2004-01-01

    A typical computer user deals with a large variety of documents, such as text files, spreadsheets, and web pages. The applications for constructing and modifying these documents are called editors (e.g. text editors, spreadsheet applications, and HTML editors). Despite the apparent differences

  6. Tank waste remediation system functions and requirements document

    Energy Technology Data Exchange (ETDEWEB)

    Carpenter, K.E

    1996-10-03

    This is the Tank Waste Remediation System (TWRS) Functions and Requirements Document derived from the TWRS Technical Baseline. The document consists of several text sections that provide the purpose, scope, background information, and an explanation of how this document assists the application of Systems Engineering to the TWRS. The primary TWRS functions are identified in Figure 4.1 (Section 4.0). Currently, this document is part of the overall effort to develop the TWRS Functional Requirements Baseline, and contains the functions and requirements needed to properly define the top three TWRS function levels. The TWRS Technical Baseline information (RDD-100 database) included in the appendices of the attached document contains the TWRS functions, requirements, and architecture necessary to define the TWRS Functional Requirements Baseline. Document organization and user directions are provided in the introductory text. This document will continue to be modified during the TWRS life-cycle.

  7. Tank waste remediation system functions and requirements document

    International Nuclear Information System (INIS)

    Carpenter, K.E

    1996-01-01

    This is the Tank Waste Remediation System (TWRS) Functions and Requirements Document derived from the TWRS Technical Baseline. The document consists of several text sections that provide the purpose, scope, background information, and an explanation of how this document assists the application of Systems Engineering to the TWRS. The primary TWRS functions are identified in Figure 4.1 (Section 4.0). Currently, this document is part of the overall effort to develop the TWRS Functional Requirements Baseline, and contains the functions and requirements needed to properly define the top three TWRS function levels. The TWRS Technical Baseline information (RDD-100 database) included in the appendices of the attached document contains the TWRS functions, requirements, and architecture necessary to define the TWRS Functional Requirements Baseline. Document organization and user directions are provided in the introductory text. This document will continue to be modified during the TWRS life-cycle.

  8. Applications for electronic documents

    International Nuclear Information System (INIS)

    Beitel, G.A.

    1995-01-01

    This paper discusses the application of electronic media to documents, specifically Safety Analysis Reports (SARs), prepared for Environmental Restoration and Waste Management (ER&WM) programs being conducted for the Department of Energy (DOE) at the Idaho National Engineering Laboratory (INEL). Efforts are underway to upgrade our document system using an electronic format. To satisfy external requirements (DOE, State, and Federal), ER&WM programs generate a complement of internal requirements documents, including a SAR and Technical Safety Requirements, along with procedures and training materials. Of interest is the volume of information and the difficulty in handling it. A recently prepared ER&WM SAR consists of 1,000 pages of text and graphics; supporting references add 10,000 pages. Other programmatic requirements documents consist of an estimated 5,000 pages plus references.

  9. SEMANTIC METADATA FOR HETEROGENEOUS SPATIAL PLANNING DOCUMENTS

    Directory of Open Access Journals (Sweden)

    A. Iwaniak

    2016-09-01

    Full Text Available Spatial planning documents contain information about the principles and rights of land use in different zones of a local authority. They are the basis for administrative decision making in support of sustainable development. In Poland these documents are published on the Web according to a prescribed non-extendable XML schema, designed for optimum presentation to humans in HTML web pages. There is no document standard, and limited functionality exists for adding references to external resources. The text in these documents is discoverable and searchable by general-purpose web search engines, but the semantics of the content cannot be discovered or queried. The spatial information in these documents is geographically referenced but not machine-readable. Major manual efforts are required to integrate such heterogeneous spatial planning documents from various local authorities for analysis, scenario planning and decision support. This article presents results of an implementation using machine-readable semantic metadata to identify relationships among regulations in the text, spatial objects in the drawings and links to external resources. A spatial planning ontology was used to annotate different sections of spatial planning documents with semantic metadata in the Resource Description Framework in Attributes (RDFa). The semantic interpretation of the content, links between document elements and links to external resources were embedded in XHTML pages. An example and use case from the spatial planning domain in Poland is presented to evaluate its efficiency and applicability. The solution enables the automated integration of spatial planning documents from multiple local authorities to assist decision makers with understanding and interpreting spatial planning information. The approach is equally applicable to legal documents from other countries and domains, such as cultural heritage and environmental management.

  10. Document reconstruction by layout analysis of snippets

    Science.gov (United States)

    Kleber, Florian; Diem, Markus; Sablatnig, Robert

    2010-02-01

    Document analysis is done to analyze entire forms (e.g. intelligent form analysis, table detection) or to describe the layout/structure of a document. Skew detection of scanned documents is also performed to support OCR algorithms that are sensitive to skew. In this paper, document analysis is applied to snippets of torn documents to calculate features for their reconstruction. Documents can be destroyed either intentionally, to make the printed content unavailable (e.g. tax fraud investigation, business crime), or through time-induced degeneration of ancient documents (e.g. bad storage conditions). Current reconstruction methods for manually torn documents deal with shape, inpainting and texture synthesis techniques. In this paper we show how document analysis techniques applied to snippets can support the matching algorithm with additional features. This comprises a rotational analysis, a color analysis and a line detection. As future work it is planned to extend the feature set with the paper type (blank, checked, lined), the type of the writing (handwritten vs. machine printed) and the text layout of a snippet (text size, line spacing). Preliminary results show that these pre-processing steps can be performed reliably on a real dataset consisting of 690 snippets.

  11. Using ontology network structure in text mining.

    Science.gov (United States)

    Berndt, Donald J; McCart, James A; Luther, Stephen L

    2010-11-13

    Statistical text mining treats documents as bags of words, with a focus on term frequencies within documents and across document collections. Unlike natural language processing (NLP) techniques that rely on an engineered vocabulary or a full-featured ontology, statistical approaches do not make use of domain-specific knowledge. The freedom from biases can be an advantage, but at the cost of ignoring potentially valuable knowledge. The approach proposed here investigates a hybrid strategy based on computing graph measures of term importance over an entire ontology and injecting the measures into the statistical text mining process. As a starting point, we adapt existing search engine algorithms such as PageRank and HITS to determine term importance within an ontology graph. The graph-theoretic approach is evaluated using a smoking data set from the i2b2 National Center for Biomedical Computing, cast as a simple binary classification task for categorizing smoking-related documents, demonstrating consistent improvements in accuracy.
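
    The core idea above — computing graph measures of term importance over an ontology and injecting them into the bag-of-words weights — can be sketched in a few lines. The toy ontology and document below are illustrative placeholders, not the i2b2 data or the authors' exact weighting scheme.

      # Minimal sketch: PageRank over a small concept graph supplies
      # per-term importance scores, which then re-weight raw term counts.
      import networkx as nx
      from collections import Counter

      # Toy ontology fragment: edges run from broader to narrower concepts.
      ontology = nx.DiGraph([
          ("substance_use", "smoking"), ("smoking", "cigarette"),
          ("smoking", "nicotine"), ("nicotine", "addiction"),
      ])
      importance = nx.pagerank(ontology)       # graph-theoretic term importance

      doc = "patient reports smoking a cigarette pack daily nicotine patch given"
      counts = Counter(doc.split())

      # Inject ontology importance into the statistical term weights;
      # terms outside the ontology fall back to the smallest score.
      floor = min(importance.values())
      weights = {t: c * importance.get(t, floor) for t, c in counts.items()}
      for term, w in sorted(weights.items(), key=lambda kv: -kv[1])[:5]:
          print(f"{term:10s} {w:.3f}")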

  12. Growing electronic documents created by researchers

    Directory of Open Access Journals (Sweden)

    Monika Weiss

    2017-05-01

    Full Text Available In the contemporary world, technology is an indispensable element of both the personal and the professional sphere. Even though we attach little significance to it in our everyday lives, technological development has engulfed us and continually reminds us of that. In the face of dynamically growing digitization, a new form of document has emerged – the electronic document. The study concerns the growing body of electronic documentation among researchers working at the Nicolaus Copernicus University in Toruń. The analysis of surveys and interviews supports the thesis that researchers use e-documents more frequently than analogue documentation. The flexibility and accessibility of this type of document become a problem for personal papers that will be archived in the future – perhaps largely in the form of electronic documentation.

  13. Language Identification of Kannada, Hindi and English Text Words Through Visual Discriminating Features

    Directory of Open Access Journals (Sweden)

    M.C. Padma

    2008-06-01

    Full Text Available In a multilingual country like India, a document may contain text words in more than one language. For a multilingual environment, a multilingual Optical Character Recognition (OCR) system is needed to read multilingual documents, so it is necessary to identify the different language regions of a document before feeding it to the OCR of each individual language. The objective of this paper is to propose a visual-clues-based procedure to identify the Kannada, Hindi and English text portions of an Indian multilingual document.
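
    The paper identifies languages from visual clues in scanned document images. As a rough electronic-text analogue only, the sketch below assigns each word to the Kannada, Devanagari (Hindi) or Latin (English) script by Unicode code-point ranges — a deliberately simpler substitute for the visual-feature approach.

      # Script identification by Unicode ranges (a stand-in for the
      # paper's image-based visual discriminating features).
      RANGES = {
          "Kannada":    (0x0C80, 0x0CFF),
          "Devanagari": (0x0900, 0x097F),
          "Latin":      (0x0041, 0x007A),
      }

      def script_of(word):
          for name, (lo, hi) in RANGES.items():
              if all(lo <= ord(ch) <= hi for ch in word):
                  return name
          return "Mixed/Other"

      for w in ["ಕನ್ನಡ", "हिन्दी", "English"]:
          print(w, "->", script_of(w))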

  14. FTP: Full-Text Publishing?

    Science.gov (United States)

    Jul, Erik

    1992-01-01

    Describes the use of file transfer protocol (FTP) on the INTERNET computer network and considers its use as an electronic publishing system. The differing electronic formats of text files are discussed; the preparation and access of documents are described; and problems are addressed, including a lack of consistency. (LRW)

  15. Document organization by means of graphs

    Directory of Open Access Journals (Sweden)

    Santa Vallejo Figueroa

    2016-12-01

    Full Text Available Nowadays, documents are the main way to represent information and knowledge in several domains. Users continuously store documents on hard disks or online media according to some personal organization based on topics, but such documents can contain one or more topics, which makes it hard to access them when required. Current search engines are based on the file name or content, but the desired term or terms must match exactly as they appear in the content. In this paper, a method for organizing documents by means of graphs is proposed, taking the topics of the documents into account. For this, a graph is generated for each document, taking into account synonyms, semantically related terms, hyponyms, and hypernyms of the nouns and verbs contained in the documents. The proposal has been compared against Google Desktop and LogicalDoc with interesting results.
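
    The per-document graph described above can be sketched with WordNet as the lexical resource. This is a minimal illustration of the idea, assuming NLTK's WordNet (the paper does not commit to that resource): each noun or verb is linked to its synonyms, hypernyms and hyponyms so that a search can match semantically related terms rather than exact strings only.

      # Minimal document-graph builder over WordNet relations.
      # Requires: import nltk; nltk.download("wordnet")
      import networkx as nx
      from nltk.corpus import wordnet as wn

      def document_graph(words):
          g = nx.Graph()
          for word in words:
              g.add_node(word)
              for syn in wn.synsets(word):
                  for lemma in syn.lemma_names():       # synonyms
                      g.add_edge(word, lemma, rel="synonym")
                  for hyper in syn.hypernyms():         # broader terms
                      g.add_edge(word, hyper.lemma_names()[0], rel="hypernym")
                  for hypo in syn.hyponyms():           # narrower terms
                      g.add_edge(word, hypo.lemma_names()[0], rel="hyponym")
          return g

      g = document_graph(["report", "organize"])
      print(g.number_of_nodes(), "nodes,", g.number_of_edges(), "edges")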

  16. A Comparative Analysis of Information Hiding Techniques for Copyright Protection of Text Documents

    Directory of Open Access Journals (Sweden)

    Milad Taleby Ahvanooey

    2018-01-01

    Full Text Available With the ceaseless use of the web and other online services, copying, sharing, and transmitting digital media over the Internet has become amazingly simple. Since text is one of the main available data sources and the most widely used digital medium on the Internet, a significant part of websites, books, articles, daily papers, and so on is just plain text. Therefore, copyright protection of plain texts remains an issue that must be improved in order to provide proof of ownership and obtain the desired accuracy. During the last decade, digital watermarking and steganography techniques have been used as alternatives to prevent tampering, distortion, and media forgery and also to protect both copyright and authentication. This paper presents a comparative analysis of information hiding techniques, especially those focused on modifying the structure and content of digital texts. Herein, the characteristics of various text watermarking and text steganography techniques are highlighted along with their applications. In addition, various types of attacks are described and their effects are analyzed in order to highlight the advantages and weaknesses of current techniques. Finally, some guidelines and directions are suggested for future works.

  17. Robust keyword retrieval method for OCRed text

    Science.gov (United States)

    Fujii, Yusaku; Takebe, Hiroaki; Tanaka, Hiroshi; Hotta, Yoshinobu

    2011-01-01

    Document management systems have become important because of the growing popularity of electronic filing of documents and scanning of books, magazines, manuals, etc., through a scanner or a digital camera, for storage or reading on a PC or an electronic book. Text information acquired by optical character recognition (OCR) is usually added to the electronic documents for document retrieval. Since texts generated by OCR generally include character recognition errors, robust retrieval methods have been introduced to overcome this problem. In this paper, we propose a retrieval method that is robust against both character segmentation and recognition errors. In the proposed method, allowing the insertion of noise characters and the dropping of characters during keyword retrieval provides robustness against character segmentation errors, while substituting each keyword character with its OCR recognition candidates (or any other character) provides robustness against character recognition errors. The recall rate of the proposed method was 15% higher than that of the conventional method. However, the precision rate was 64% lower.
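
    The retrieval idea lends itself to a small edit-distance sketch: insertions model noise characters, deletions model dropped characters, and substitutions are cheap when the character pair appears in an OCR confusion table. The confusion table below is an illustrative placeholder, not the recognition-candidate lists an actual OCR engine would supply.

      # Keyword/text distance that tolerates OCR-style errors.
      CONFUSIONS = {("l", "1"), ("1", "l"), ("O", "0"), ("0", "O")}

      def sub_cost(a, b):
          if a == b:
              return 0.0
          return 0.2 if (a, b) in CONFUSIONS else 1.0

      def ocr_distance(keyword, text):
          m, n = len(keyword), len(text)
          d = [[0.0] * (n + 1) for _ in range(m + 1)]
          for i in range(1, m + 1):
              d[i][0] = float(i)
          for j in range(1, n + 1):
              d[0][j] = float(j)
          for i in range(1, m + 1):
              for j in range(1, n + 1):
                  d[i][j] = min(
                      d[i - 1][j] + 1.0,      # dropped character
                      d[i][j - 1] + 1.0,      # inserted noise character
                      d[i - 1][j - 1] + sub_cost(keyword[i - 1], text[j - 1]),
                  )
          return d[m][n]

      print(ocr_distance("hello", "he1lo"))   # 0.2: '1' misread for 'l'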

  18. Electronic Document Management Using Inverted Files System

    Directory of Open Access Journals (Sweden)

    Suhartono Derwin

    2014-03-01

    Full Text Available The number of documents is increasing rapidly, and these documents exist not only in paper-based but also in electronic form. This can be seen from a data sample taken from the SpringerLink publisher in 2010, which showed an increase in the number of digital document collections from 2003 to mid-2010. How to manage them well has therefore become an important need. This paper describes a method for managing documents called an inverted files system. For electronic documents, the inverted files system is applied so that documents can be searched over the Internet using a search engine. It can improve both the document search mechanism and the document storage mechanism.
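
    The data structure named in the abstract is easy to show concretely. A minimal sketch with a three-document toy collection: the inverted file maps each term to the set of documents containing it, so a conjunctive query is just an intersection of postings.

      # Minimal inverted file over a toy document collection.
      from collections import defaultdict

      docs = {
          1: "electronic documents grow fast",
          2: "paper documents are scanned into electronic form",
          3: "search engines index electronic text",
      }

      index = defaultdict(set)
      for doc_id, text in docs.items():
          for term in text.lower().split():
              index[term].add(doc_id)          # posting: term -> doc ids

      def search(*terms):
          # AND-query: intersect the postings of every query term.
          postings = [index.get(t, set()) for t in terms]
          return set.intersection(*postings) if postings else set()

      print(search("electronic", "documents"))   # {1, 2}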

  19. A Hybrid Feature Selection Approach for Arabic Documents Classification

    NARCIS (Netherlands)

    Habib, Mena Badieh; Sarhan, Ahmed A. E.; Salem, Abdel-Badeeh M.; Fayed, Zaki T.; Gharib, Tarek F.

    Text Categorization (classification) is the process of classifying documents into a predefined set of categories based on their content. Text categorization algorithms usually represent documents as bags of words and consequently have to deal with huge number of features. Feature selection tries to

  20. The Texts of the Agency's Relationship Agreements with Specialized Agencies; Textes des Accords Conclus Entre l'Agence et des Institutions Specialisees

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1960-09-27

    The texts of the relationship agreements which the Agency has concluded with the specialized agencies listed below, together with the respective protocols authenticating them, are reproduced in this document, in the order in which the agreements entered into force, for the information of all Members of the Agency [French] Le present document reproduit les textes des accords que l'Agence a conclus avec les institutions specialisees enumerees ci-apres, ainsi que ceux des protocoles validant lesdits accords. Ces textes sont presentes, pour information, a tous les Membres de l'Agence dans l'ordre chronologique d'entree en vigueur desdits accords.

  1. Academic Journal Embargoes and Full Text Databases.

    Science.gov (United States)

    Brooks, Sam

    2003-01-01

    Documents the reasons for embargoes of academic journals in full text databases (i.e., publisher-imposed delays on the availability of full text content) and provides insight regarding common misconceptions. Tables present data on selected journals covering a cross-section of subjects and publishers and comparing two full text business databases.…

  2. CNEA's quality system documentation

    International Nuclear Information System (INIS)

    Mazzini, M.M.; Garonis, O.H.

    1998-01-01

    Full text: To obtain an effective and coherent documentation system suitable for CNEA's Quality Management Program, we decided to organize CNEA's quality documentation as follows: a) Level 1: Quality manual. b) Level 2: Procedures. c) Level 3: Quality plans. d) Level 4: Instructions. e) Level 5: Records and other documents. The objective of this work is to present a standardization of the documentation of CNEA's quality system for facilities, laboratories, services, and R and D activities. Considering the diversity of criteria and formats for elaborating documentation in different departments, and since each of them ultimately includes the same quality management policy, we proposed the elaboration of a system to improve the documentation, avoiding unnecessary waste of time and costs. This will allow each sector to focus on its specific documentation. The quality manuals of the atomic centers fulfill rule 3.6.1 of the Nuclear Regulatory Authority and the Safety Series 50-C/SG-Q of the International Atomic Energy Agency. They are designed by groups of competent and highly trained people from different departments. The normative procedures are elaborated with the same methodology as the quality manuals. The quality plans, which describe the organizational structure of the working groups and the appropriate documentation, will assess the quality manuals of the facilities, laboratories, services, and research and development activities of the atomic centers. Responsibility for approval of the normative documentation is assigned to the management in charge of the administration of economic and human resources, in order to fulfill the institutional objectives. Another improvement, aimed at eliminating unnecessary processes, is the inclusion of all the quality system's normative documentation in the CNEA intranet. (author)

  3. Attitudes and emotions through written text: the case of textual deformation in internet chat rooms.

    Directory of Open Access Journals (Sweden)

    Francisco Yus Ramos

    2010-11-01

    Full Text Available Spanish Internet chat rooms are visited by many young people who use language in a highly creative way (e.g. repetition of letters and punctuation marks). This article evaluates several hypotheses on the use of textual deformation and its communicative effectiveness, testing whether these deformations favour a more accurate identification and assessment of the (propositional or affective) attitudes and emotions of their authors. The answers to a questionnaire reveal that, despite the additional information that textual deformation provides, readers rarely agree on the exact quality of these attitudes and emotions, nor do they establish degrees of intensity related to the amount of text typed. Nevertheless, despite these results, textual deformation seems to play a role in the interpretation that is ultimately chosen for these messages posted to chat rooms.

  4. Representation of Social History Factors Across Age Groups: A Topic Analysis of Free-Text Social Documentation.

    Science.gov (United States)

    Lindemann, Elizabeth A; Chen, Elizabeth S; Wang, Yan; Skube, Steven J; Melton, Genevieve B

    2017-01-01

    As individuals age, there is potential for dramatic changes in the social and behavioral determinants that affect health status and outcomes. The importance of these determinants has been increasingly recognized in clinical decision-making. We sought to characterize how social and behavioral health determinants vary in different demographic groups using a previously established schema of 28 social history types through both manual analysis and automated topic analysis of social documentation in the electronic health record across the population of an entire integrated healthcare system. Our manual analysis generated 8,335 annotations over 1,400 documents, representing 24 (86%) social history types. In contrast, automated topic analysis generated 22 (79%) social history types. A comparative evaluation demonstrated both similarities and differences in coverage between the manual and topic analyses. Our findings validate the widespread nature of social and behavioral determinants that affect health status over populations of individuals over their lifespan.

  5. An Intelligent System For Arabic Text Categorization

    NARCIS (Netherlands)

    Syiam, M.M.; Tolba, Mohamed F.; Fayed, Z.T.; Abdel-Wahab, Mohamed S.; Ghoniemy, Said A.; Habib, Mena Badieh

    Text Categorization (classification) is the process of classifying documents into a predefined set of categories based on their content. In this paper, an intelligent Arabic text categorization system is presented. Machine learning algorithms are used in this system. Many algorithms for stemming and

  6. Changing Landscapes in Documentation Efforts: Civil Society Documentation of Serious Human Rights Violations

    Directory of Open Access Journals (Sweden)

    Brianne McGonigle Leyh

    2017-04-01

    Full Text Available Wittingly or unwittingly, civil society actors have long been faced with the task of documenting serious human rights violations. Thirty years ago, such efforts were largely organised by grassroots movements, often with little support or funding from international actors. Sharing information and best practices was difficult. Today that situation has significantly changed. The purpose of this article is to explore the changing landscape of civil society documentation of serious human rights violations, and what that means for standardising and professionalising documentation efforts. Using the recent Hisséne Habré case as an example, this article begins by looking at how civil society documentation can successfully influence an accountability process. Next, the article touches upon barriers that continue to impede greater documentation efforts. The article examines the changing landscape of documentation, focusing on technological changes and the rise of citizen journalism and unofficial investigations, using Syria as an example, as well as on the increasing support for documentation efforts both in Syria and worldwide. The changing landscape has resulted in the proliferation of international documentation initiatives aimed at providing local civil society actors guidelines and practical assistance on how to recognise, collect, manage, store and use information about serious human rights violations, as well as on how to minimise the risks associated with the documentation of human rights violations. The recent initiatives undertaken by international civil society, including those by the Public International Law & Policy Group, play an important role in helping to standardise and professionalise documentation work and promote the foundational principles of documentation, namely the ‘do no harm’ principle, and the principles of informed consent and confidentiality. Recognising the drawback that greater professionalisation may bring, it

  7. Informational system. Documents management

    Directory of Open Access Journals (Sweden)

    Vladut Iacob

    2009-12-01

    Full Text Available Productivity growth, as well as a reduction in operational costs, can be achieved in a company by adopting a document management solution. Such an application allows the structured and efficient management and transmission of information within the organization.

  8. Text Character Extraction Implementation from Captured Handwritten Image to Text Conversionusing Template Matching Technique

    Directory of Open Access Journals (Sweden)

    Barate Seema

    2016-01-01

    Full Text Available Images contain various types of useful information that should be extracted whenever required. Various algorithms and methods have been proposed to extract text from a given image, enabling the user to access the text in any image. Variations in text may occur because of differences in the size, style, orientation, and alignment of text, while low image contrast and composite backgrounds complicate text extraction. An application that extracts and recognizes such text accurately in real time can serve many important applications, such as document analysis, vehicle license plate extraction, and text-based image indexing, and many such applications have become realities in recent years. To address the above problems, we develop an application that converts an image into text using algorithms such as bounding boxes, the HSV model, blob analysis, template matching, and template generation.
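
    Of the techniques listed, template matching is the easiest to show in isolation. A minimal sketch with OpenCV follows; "page.png" and the glyph template are hypothetical inputs, and a real pipeline would first binarize the image and segment character blobs before matching.

      # Locate a character template in a page image by normalized
      # cross-correlation (OpenCV's matchTemplate).
      import cv2
      import numpy as np

      page = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)        # hypothetical input
      tmpl = cv2.imread("template_A.png", cv2.IMREAD_GRAYSCALE)  # glyph template

      scores = cv2.matchTemplate(page, tmpl, cv2.TM_CCOEFF_NORMED)
      ys, xs = np.where(scores > 0.8)            # keep confident matches only
      for x, y in zip(xs, ys):
          print(f"'A' candidate at ({x}, {y}), score={scores[y, x]:.2f}")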

  9. Data mining of text as a tool in authorship attribution

    Science.gov (United States)

    Visa, Ari J. E.; Toivonen, Jarmo; Autio, Sami; Maekinen, Jarno; Back, Barbro; Vanharanta, Hannu

    2001-03-01

    It is common for text documents to be characterized and classified by the keywords that their authors assign to them. Visa et al. have developed a new methodology based on prototype matching. The prototype is an interesting document or a part of an extracted, interesting text. This prototype is matched with the document database of the monitored document flow. The new methodology is capable of extracting the meaning of a document to a certain degree. Our claim is that the new methodology is also capable of authenticating authorship. To verify this claim two tests were designed. The test hypothesis was that the words and the word order in the sentences could authenticate the author. In the first test three authors were selected: William Shakespeare, Edgar Allan Poe, and George Bernard Shaw. Three texts from each author were examined. Each text was used in turn as a prototype, and the two nearest matches with the prototype were noted. The second test uses the Reuters-21578 financial news database. A group of 25 short financial news reports from five different authors is examined. Our new methodology and the interesting results from the two tests are reported in this paper. In the first test, all cases were successful for Shakespeare and for Poe; for Shaw, one text was confused with Poe. In the second test the Reuters-21578 financial news reports were attributed to their authors relatively well. The conclusion is that our text mining methodology seems to be capable of authorship attribution.

  10. The Temptation of Documentation: Potential and Challenges of Videographic Documentation and Interpretation

    Directory of Open Access Journals (Sweden)

    Marie Winckler

    2014-02-01

    Full Text Available Insights into the civic education classroom can be gained through videographic documentation. Videographic material offers, as I argue in this article, great possibilities: Through a reconstructive approach insights into dimensions of civic education such as spatial organisation, symbolic representation and non-verbal communication may emerge. In this way, a deeper understanding of informal political learning in school can be obtained. These aspects have not yet been considered in depth with videographic documentation primarily employed to date in teacher training contexts and lesson evaluation. The case study I present here was inspired by the documentary method and both the potential and limitations of videographic interpretation are discussed in this context. The study also suggests that what is not offered by videographic documentation includes insights into the individual and collective integration of experiences in civic education lessons.

  11. Text Generation: The State of the Art and the Literature.

    Science.gov (United States)

    Mann, William C.; And Others

    This report comprises two documents which describe the state of the art of computer generation of natural language text. Both were prepared by a panel of individuals who are active in research on text generation. The first document assesses the techniques now available for use in systems design, covering all of the technical methods by which…

  12. Automatic indexing of Arabic texts: state of the art

    Directory of Open Access Journals (Sweden)

    Mohamed Salim El Bazzi

    2016-11-01

    Full Text Available Document indexing is a crucial step in the text mining process. It is used to represent documents by the most relevant descriptors of their content. Several approaches have been proposed in the literature, particularly for English, but they are unusable for Arabic documents, given the language's specific characteristics and its morphological, grammatical and lexical complexity. In this paper, we present a review of the state of the art of indexing methods and their contribution to improving the processing of Arabic documents. We also propose a categorization of works according to the approaches and methods most used for indexing textual documents. We adopted a qualitative selection of papers, retaining those that present notable indexing contributions and illustrate significant results

  13. Assessing semantic similarity of texts - Methods and algorithms

    Science.gov (United States)

    Rozeva, Anna; Zerkova, Silvia

    2017-12-01

    Assessing the semantic similarity of texts is an important part of different text-related applications like educational systems, information retrieval, text summarization, etc. This task is performed by sophisticated analysis, which implements text-mining techniques. Text mining involves several pre-processing steps, which provide for obtaining structured representative model of the documents in a corpus by means of extracting and selecting the features, characterizing their content. Generally the model is vector-based and enables further analysis with knowledge discovery approaches. Algorithms and measures are used for assessing texts at syntactical and semantic level. An important text-mining method and similarity measure is latent semantic analysis (LSA). It provides for reducing the dimensionality of the document vector space and better capturing the text semantics. The mathematical background of LSA for deriving the meaning of the words in a given text by exploring their co-occurrence is examined. The algorithm for obtaining the vector representation of words and their corresponding latent concepts in a reduced multidimensional space as well as similarity calculation are presented.
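
    The LSA pipeline summarized above reduces, in practice, to a truncated SVD over a term-document matrix followed by cosine similarity in the latent space. A minimal sketch with scikit-learn on a toy corpus:

      # LSA: TF-IDF -> truncated SVD (latent concepts) -> cosine similarity.
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.decomposition import TruncatedSVD
      from sklearn.metrics.pairwise import cosine_similarity

      docs = [
          "the car is driven on the road",
          "the truck is driven on the highway",
          "the student reads a book in the library",
      ]

      tfidf = TfidfVectorizer().fit_transform(docs)             # term-document matrix
      lsa = TruncatedSVD(n_components=2).fit_transform(tfidf)   # reduced latent space
      print(cosine_similarity(lsa).round(2))   # car/truck docs come out most similar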

  14. Ultrasound-guided nerve blocks - is documentation and education feasible using only text and pictures?

    DEFF Research Database (Denmark)

    Worm, Bjarne Skjødt; Krag, Mette; Jensen, Kenneth

    2014-01-01

    With the advancement of ultrasound guidance for peripheral nerve blocks, still pictures from representative ultrasonograms are increasingly used for clinical documentation of the procedure and for educational purposes in textbook materials. However, little is actually known about the clinical and educational usefulness of these still pictures, in particular how well nerve structures can be identified in them compared to real-time ultrasound examination. We aimed to quantify gross visibility or ultrastructure using still-picture sonograms compared to real-time ultrasound for trainees and experts, and for large or small nerves, and to discuss the clinical and educational relevance of these findings.

  15. Semantic Metadata for Heterogeneous Spatial Planning Documents

    Science.gov (United States)

    Iwaniak, A.; Kaczmarek, I.; Łukowicz, J.; Strzelecki, M.; Coetzee, S.; Paluszyński, W.

    2016-09-01

    Spatial planning documents contain information about the principles and rights of land use in different zones of a local authority. They are the basis for administrative decision making in support of sustainable development. In Poland these documents are published on the Web according to a prescribed non-extendable XML schema, designed for optimum presentation to humans in HTML web pages. There is no document standard, and limited functionality exists for adding references to external resources. The text in these documents is discoverable and searchable by general-purpose web search engines, but the semantics of the content cannot be discovered or queried. The spatial information in these documents is geographically referenced but not machine-readable. Major manual efforts are required to integrate such heterogeneous spatial planning documents from various local authorities for analysis, scenario planning and decision support. This article presents results of an implementation using machine-readable semantic metadata to identify relationships among regulations in the text, spatial objects in the drawings and links to external resources. A spatial planning ontology was used to annotate different sections of spatial planning documents with semantic metadata in the Resource Description Framework in Attributes (RDFa). The semantic interpretation of the content, links between document elements and links to external resources were embedded in XHTML pages. An example and use case from the spatial planning domain in Poland is presented to evaluate its efficiency and applicability. The solution enables the automated integration of spatial planning documents from multiple local authorities to assist decision makers with understanding and interpreting spatial planning information. The approach is equally applicable to legal documents from other countries and domains, such as cultural heritage and environmental management.

  16. Inferring Group Processes from Computer-Mediated Affective Text Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Schryver, Jack C [ORNL; Begoli, Edmon [ORNL; Jose, Ajith [Missouri University of Science and Technology; Griffin, Christopher [Pennsylvania State University

    2011-02-01

    Political communications in the form of unstructured text convey rich connotative meaning that can reveal underlying group social processes. Previous research has focused on sentiment analysis at the document level, but we extend this analysis to sub-document levels through a detailed analysis of affective relationships between entities extracted from a document. Instead of pure sentiment analysis, which is just positive or negative, we explore nuances of affective meaning in 22 affect categories. Our affect propagation algorithm automatically calculates and displays extracted affective relationships among entities in graphical form in our prototype (TEAMSTER), starting with seed lists of affect terms. Several useful metrics are defined to infer underlying group processes by aggregating affective relationships discovered in a text. Our approach has been validated with annotated documents from the MPQA corpus, achieving a performance gain of 74% over comparable random guessers.

  17. Securing Document Warehouses against Brute Force Query Attacks

    Directory of Open Access Journals (Sweden)

    Sergey Vladimirovich Zapechnikov

    2017-04-01

    Full Text Available The paper presents a data management scheme and protocols for securing a document collection against adversarial users who try to abuse their access rights to find out the full content of confidential documents. The configuration of the secure document retrieval system is described and a suite of protocols among the clients, warehouse server, audit server and database management server is specified. The scheme makes it infeasible for clients to establish a correspondence between the documents relevant to different search queries until a moderator grants access to these documents. The proposed solution ensures a higher security level for document warehouses.

  18. IR and OLAP in XML document warehouses

    DEFF Research Database (Denmark)

    Perez, Juan Manuel; Pedersen, Torben Bach; Berlanga, Rafael

    2005-01-01

    In this paper we propose to combine IR and OLAP (On-Line Analytical Processing) technologies to exploit a warehouse of text-rich XML documents. In the system we plan to develop, a multidimensional implementation of a relevance modeling document model will be used for interactively querying...

  19. Chemical-text hybrid search engines.

    Science.gov (United States)

    Zhou, Yingyao; Zhou, Bin; Jiang, Shumei; King, Frederick J

    2010-01-01

    As the amount of chemical literature increases, it is critical that researchers be enabled to accurately locate documents related to a particular aspect of a given compound. Existing solutions, based on text and chemical search engines alone, suffer from the inclusion of "false negative" and "false positive" results, and cannot accommodate the diverse repertoire of formats currently available for chemical documents. To address these concerns, we developed an approach called Entity-Canonical Keyword Indexing (ECKI), which converts a chemical entity embedded in a data source into its canonical keyword representation prior to its being indexed by text search engines. We implemented ECKI using Microsoft Office SharePoint Server Search, and the resultant hybrid search engine not only supported complex mixed chemical and keyword queries but also was applied to both intranet and Internet environments. We envision that the adoption of ECKI will empower researchers to pose more complex search questions that were not readily attainable previously and to obtain answers at much improved speed and accuracy.

  20. Mining protein function from text using term-based support vector machines

    Science.gov (United States)

    Rice, Simon B; Nenadic, Goran; Stapley, Benjamin J

    2005-01-01

    Background Text mining has spurred huge interest in the domain of biology. The goal of the BioCreAtIvE exercise was to evaluate the performance of current text mining systems. We participated in Task 2, which addressed assigning Gene Ontology terms to human proteins and selecting relevant evidence from full-text documents. We approached it as a modified form of the document classification task. We used a supervised machine-learning approach (based on support vector machines) to assign protein function and select passages that support the assignments. As classification features, we used a protein's co-occurring terms that were automatically extracted from documents. Results The results evaluated by curators were modest, and quite variable for different problems: in many cases we have relatively good assignment of GO terms to proteins, but the selected supporting text was typically non-relevant (precision spanning from 3% to 50%). The method appears to work best when a substantial set of relevant documents is obtained, while it works poorly on single documents and/or short passages. The initial results suggest that our approach can also mine annotations from text even when an explicit statement relating a protein to a GO term is absent. Conclusion A machine learning approach to mining protein function predictions from text can yield good performance only if sufficient training data is available, and significant amount of supporting data is used for prediction. The most promising results are for combined document retrieval and GO term assignment, which calls for the integration of methods developed in BioCreAtIvE Task 1 and Task 2. PMID:15960835
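
    Stripped of the BioCreAtIvE specifics, the core of the approach is an SVM over term features deciding which function label a document supports. A minimal sketch, with a toy corpus standing in for the real training data and plain TF-IDF standing in for the authors' automatically extracted co-occurring terms:

      # Term-based SVM assigning a protein-function label to a document.
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.svm import LinearSVC
      from sklearn.pipeline import make_pipeline

      docs = [
          "kinase activity phosphorylates the substrate in vitro",
          "the protein binds DNA and regulates transcription",
          "phosphorylation assay demonstrates kinase function",
          "analysis of transcription factor binding sites",
      ]
      labels = ["kinase activity", "DNA binding",
                "kinase activity", "DNA binding"]

      clf = make_pipeline(TfidfVectorizer(), LinearSVC())
      clf.fit(docs, labels)
      print(clf.predict(["the enzyme phosphorylates its substrate"]))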

  1. An Integrated Multimedia Approach to Cultural Heritage e-Documents

    NARCIS (Netherlands)

    Smeulders, A.W.M.; Hardman, H.L.; Schreiber, G.; Geusebroek, J.M.

    2002-01-01

    We discuss access to e-documents from three different perspectives beyond the plain keyword web-search of the entire document. The first one is the situation-depending delivery of multimedia documents adapting the preferred form (picture, text, speech) to the available information capacity or need

  2. The use of CD-ROMs for storage and document delivery at the British Library Document Supply Centre

    International Nuclear Information System (INIS)

    Bradbury, D.

    1990-05-01

    The British Library Document Supply Centre (BLDSC) has been in the forefront of international document delivery for 20 years. During the last 5 years it has been very actively involved in the ADONIS Project, through which the full text of some 200 journals in the life sciences have been stored, accessed, and delivered through the medium of CD-ROM. The BLDSC's involvement in this project is described and indications of the lessons learned and of the implications for international document delivery systems in the future are given. (author)

  3. The use of CD-ROMs for storage and document delivery at the British Library Document Supply Centre

    Energy Technology Data Exchange (ETDEWEB)

    Bradbury, D [British Library Document Supply Centre, Boston SPA (United Kingdom)

    1990-05-01

    The British Library Document Supply Centre (BLDSC) has been in the forefront of international document delivery for 20 years. During the last 5 years it has been very actively involved in the ADONIS Project, through which the full text of some 200 journals in the life sciences have been stored, accessed, and delivered through the medium of CD-ROM. The BLDSC's involvement in this project is described and indications of the lessons learned and of the implications for international document delivery systems in the future are given. (author).

  4. Audit of Orthopaedic Surgical Documentation

    Directory of Open Access Journals (Sweden)

    Fionn Coughlan

    2015-01-01

    Full Text Available Introduction. The Royal College of Surgeons in England published guidelines in 2008 outlining the information that should be documented at each surgery. St. James’s Hospital uses a standard operation sheet for all surgical procedures and these were examined to assess documentation standards. Objectives. To retrospectively audit the hand written orthopaedic operative notes according to established guidelines. Methods. A total of 63 operation notes over seven months were audited in terms of date and time of surgery, surgeon, procedure, elective or emergency indication, operative diagnosis, incision details, signature, closure details, tourniquet time, postop instructions, complications, prosthesis, and serial numbers. Results. A consultant performed 71.4% of procedures; however, 85.7% of the operative notes were written by the registrar. The date and time of surgery, name of surgeon, procedure name, and signature were documented in all cases. The operative diagnosis and postoperative instructions were frequently not documented in the designated location. Incision details were included in 81.7% and prosthesis details in only 30% while the tourniquet time was not documented in any. Conclusion. Completion and documentation of operative procedures were excellent in some areas; improvement is needed in documenting tourniquet time, prosthesis and incision details, and the location of operative diagnosis and postoperative instructions.

  5. Transitioning Existing Content: inferring organisation-specific documents

    Directory of Open Access Journals (Sweden)

    Arijit Sengupta

    2000-11-01

    Full Text Available A definition for a document type within an organization represents an organizational norm about the way the organizational actors represent products and supporting evidence of organizational processes. Generating a good organization-specific document structure is, therefore, important since it can capture a shared understanding among the organizational actors about how certain business processes should be performed. Current tools that generate document type definitions focus on the underlying technology, emphasizing tags created in a single instance document. The tools, thus, fall short of capturing the shared understanding between organizational actors about how a given document type should be represented. We propose a method for inferring organization-specific document structures using multiple instance documents as inputs. The method consists of heuristics that combine individual document definitions, which may have been compiled using standard algorithms. We propose a number of heuristics utilizing artificial intelligence and natural language processing techniques. As the research progresses, the heuristics will be tested on a suite of test cases representing multiple instance documents for different document types. The complete methodology will be implemented as a research prototype
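
    One of the simpler heuristics in this spirit can be sketched directly: observe which child elements occur under each parent across several instance documents and merge them into a single parent-to-children map, a skeleton for the inferred document type. The memo instances below are illustrative placeholders.

      # Merge element structures observed in multiple XML instances.
      import xml.etree.ElementTree as ET
      from collections import defaultdict

      instances = [
          "<memo><to/><from/><body/></memo>",
          "<memo><to/><from/><cc/><body/></memo>",
      ]

      children = defaultdict(set)
      for doc in instances:
          root = ET.fromstring(doc)
          for parent in root.iter():
              children[parent.tag].update(child.tag for child in parent)

      print({tag: kids for tag, kids in children.items() if kids})
      # {'memo': {'to', 'from', 'cc', 'body'}}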

  6. Text mining by Tsallis entropy

    Science.gov (United States)

    Jamaati, Maryam; Mehri, Ali

    2018-01-01

    Long-range correlations between the elements of natural languages enable them to convey very complex information. Complex structure of human language, as a manifestation of natural languages, motivates us to apply nonextensive statistical mechanics in text mining. Tsallis entropy appropriately ranks the terms' relevance to document subject, taking advantage of their spatial correlation length. We apply this statistical concept as a new powerful word ranking metric in order to extract keywords of a single document. We carry out an experimental evaluation, which shows capability of the presented method in keyword extraction. We find that, Tsallis entropy has reliable word ranking performance, at the same level of the best previous ranking methods.

  7. Building Background Knowledge through Reading: Rethinking Text Sets

    Science.gov (United States)

    Lupo, Sarah M.; Strong, John Z.; Lewis, William; Walpole, Sharon; McKenna, Michael C.

    2018-01-01

    To increase reading volume and help students access challenging texts, the authors propose a four-dimensional framework for text sets. The quad text set framework is designed around a target text: a challenging content area text, such as a canonical literary work, research article, or historical primary source document. The three remaining…

  8. ASM Based Synthesis of Handwritten Arabic Text Pages

    Directory of Open Access Journals (Sweden)

    Laslo Dinges

    2015-01-01

    Full Text Available Document analysis tasks, such as text recognition, word spotting, or segmentation, are highly dependent on comprehensive and suitable databases for training and validation. However, their generation is expensive in terms of labor and time. As a matter of fact, there is a lack of such databases, which complicates research and development. This is especially true for Arabic handwriting recognition, which involves different preprocessing, segmentation, and recognition methods, each with individual demands on samples and ground truth. To bypass this problem, we present an efficient system that automatically turns Arabic Unicode text into synthetic images of handwritten documents with detailed ground truth. Active Shape Models (ASMs) based on 28046 online samples were used for character synthesis, and statistical properties were extracted from the IESK-arDB database to simulate baselines and word slant or skew. In the synthesis step, ASM-based representations are composed into words and text pages, smoothed by B-spline interpolation, and rendered considering writing speed and pen characteristics. Finally, we use the synthetic data to validate a segmentation method. An experimental comparison with the IESK-arDB database encourages training and testing document analysis methods on synthetic samples whenever no sufficiently ground-truthed natural data is available.

  9. Modeling Documents with Event Model

    Directory of Open Access Journals (Sweden)

    Longhui Wang

    2015-08-01

    Full Text Available Deep learning has recently made great breakthroughs in visual and speech processing, mainly because it draws lessons from the hierarchical way the brain deals with images and speech. In the field of NLP, topic models are one of the important ways of modeling documents, but they are built on a generative model that clearly does not match the way humans write. In this paper, we propose the Event Model, which is unsupervised and based on the language processing mechanisms of neurolinguistics, to model documents. In the Event Model, documents are descriptions of concrete or abstract events seen, heard, or sensed by people, and words are objects in those events. The Event Model has two stages: word learning and dimensionality reduction. Word learning learns the semantics of words based on deep learning. Dimensionality reduction is the process of representing a document as a low-dimensional vector through a linear mode that is completely different from topic models. The Event Model achieves state-of-the-art results on document retrieval tasks.
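
    The dimensionality-reduction stage — a document becomes a low-dimensional vector through a linear operation over learned word vectors — can be illustrated with an average of toy embeddings. This is only an assumed concretization: the paper's linear mode and its word-learning stage are not reproduced here.

      # Document vector as the mean of its word vectors (a linear mode).
      import numpy as np

      emb = {                                   # toy "learned" embeddings
          "storm": np.array([0.9, 0.1, 0.0, 0.2]),
          "flood": np.array([0.8, 0.2, 0.1, 0.1]),
          "match": np.array([0.1, 0.9, 0.3, 0.0]),
          "goal":  np.array([0.0, 0.8, 0.4, 0.1]),
      }

      def doc_vector(text):
          vecs = [emb[w] for w in text.split() if w in emb]
          return np.mean(vecs, axis=0) if vecs else np.zeros(4)

      d1 = doc_vector("storm flood")            # a weather event
      d2 = doc_vector("match goal")             # a sports event
      cos = np.dot(d1, d2) / (np.linalg.norm(d1) * np.linalg.norm(d2))
      print(f"cosine(weather, sports) = {cos:.2f}")   # low similarity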

  10. English Metafunction Analysis in Chemistry Text: Characterization of Scientific Text

    Directory of Open Access Journals (Sweden)

    Ahmad Amin Dalimunte, M.Hum

    2013-09-01

    Full Text Available The objectives of this research are to identify which Metafunctions are applied in a chemistry text and how they characterize a scientific text. It was conducted by applying content analysis. The data for this research was a twelve-paragraph chemistry text, collected by applying a documentary technique. The document was read and analyzed to find the Metafunctions. The data were analyzed by several procedures: identifying the types of process, counting up the number of processes, categorizing and counting up the cohesion devices, classifying the types of modulation and determining modality value, and finally counting up the number of sentences and clauses and scoring the grammatical intricacy index. The findings of the research show that the material process (71 of 100) is the most used, and the circumstance of spatial location (26 of 56) is more dominant than the others. Modality (5) is little used, in order to avoid subjectivity. Impersonality is implied through the limited use of reference, whether pronouns (7) or demonstratives (7); conjunctions (60) are applied to develop ideas; and the total number of clauses (109) is much greater than the total number of sentences (40), which results in a high grammatical intricacy index. The Metafunctions found indicate that the chemistry text fulfills the characteristics of a scientific or academic text, truly reflecting it as a natural science text.

  11. La Documentation photographique

    Directory of Open Access Journals (Sweden)

    Magali Hamm

    2009-03-01

    Full Text Available La Documentation photographique, revue destinée aux enseignants et étudiants en histoire-géographie, place l’image au cœur de sa ligne éditoriale. Afin de suivre les évolutions actuelles de la géographie, la collection propose une iconographie de plus en plus diversifiée : cartes, photographies, mais aussi caricatures, une de journal ou publicité, toutes étant considérées comme un document géographique à part entière. Car l’image peut se faire synthèse ; elle peut au contraire montrer les différentes facettes d’un objet ; souvent elle permet d’incarner des phénomènes géographiques. Associées à d’autres documents, les images aident les enseignants à initier leurs élèves à des raisonnements géographiques complexes. Mais pour apprendre à les lire, il est fondamental de les contextualiser, de les commenter et d’interroger leur rapport au réel.The Documentation photographique, magazine dedicated to teachers and students in History - Geography, places the image at the heart of its editorial line. In order to follow the evolutions of Geography, the collection presents a more and more diversified iconography: maps, photographs, but also drawings or advertisements, all this documents being considered as geographical ones. Because image can be a synthesis; on the contrary it can present the different facets of a same object; often it enables to portray geographical phenomena. Related to other documents, images assist the teachers in the students’ initiation to complex geographical reasoning. But in order to learn how to read them, it is fundamental to contextualize them, comment them and question their relations with reality.

  12. Text Summarization Using FrameNet-Based Semantic Graph Model

    Directory of Open Access Journals (Sweden)

    Xu Han

    2016-01-01

    Full Text Available Text summarization is to generate a condensed version of an original document. The major issues for text summarization are eliminating redundant information, identifying important differences among documents, and recovering the informative content. This paper proposes a FrameNet-based Semantic Graph Model (FSGM) which exploits the semantic information of sentences. FSGM treats sentences as vertices and the semantic relationships between them as edges, using FrameNet and word embeddings to calculate the similarity of sentences. The method assigns weights to both sentence nodes and edges. It then proposes an improved method to rank these sentences, considering both internal and external information. The experimental results show that the model is feasible and effective for summarizing text.
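
    The sentence-graph construction and ranking step can be sketched compactly. Plain TF-IDF cosine stands in below for the paper's FrameNet-plus-word-embedding similarity, and PageRank for its improved ranking method, so this is the general shape of the model rather than FSGM itself.

      # Sentences as vertices, similarities as weighted edges, PageRank
      # as the ranker; the top sentence serves as a one-line summary.
      import networkx as nx
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.metrics.pairwise import cosine_similarity

      sentences = [
          "The storm closed every road in the county.",
          "Heavy rain and wind damaged the county road network.",
          "Officials promised the road repairs would begin this week.",
      ]

      sim = cosine_similarity(TfidfVectorizer().fit_transform(sentences))
      g = nx.Graph()
      for i in range(len(sentences)):
          for j in range(i + 1, len(sentences)):
              if sim[i, j] > 0:
                  g.add_edge(i, j, weight=float(sim[i, j]))

      ranks = nx.pagerank(g, weight="weight")
      print(sentences[max(ranks, key=ranks.get)])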

  13. Documentation of Cultural Heritage Objects

    Directory of Open Access Journals (Sweden)

    Jon Grobovšek

    2013-09-01

    Full Text Available EXTENDED ABSTRACT: The first and important phase of documentation of cultural heritage objects is to understand which objects need to be documented. The entire documentation process is determined by the characteristics and scope of the cultural heritage object. The next question to be considered is the expected outcome of the documentation process and the purpose for which it will be used. These two essential guidelines determine each stage of the documentation workflow: the choice of the most appropriate data capturing technology and data processing method, how detailed the documentation should be, what problems may occur, what the expected outcome is, what it will be used for, and the plan for storing data and results. Cultural heritage objects require diverse data capturing and data processing methods. It is important that even the first stages of raw data capturing are oriented towards the applicability of results. The selection of the appropriate working method can facilitate the data processing and the preparation of final documentation. Documentation of paintings requires a different data capturing method than documentation of buildings or building areas. The purpose of documentation can also be the preservation of contemporary cultural heritage for posterity, or the basis for future projects and activities on threatened objects. Documentation procedures should be adapted to our needs and capabilities. Captured and unprocessed data are lost unless accompanied by additional analyses and interpretations. Information on tools, procedures and outcomes must be included in the documentation. A thorough analysis of unprocessed but accessible documentation, if adequately stored and accompanied by additional information, enables us to gather useful data. In this way it is possible to upgrade the existing documentation and to avoid data duplication or unintentional misleading of users. The documentation should be archived safely and in a way to meet

  14. Text localization using standard deviation analysis of structure elements and support vector machines

    Directory of Open Access Journals (Sweden)

    Zagoris Konstantinos

    2011-01-01

    Full Text Available A text localization technique is required to successfully exploit document images such as technical articles and letters. The proposed method detects and extracts text areas from document images. Initially, a connected-components analysis technique detects blocks of foreground objects. Then, a descriptor that consists of a set of suitable document structure elements is extracted from the blocks. This is achieved by incorporating an algorithm called Standard Deviation Analysis of Structure Elements (SDASE), which maximizes the separability between the blocks. Another feature of the SDASE is that its length adapts according to the requirements of the application. Finally, the descriptor of each block is used as input to a trained support vector machine that classifies the block as text or not. The proposed technique is also capable of adjusting to the text structure of the documents. Experimental results on benchmarking databases demonstrate the effectiveness of the proposed method.
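
    A hedged sketch of the final classification stage follows: each block descriptor (the paper derives it with SDASE; the vectors below are placeholders) is fed to a trained support vector machine that labels the block as text or non-text.

        # Minimal sketch of SVM-based block classification; features are hypothetical.
        import numpy as np
        from sklearn.svm import SVC

        # Rows are block descriptors; labels: 1 = text block, 0 = non-text block.
        X_train = np.array([[0.9, 0.1, 0.4], [0.8, 0.2, 0.5],
                            [0.1, 0.9, 0.3], [0.2, 0.7, 0.2]])
        y_train = np.array([1, 1, 0, 0])

        clf = SVC(kernel="rbf").fit(X_train, y_train)
        block_descriptor = np.array([[0.85, 0.15, 0.45]])
        print("text" if clf.predict(block_descriptor)[0] == 1 else "non-text")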

  15. Requirements for the data transfer during the examination of design documentation

    Directory of Open Access Journals (Sweden)

    Karakozova Irina

    2017-01-01

    Full Text Available When design documents are transferred to the examination office, the number of incompatible electronic documents increases dramatically. The article discusses how to solve the problem of transferring the text and graphic data of design documentation for state and non-state expertise, as well as verification of estimates and requirements management. Methods for recognizing system elements, and requirements for transferring text and graphic design documents, are provided. The need to use classification and coding of the various elements of information systems (structures, objects, resources, requirements, contracts, etc.) in data transfer systems is noted separately. The authors have developed a sequence for document processing and data transmission during the examination, and propose a language for describing the construction of the facility that takes into account the classification criteria of structures and construction works.

  16. Document retrieval on repetitive string collections.

    Science.gov (United States)

    Gagie, Travis; Hartikainen, Aleksi; Karhu, Kalle; Kärkkäinen, Juha; Navarro, Gonzalo; Puglisi, Simon J; Sirén, Jouni

    2017-01-01

    Most of the fastest-growing string collections today are repetitive, that is, most of the constituent documents are similar to many others. As these collections keep growing, a key approach to handling them is to exploit their repetitiveness, which can reduce their space usage by orders of magnitude. We study the problem of indexing repetitive string collections in order to perform efficient document retrieval operations on them. Document retrieval problems are routinely solved by search engines on large natural language collections, but the techniques are less developed on generic string collections. The case of repetitive string collections is even less understood, and there are very few existing solutions. We develop two novel ideas, interleaved LCPs and precomputed document lists, that yield highly compressed indexes solving the problem of document listing (find all the documents where a string appears), top-k document retrieval (find the k documents where a string appears most often), and document counting (count the number of documents where a string appears). We also show that a classical data structure supporting the latter query becomes highly compressible on repetitive data. Finally, we show how the tools we developed can be combined to solve ranked conjunctive and disjunctive multi-term queries under the simple [Formula: see text] model of relevance. We thoroughly evaluate the resulting techniques in various real-life repetitiveness scenarios, and recommend the best choices for each case.
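
    For readers unfamiliar with the three query types, the following naive sketch fixes their semantics with a linear scan; the paper's contribution is to answer the same queries from highly compressed suffix-array/LCP-based indexes, which this toy code does not attempt.

        # Naive semantics of document listing, top-k retrieval, and counting.
        from collections import Counter

        def doc_queries(docs, pattern, k=2):
            occ = Counter({i: d.count(pattern)
                           for i, d in enumerate(docs) if pattern in d})
            listing = sorted(occ)                       # document listing
            top_k = [d for d, _ in occ.most_common(k)]  # top-k document retrieval
            count = len(occ)                            # document counting
            return listing, top_k, count

        docs = ["abracadabra", "banana banana", "cabana"]
        print(doc_queries(docs, "ana"))   # -> ([1, 2], [1, 2], 2)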

  17. Text Classification and Distributional features techniques in Datamining and Warehousing

    OpenAIRE

    Bethu, Srikanth; Babu, G Charless; Vinoda, J; Priyadarshini, E; rao, M Raghavendra

    2013-01-01

    Text categorization is traditionally done using term frequency and inverse document frequency. This type of method is not very good because some unimportant words may appear in the document; the term frequency of unimportant words may increase, and the document may be classified in the wrong category. To reduce the error of classifying documents into the wrong category, distributional features are introduced. In the distributional features, the distribution of the words in ...
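
    As a point of reference, the baseline the authors criticize is plain TF-IDF weighting, which ignores where in a document a word appears; a minimal scikit-learn sketch is shown below. The distributional features proposed in the paper would be added on top of such a representation.

        # Minimal sketch of the TF-IDF baseline representation.
        from sklearn.feature_extraction.text import TfidfVectorizer

        docs = ["the cat sat on the mat",
                "dogs and cats living together",
                "stock markets fell sharply"]
        vec = TfidfVectorizer()
        X = vec.fit_transform(docs)   # rows: documents, columns: tf-idf weights
        print(X.shape, vec.get_feature_names_out()[:5])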

  18. New Challenges of the Documentation in Media

    Directory of Open Access Journals (Sweden)

    Antonio García Jiménez

    2015-07-01

    Full Text Available This special issue, presented by index.comunicación, is focused on media-related information and documentation. This field undergoes constant and profound changes, especially visible in documentation processes: a situation characterized by the existence of tablets, smartphones and applications, by the almost complete digitization of traditional documents, and by the crisis of the press business model, which involves mutations in journalists' tasks and in the relationship between them and Documentation. Papers included in this special issue focus on some of the concerns in this domain: the progressive autonomy of the journalist in access to information sources, the role of press offices as documentation sources, the search for information on the web, the situation of media blogs, the viability of elements of information architecture in smart TV, and the development of social TV and its connection to Documentation.

  19. Text processing for technical reports (direct computer-assisted origination, editing, and output of text)

    Energy Technology Data Exchange (ETDEWEB)

    De Volpi, A.; Fenrick, M. R.; Stanford, G. S.; Fink, C. L.; Rhodes, E. A.

    1980-10-01

    Documentation is often a primary residual of research and development. Because of this important role and because of the large amount of time consumed in generating technical reports, particularly those containing formulas and graphics, an existing data-processing computer system has been adapted so as to provide text-processing of technical documents. Emphasis has been on accuracy, turnaround time, and time savings for staff and secretaries, for the types of reports normally produced in the reactor development program. The computer-assisted text-processing system, called TXT, has been implemented to benefit primarily the originator of technical reports. The system is of particular value to professional staff, such as scientists and engineers, who have responsibility for generating much correspondence or lengthy, complex reports or manuscripts - especially if prompt turnaround and high accuracy are required. It can produce text that contains special Greek or mathematical symbols. Written in FORTRAN and MACRO, the program TXT operates on a PDP-11 minicomputer under the RSX-11M multitask multiuser monitor. Peripheral hardware includes videoterminals, electrostatic printers, and magnetic disks. Either data- or word-processing tasks may be performed at the terminals. The repertoire of operations has been restricted so as to minimize user training and memory burden. Secretarial staff may be readily trained to make corrections from annotated copy. Some examples of camera-ready copy are provided.

  20. Vietnamese Document Representation and Classification

    Science.gov (United States)

    Nguyen, Giang-Son; Gao, Xiaoying; Andreae, Peter

    Vietnamese is very different from English and little research has been done on Vietnamese document classification, or indeed, on any kind of Vietnamese language processing, and only a few small corpora are available for research. We created a large Vietnamese text corpus with about 18000 documents, and manually classified them based on different criteria such as topics and styles, giving several classification tasks of different difficulty levels. This paper introduces a new syllable-based document representation at the morphological level of the language for efficient classification. We tested the representation on our corpus with different classification tasks using six classification algorithms and two feature selection techniques. Our experiments show that the new representation is effective for Vietnamese categorization, and suggest that best performance can be achieved using syllable-pair document representation, an SVM with a polynomial kernel as the learning algorithm, and using Information gain and an external dictionary for feature selection.
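
    A hedged sketch of the best-performing setup reported: syllable-pair features (approximated here by whitespace-token bigrams, since Vietnamese syllables are space-separated), information-gain-style feature selection (scikit-learn's mutual information as a proxy), and an SVM with a polynomial kernel. The toy documents and labels are hypothetical.

        # Minimal sketch: syllable-pair features + feature selection + poly-kernel SVM.
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.feature_selection import SelectKBest, mutual_info_classif
        from sklearn.pipeline import make_pipeline
        from sklearn.svm import SVC

        docs = ["bóng đá việt nam", "thị trường chứng khoán",
                "đội tuyển quốc gia", "giá cổ phiếu tăng"]
        labels = ["sports", "finance", "sports", "finance"]

        model = make_pipeline(
            CountVectorizer(ngram_range=(2, 2)),    # syllable-pair (bigram) features
            SelectKBest(mutual_info_classif, k=5),  # information-gain proxy
            SVC(kernel="poly", degree=2),
        )
        model.fit(docs, labels)
        print(model.predict(["chứng khoán việt nam"]))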

  1. Designing Documents for People to Use

    Directory of Open Access Journals (Sweden)

    David Sless

    Full Text Available This article reports on the work of Communication Research Institute (CRI, an international research center specializing in communication and information design. With the support of government, regulators, industry bodies, and business—and with the participation of people and their advocates—CRI has worked on over 200 public document design projects since it began as a small unit in 1985. CRI investigates practical methods and achievable standards for designing digital and paper public documents, including forms; workplace procedural notices; bills, letters, and emails sent by organizations; labels and instructions that accompany products and services; and legal and financial documents and contracts. CRI has written model grammars for the document types it designs, and the cumulative data from CRI projects has led to a set of systematic methods for designing public-use documents to a high standard. Through research, design, publishing, and advocacy, CRI works to measurably improve the ordinary documents we all have to use. Keywords: Information design, Design methods, Design standards, Communication design, Design diagnostic testing, Design research

  2. Design Document for the Technology Demonstration of the Joint Network Defence and Management System (JNDMS) Project

    Science.gov (United States)

    2012-02-06

    [Fragmentary table extract from the design document: an event-interface row referencing a Custom ASCII JSS Client interface (Spectrum); IT infrastructure performance and vulnerability assessment data sources (eHealth, Spectrum NSM); monitoring of infrastructure servers with the Concord product line, whose eHealth and Spectrum products provide both real-time and historical data; and a Network and Systems Management (NSM) product list including Unicenter Asset Management, Spectrum, eHealth, and Centennial Discovery. Table 12 summarizes the role of these products.]

  3. The Role of Text Mining in Export Control

    Energy Technology Data Exchange (ETDEWEB)

    Tae, Jae-woong; Son, Choul-woong; Shin, Dong-hoon [Korea Institute of Nuclear Nonproliferation and Control, Daejeon (Korea, Republic of)

    2015-10-15

    The Korean government provides classification services to exporters. It is simple to copy technology such as documents and drawings, and it is also easy for new technology to be derived from existing technology. The diversity of technology makes classification difficult because the boundary between strategic and nonstrategic technology is unclear and ambiguous. Reviewers should take sufficient account of previous classification cases; however, the growing number of cases prevents consistent classification. This makes another, innovative and effective approach necessary. IXCRS (Intelligent Export Control Review System) is proposed to meet these demands. IXCRS consists of an expert system, a semantic searching system, a full-text retrieval system, an image retrieval system and a document retrieval system. It is the aim of the present paper to describe the document retrieval system based on text mining and to discuss how to utilize it. This study has demonstrated how text mining techniques can be applied to export control. The document retrieval system supports reviewers in treating previous classification cases effectively. In particular, it is highly probable that similarity data will contribute to specifying classification criteria. However, an analysis of the system revealed a number of problems that remain to be explored, such as a multilanguage problem and an inclusion-relationship problem. Further research should be directed at solving these problems and applying more data mining techniques so that the system can be used as a useful tool for export control.

  4. The Role of Text Mining in Export Control

    International Nuclear Information System (INIS)

    Tae, Jae-woong; Son, Choul-woong; Shin, Dong-hoon

    2015-01-01

    The Korean government provides classification services to exporters. It is simple to copy technology such as documents and drawings, and it is also easy for new technology to be derived from existing technology. The diversity of technology makes classification difficult because the boundary between strategic and nonstrategic technology is unclear and ambiguous. Reviewers should take sufficient account of previous classification cases; however, the growing number of cases prevents consistent classification. This makes another, innovative and effective approach necessary. IXCRS (Intelligent Export Control Review System) is proposed to meet these demands. IXCRS consists of an expert system, a semantic searching system, a full-text retrieval system, an image retrieval system and a document retrieval system. It is the aim of the present paper to describe the document retrieval system based on text mining and to discuss how to utilize it. This study has demonstrated how text mining techniques can be applied to export control. The document retrieval system supports reviewers in treating previous classification cases effectively. In particular, it is highly probable that similarity data will contribute to specifying classification criteria. However, an analysis of the system revealed a number of problems that remain to be explored, such as a multilanguage problem and an inclusion-relationship problem. Further research should be directed at solving these problems and applying more data mining techniques so that the system can be used as a useful tool for export control.

  5. Document clustering methods, document cluster label disambiguation methods, document clustering apparatuses, and articles of manufacture

    Science.gov (United States)

    Sanfilippo, Antonio [Richland, WA; Calapristi, Augustin J [West Richland, WA; Crow, Vernon L [Richland, WA; Hetzler, Elizabeth G [Kennewick, WA; Turner, Alan E [Kennewick, WA

    2009-12-22

    Document clustering methods, document cluster label disambiguation methods, document clustering apparatuses, and articles of manufacture are described. In one aspect, a document clustering method includes providing a document set comprising a plurality of documents, providing a cluster comprising a subset of the documents of the document set, using a plurality of terms of the documents, providing a cluster label indicative of subject matter content of the documents of the cluster, wherein the cluster label comprises a plurality of word senses, and selecting one of the word senses of the cluster label.

  6. Using the Characteristics of Documents, Users and Tasks to Predict the Situational Relevance of Health Web Documents

    Directory of Open Access Journals (Sweden)

    Melinda Oroszlányová

    2017-09-01

    Full Text Available Relevance is usually estimated by search engines using document content, disregarding the user behind the search and the characteristics of the task. In this work, we look at relevance as framed in a situational context, calling it situational relevance, and analyze whether it is possible to predict it using document, user and task characteristics. Using an existing dataset composed of health web documents, relevance judgments for information needs, and user and task characteristics, we build a multivariate prediction model for situational relevance. Our model has an accuracy of 77.17%. Our findings provide insights into features that could improve the estimation of relevance by search engines, helping to reconcile the systemic and situational views of relevance. In the near future we will work on the automatic assessment of document, user and task characteristics.
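
    A minimal sketch of such a multivariate prediction model: document, user and task characteristics form one feature vector per relevance judgment, and a classifier predicts situational relevance. The three features and the tiny dataset below are hypothetical, not the variables of the study.

        # Minimal sketch: predict situational relevance from mixed characteristics.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        # Columns (hypothetical): doc readability, user health literacy, task difficulty.
        X = np.array([[0.8, 0.9, 0.2], [0.3, 0.4, 0.8],
                      [0.7, 0.8, 0.3], [0.2, 0.5, 0.9]])
        y = np.array([1, 0, 1, 0])   # 1 = judged situationally relevant

        model = LogisticRegression().fit(X, y)
        print(model.predict([[0.6, 0.7, 0.4]]), model.score(X, y))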

  7. New mathematical cuneiform texts

    CERN Document Server

    Friberg, Jöran

    2016-01-01

    This monograph presents in great detail a large number of both unpublished and previously published Babylonian mathematical texts in the cuneiform script. It is a continuation of the work A Remarkable Collection of Babylonian Mathematical Texts (Springer 2007) written by Jöran Friberg, the leading expert on Babylonian mathematics. Focussing on the big picture, Friberg explores in this book several Late Babylonian arithmetical and metro-mathematical table texts from the sites of Babylon, Uruk and Sippar, collections of mathematical exercises from four Old Babylonian sites, as well as a new text from Early Dynastic/Early Sargonic Umma, which is the oldest known collection of mathematical exercises. A table of reciprocals from the end of the third millennium BC, differing radically from well-documented but younger tables of reciprocals from the Neo-Sumerian and Old-Babylonian periods, as well as a fragment of a Neo-Sumerian clay tablet showing a new type of labyrinth are also discussed. The material is presen...

  8. The Effects of Tabular-Based Content Extraction on Patent Document Clustering

    Directory of Open Access Journals (Sweden)

    Michael W. Berry

    2012-10-01

    Full Text Available Data can be represented in many different ways within a particular document or set of documents. Hence, attempts to automatically process the relationships between documents or determine the relevance of certain document objects can be problematic. In this study, we have developed software to automatically catalog objects contained in HTML files for patents granted by the United States Patent and Trademark Office (USPTO. Once these objects are recognized, the software creates metadata that assigns a data type to each document object. Such metadata can be easily processed and analyzed for subsequent text mining tasks. Specifically, document similarity and clustering techniques were applied to a subset of the USPTO document collection. Although our preliminary results demonstrate that tables and numerical data do not provide quantifiable value to a document’s content, the stage for future work in measuring the importance of document objects within a large corpus has been set.

  9. STANDARDIZATION OF MEDICAL DOCUMENT FLOW: PRINCIPLES AND FEATURES

    Directory of Open Access Journals (Sweden)

    Melentev Vladimir Anatolevich

    2013-04-01

    Full Text Available In the presented article, questions connected with the general concepts and foundations of document flow within any economic object (an enterprise, institution or organization) are considered. A GOST-standardized (state-standard) definition of document flow is given, along with a classification of the types of documentary streams. The basic principles of constructing document flow are considered; following them allows one to create an optimal structure and nature of document movement, accounting for the interrelation of external and internal influences. Further, the basic elements of medical document flow are considered; the main problems of medical document flow are specified, along with the major factors distinguishing medical document flow from the document flow of manufacturing enterprises or other economic objects. From consideration of these problems, the conclusion is drawn that the initial stage of their solution is standardization of medical document flow, which is also the first stage in creating a common information space for the medical branch.

  10. Log ASCII Standard (LAS) Files for Geophysical Wireline Well Logs and Their Application to Geologic Cross Sections Through the Central Appalachian Basin

    Science.gov (United States)

    Crangle, Robert D.

    2007-01-01

    Introduction The U.S. Geological Survey (USGS) uses geophysical wireline well logs for a variety of purposes, including stratigraphic correlation (Hettinger, 2001; Ryder, 2002), petroleum reservoir analyses (Nelson and Bird, 2005), aquifer studies (Balch, 1988), and synthetic seismic profiles (Kulander and Ryder, 2005). Commonly, well logs are easier to visualize, manipulate, and interpret when available in a digital format. In recent geologic cross sections E-E' and D-D', constructed through the central Appalachian basin (Ryder, Swezey, and others, in press; Ryder, Crangle, and others, in press), gamma ray well log traces and lithologic logs were used to correlate key stratigraphic intervals (Fig. 1). The stratigraphy and structure of the cross sections are illustrated through the use of graphical software applications (e.g., Adobe Illustrator). The gamma ray traces were digitized in Neuralog (proprietary software) from paper well logs and converted to a Log ASCII Standard (LAS) format. Once converted, the LAS files were transformed to images through an LAS-reader application (e.g., GeoGraphix Prizm) and then overlain in positions adjacent to well locations, used for stratigraphic control, on each cross section. This report summarizes the procedures used to convert paper logs to a digital LAS format using a third-party software application, Neuralog. Included in this report are LAS files for sixteen wells used in geologic cross section E-E' (Table 1) and thirteen wells used in geologic cross section D-D' (Table 2).
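
    Outside the proprietary Neuralog workflow described above, converted LAS files can also be read programmatically; a hedged sketch with the third-party Python package lasio is shown below (the file name is hypothetical).

        # Minimal sketch of reading an LAS 1.2/2.0 well-log file with lasio.
        import lasio

        las = lasio.read("well_cross_section_e01.las")  # hypothetical file name
        print(las.well.WELL.value)   # well name from the ~Well section
        gr = las["GR"]               # gamma ray curve as a numpy array
        depth = las.index            # depth reference for the curves
        print(gr[:5], depth[:5])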

  11. Extracting and connecting chemical structures from text sources using chemicalize.org.

    Science.gov (United States)

    Southan, Christopher; Stracz, Andras

    2013-04-23

    Exploring bioactive chemistry requires navigating between structures and data from a variety of text-based sources. While PubChem currently includes approximately 16 million document-extracted structures (15 million from patents) the extent of public inter-document and document-to-database links is still well below any estimated total, especially for journal articles. A major expansion in access to text-entombed chemistry is enabled by chemicalize.org. This on-line resource can process IUPAC names, SMILES, InChI strings, CAS numbers and drug names from pasted text, PDFs or URLs to generate structures, calculate properties and launch searches. Here, we explore its utility for answering questions related to chemical structures in documents and where these overlap with database records. These aspects are illustrated using a common theme of Dipeptidyl Peptidase 4 (DPPIV) inhibitors. Full-text open URL sources facilitated the download of over 1400 structures from a DPPIV patent and the alignment of specific examples with IC50 data. Uploading the SMILES to PubChem revealed extensive linking to patents and papers, including prior submissions from chemicalize.org as submitting source. A DPPIV medicinal chemistry paper was completely extracted and structures were aligned to the activity results table, as well as linked to other documents via PubChem. In both cases, key structures with data were partitioned from common chemistry by dividing them into individual new PDFs for conversion. Over 500 structures were also extracted from a batch of PubMed abstracts related to DPPIV inhibition. The drug structures could be stepped through each text occurrence and included some converted MeSH-only IUPAC names not linked in PubChem. Performing set intersections proved effective for detecting compounds-in-common between documents and merged extractions. This work demonstrates the utility of chemicalize.org for the exploration of chemical structure connectivity between documents and
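
    The set-intersection step mentioned above reduces to plain set operations once each document's structures are converted to a canonical identifier such as an InChIKey; a minimal sketch with placeholder keys follows.

        # Minimal sketch: compounds-in-common between two documents via set intersection.
        patent_keys = {"KEY-A", "KEY-B", "KEY-C", "KEY-D"}   # placeholder identifiers
        paper_keys = {"KEY-B", "KEY-D", "KEY-E"}

        in_common = patent_keys & paper_keys     # structures shared by both documents
        only_patent = patent_keys - paper_keys   # exemplified only in the patent
        print(sorted(in_common), sorted(only_patent))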

  12. Strategy as Texts

    DEFF Research Database (Denmark)

    Obed Madsen, Søren

    This article shows empirically how managers translate a strategy plan at an individual level. By analysing how managers in three organizations translate strategies, it identifies that the translation happens in two steps: First, the managers decipher the strategy by coding the different parts of the strategy into four categories. Second, the managers produce new texts based on the original strategy document by using four different translation models. The study's findings contribute to three areas. Firstly, it shows that translation is more than a sociological process. It is also a craftsmanship that requires knowledge and skills, which unfortunately seems to be overlooked in both the literature and in practice. Secondly, it shows that even though a strategy text is in the singular, the translation makes strategy plural. Thirdly, the article proposes a way to open up the black box of what...

  13. Enhancing biomedical text summarization using semantic relation extraction.

    Directory of Open Access Journals (Sweden)

    Yue Shang

    Full Text Available Automatic text summarization for a biomedical concept can help researchers to get the key points of a certain topic from large amount of biomedical literature efficiently. In this paper, we present a method for generating text summary for a given biomedical concept, e.g., H1N1 disease, from multiple documents based on semantic relation extraction. Our approach includes three stages: 1) We extract semantic relations in each sentence using the semantic knowledge representation tool SemRep. 2) We develop a relation-level retrieval method to select the relations most relevant to each query concept and visualize them in a graphic representation. 3) For relations in the relevant set, we extract informative sentences that can interpret them from the document collection to generate text summary using an information retrieval based method. Our major focus in this work is to investigate the contribution of semantic relation extraction to the task of biomedical text summarization. The experimental results on summarization for a set of diseases show that the introduction of semantic knowledge improves the performance and our results are better than the MEAD system, a well-known tool for text summarization.

  14. Patent documentation - comparison of two MT strategies

    DEFF Research Database (Denmark)

    Offersgaard, Lene; Povlsen, Claus

    2007-01-01

    This paper focuses on two matters: a comparison of how two different MT strategies manage translating the text type of patent documentation, and a survey of what is needed to transform an MT research prototype system into a translation application for patent texts. The two MT strategies are represented.... The distinctive text type of patents poses special demands on machine translation, and these aspects are discussed based on linguistic observations with focus on the user's point of view. Two main demands are automatic preprocessing of the documents and implementation of a module which, in a flexible and user-friendly manner, offers the opportunity to extend the lexical coverage of the system. These demands and the comparison of the two MT strategies are discussed on the basis of proofread patents.

  15. Digital watermarks in electronic document circulation

    Directory of Open Access Journals (Sweden)

    Vitaliy Grigorievich Ivanenko

    2017-07-01

    Full Text Available This paper reviews different protection methods for electronic documents, with their strengths and weaknesses. Common attacks on electronic documents are analyzed. The digital signature and ways of eliminating its flaws are studied. Different digital watermark embedding methods are described; they are divided into two types. The proposed solution for protecting electronic documents is based on embedding digital watermarks. A comparative analysis of these methods is given. As a result, the most convenient method is suggested: reversible data hiding. It is noted that this technique excels at securing the integrity of the container and its digital watermark. A digital watermark embedding system should prevent illegal access to the digital watermark and its container. Digital watermark requirements for electronic document protection are formulated. The legal aspect of copyright protection is reviewed. Advantages of embedding digital watermarks in electronic documents are presented. Modern reversible data hiding techniques are studied. Distinctive features of digital watermark use in Russia are highlighted. A digital watermark serves as an additional layer of defense that is in most cases unknown to the violator. With an embedded digital watermark, it is impossible to misappropriate the authorship of the document, even if the intruder signs his name on it. Therefore, digital watermarks can act as an effective additional tool to protect electronic documents.
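
    To make the embed/extract round trip concrete, here is a toy sketch of naive least-significant-bit (LSB) watermarking. Note this is not the reversible data hiding the paper recommends (reversible schemes such as difference expansion can restore the exact original container); it only illustrates the general idea of hiding mark bits in pixel values.

        # Toy sketch: naive LSB watermark embed/extract (not reversible data hiding).
        import numpy as np

        def embed(pixels, bits):
            out = pixels.copy()
            out[: len(bits)] = (out[: len(bits)] & 0xFE) | bits  # overwrite LSBs
            return out

        def extract(pixels, n):
            return pixels[:n] & 1

        cover = np.array([120, 33, 97, 254, 18, 77], dtype=np.uint8)
        mark = np.array([1, 0, 1, 1], dtype=np.uint8)
        stego = embed(cover, mark)
        print(extract(stego, 4))   # -> [1 0 1 1]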

  16. Eigenvector space model to capture features of documents

    Directory of Open Access Journals (Sweden)

    Choi DONGJIN

    2011-09-01

    Full Text Available Eigenvectors are a special set of vectors associated with a linear system of equations. Because of their special properties, eigenvectors have been used extensively in the computer vision area. When eigenvectors are applied to the information retrieval field, it is possible to obtain properties of a document corpus. To capture the properties of given documents, this paper conducts simple experiments to show that eigenvectors can also be used in document analysis. For the experiment, we use the short abstract documents of Wikipedia provided by DBpedia as the document corpus. To build the original square matrix, the popular tf-idf measurement is used. After calculating the eigenvectors of the original matrix, each vector is plotted into a 3D graph to find what the eigenvector means in document processing.
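
    A minimal sketch of the idea: build a square matrix from TF-IDF weights (here a document-document similarity matrix, one plausible reading of the paper's original square matrix) and inspect its leading eigenvectors, whose components supply the coordinates for the 3D plot.

        # Minimal sketch: eigendecomposition of a TF-IDF-derived square matrix.
        import numpy as np
        from sklearn.feature_extraction.text import TfidfVectorizer

        docs = ["eigenvectors describe linear systems",
                "document corpora have structure",
                "linear algebra helps retrieval",
                "corpora structure aids retrieval"]
        A = TfidfVectorizer().fit_transform(docs).toarray()
        S = A @ A.T                        # square document-document matrix
        vals, vecs = np.linalg.eigh(S)     # S is symmetric, so eigh is safe
        top3 = vecs[:, -3:]                # per-document coordinates for a 3D plot
        print(vals[-3:], top3.shape)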

  17. Basic freight forwarding and transport documentation in freight forwarder’s work

    Directory of Open Access Journals (Sweden)

    Adam Salomon

    2014-09-01

    Full Text Available The purpose of the article is to present the basic documentation in an international freight forwarder's work, in particular insurance documents and transport documents in the various modes of transport. An additional goal is to identify sources which can be used to properly complete the individual documents.

  18. Using Text Documents from American Memory.

    Science.gov (United States)

    Singleton, Laurel R., Ed.

    2002-01-01

    This publication contains classroom-tested teaching ideas. For grades K-4, "'Blessed Ted-fred': Famous Fathers Write to Their Children" uses American Memory for primary source letters written by Theodore Roosevelt and Alexander Graham Bell to their children. For grades 5-8, "Found Poetry and the American Life Histories…

  19. ASM Based Synthesis of Handwritten Arabic Text Pages.

    Science.gov (United States)

    Dinges, Laslo; Al-Hamadi, Ayoub; Elzobi, Moftah; El-Etriby, Sherif; Ghoneim, Ahmed

    2015-01-01

    Document analysis tasks, such as text recognition, word spotting, or segmentation, are highly dependent on comprehensive and suitable databases for training and validation. However, their generation is expensive in terms of labor and time. As a matter of fact, there is a lack of such databases, which complicates research and development. This is especially true for Arabic handwriting recognition, which involves different preprocessing, segmentation, and recognition methods, each with individual demands on samples and ground truth. To bypass this problem, we present an efficient system that automatically turns Arabic Unicode text into synthetic images of handwritten documents with detailed ground truth. Active Shape Models (ASMs) based on 28046 online samples were used for character synthesis, and statistical properties were extracted from the IESK-arDB database to simulate baselines and word slant or skew. In the synthesis step, ASM-based representations are composed into words and text pages, smoothed by B-spline interpolation, and rendered considering writing speed and pen characteristics. Finally, we use the synthetic data to validate a segmentation method. An experimental comparison with the IESK-arDB database encourages training and testing document analysis methods on synthetic samples whenever no sufficient naturally ground-truthed data is available.

  20. Novel grid-based optical Braille conversion: from scanning to wording

    Science.gov (United States)

    Yoosefi Babadi, Majid; Jafari, Shahram

    2011-12-01

    Grid-based optical Braille conversion (GOBCO) is explained in this article. The grid-fitting technique involves processing scanned images taken from old hard-copy Braille manuscripts, recognising them and converting them into English ASCII text documents inside a computer. The resulting words are verified against the relevant dictionary to produce the final output. The algorithms employed in this article can be easily modified to be implemented in other visual pattern recognition systems and text extraction applications. This technique has several advantages, including: simplicity of the algorithm, high speed of execution, the ability to help visually impaired persons and blind people to work with fax machines and the like, and the ability to help sighted people with no prior knowledge of Braille to understand hard-copy Braille manuscripts.
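
    The final decoding step of such a grid-based pipeline can be sketched as a lookup from 6-dot Braille cells to ASCII letters (dots 1-3 in the left column, 4-6 in the right). Only a few letters are shown; a real table covers the full alphabet, digits, and contractions.

        # Toy sketch: map 6-dot Braille cells (dot-presence tuples) to ASCII.
        CELL_TO_ASCII = {
            (1, 0, 0, 0, 0, 0): "a",
            (1, 1, 0, 0, 0, 0): "b",
            (1, 0, 0, 1, 0, 0): "c",
            (1, 0, 0, 1, 1, 0): "d",
            (1, 0, 0, 0, 1, 0): "e",
        }

        def decode(cells):
            # Unknown cells (e.g. contractions not in the table) become "?".
            return "".join(CELL_TO_ASCII.get(tuple(c), "?") for c in cells)

        print(decode([(1, 1, 0, 0, 0, 0), (1, 0, 0, 0, 0, 0),
                      (1, 0, 0, 1, 1, 0)]))   # -> "bad"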

  1. NAMED ENTITY RECOGNITION FROM BIOMEDICAL TEXT -AN INFORMATION EXTRACTION TASK

    Directory of Open Access Journals (Sweden)

    N. Kanya

    2016-07-01

    Full Text Available Biomedical Text Mining targets the extraction of significant information from biomedical archives. Bio TM encompasses Information Retrieval (IR) and Information Extraction (IE). Information Retrieval retrieves the relevant biomedical literature documents from various repositories such as PubMed and MedLine, based on a search query. The IR process ends with the generation of a corpus of the relevant documents retrieved from the publication databases. The IE task includes preprocessing of the documents, Named Entity Recognition (NER) from the documents, and relationship extraction. This process involves natural language processing, data mining techniques and machine learning algorithms. The preprocessing task includes tokenization, stop-word removal, shallow parsing, and part-of-speech tagging. The NER phase involves recognition of well-defined objects such as genes, proteins or cell lines. This leads to the next phase, the extraction of relationships (IE). The work is based on the Conditional Random Field (CRF) machine learning algorithm.
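
    A hedged sketch of CRF-based NER in this spirit, using the third-party sklearn-crfsuite package; the features and the single training sentence are hypothetical stand-ins for the rich orthographic and contextual features a real system would use.

        # Minimal sketch: sequence labeling with a Conditional Random Field.
        import sklearn_crfsuite

        def word2features(sent, i):
            w = sent[i]
            return {"lower": w.lower(), "istitle": w.istitle(),
                    "isupper": w.isupper(),
                    "prev": sent[i - 1].lower() if i > 0 else "<s>"}

        sent = ["BRCA1", "regulates", "DNA", "repair"]
        X = [[word2features(sent, i) for i in range(len(sent))]]
        y = [["B-GENE", "O", "O", "O"]]   # hypothetical gold labels

        crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
        crf.fit(X, y)
        print(crf.predict(X))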

  2. Principles of reusability of XML-based enterprise documents

    Directory of Open Access Journals (Sweden)

    Roman Malo

    2010-01-01

    Full Text Available XML (Extensible Markup Language) represents one of the flexible platforms for processing enterprise documents. Its simple syntax and powerful software infrastructure for processing this type of document guarantee high interoperability of individual documents. XML is today one of the technologies influencing all aspects of the ICT area. In the paper, questions and basic principles of reusing XML-based documents are described in the field of enterprise documents. If we use XML databases or XML data types for storing these types of documents, then partial redundancy can be expected due to possible similarity between documents. This similarity is found especially in the documents' structure and also in their content, and its elimination is a necessary part of data optimization. The main idea of the paper is focused on how complex XML documents can be divided into independent fragments that can be used as standalone documents, and how to process them. Conclusions can be applied within software tools working with XML-based structured data and documents, such as document management systems or content management systems.
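
    The fragment-extraction idea can be sketched with the Python standard library: a subtree of an enterprise XML document is lifted out and treated as a standalone document. The element names are hypothetical.

        # Minimal sketch: split an XML document into a reusable standalone fragment.
        import xml.etree.ElementTree as ET

        doc = ET.fromstring(
            "<invoice><header><customer>ACME</customer></header>"
            "<items><item>bolt</item><item>nut</item></items></invoice>"
        )
        fragment = doc.find("items")                      # lift out the subtree
        print(ET.tostring(fragment, encoding="unicode"))  # standalone XML fragment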

  3. A New Wavelet-Based Document Image Segmentation Scheme

    Institute of Scientific and Technical Information of China (English)

    赵健; 李道京; 俞卞章; 耿军平

    2002-01-01

    Document image segmentation is very useful for printing, faxing and data processing. An algorithm is developed for segmenting and classifying document images. The feature used for classification is based on the histogram distribution pattern of the different image classes. An important attribute of the algorithm is the use of a wavelet correlation image to enhance the raw image's pattern, so the classification accuracy is improved. In this paper the document image is divided into four types: background, photo, text and graph. Firstly, the document image background is distinguished easily by a conventional method; secondly, the three remaining image types are distinguished by their typical histograms; in order to make the histogram features clearer, each resolution's HH wavelet subimage is added to the raw image at that resolution. Finally, photo, text and graph are separated according to how well the feature fits the Laplacian distribution, measured by χ2 and L. Simulations show that classification accuracy is significantly improved. Comparison with related work shows that our algorithm provides both lower classification error rates and better visual results.
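
    A minimal sketch of the enhancement step, using the third-party PyWavelets package: the HH (diagonal detail) subband of a one-level 2D wavelet transform is reconstructed at full size and added back to the raw image to emphasize class-specific texture before histograms are computed. The random array stands in for a page image.

        # Minimal sketch: add the HH wavelet detail back to the raw image.
        import numpy as np
        import pywt

        img = np.random.rand(64, 64)               # stand-in for a grayscale page
        cA, (cH, cV, cD) = pywt.dwt2(img, "haar")  # one-level 2D DWT
        # Reconstruct only the HH (diagonal detail) component at full resolution.
        hh_only = pywt.idwt2((None, (None, None, cD)), "haar")
        enhanced = img + hh_only                   # emphasize texture pattern
        print(enhanced.shape)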

  4. Enhancing biomedical text summarization using semantic relation extraction.

    Science.gov (United States)

    Shang, Yue; Li, Yanpeng; Lin, Hongfei; Yang, Zhihao

    2011-01-01

    Automatic text summarization for a biomedical concept can help researchers to get the key points of a certain topic from large amount of biomedical literature efficiently. In this paper, we present a method for generating text summary for a given biomedical concept, e.g., H1N1 disease, from multiple documents based on semantic relation extraction. Our approach includes three stages: 1) We extract semantic relations in each sentence using the semantic knowledge representation tool SemRep. 2) We develop a relation-level retrieval method to select the relations most relevant to each query concept and visualize them in a graphic representation. 3) For relations in the relevant set, we extract informative sentences that can interpret them from the document collection to generate text summary using an information retrieval based method. Our major focus in this work is to investigate the contribution of semantic relation extraction to the task of biomedical text summarization. The experimental results on summarization for a set of diseases show that the introduction of semantic knowledge improves the performance and our results are better than the MEAD system, a well-known tool for text summarization.

  5. Experimental determination of chosen document elements parameters from raster graphics sources

    Directory of Open Access Journals (Sweden)

    Jiří Rybička

    2010-01-01

    Full Text Available Visual appearance of documents and their formal quality is considered to be as important as the content quality. Formal and typographical quality of documents can be evaluated by an automated system that processes raster images of documents. A document is described by a formal model that treats a page as an object and also as a set of elements, whereas page elements include text and graphic object. All elements are described by their parameters depending on elements’ type. For future evaluation, mainly text objects are important. This paper describes the experimental determination of chosen document elements parameters from raster images. Techniques for image processing are used, where an image is represented as a matrix of dots and parameter values are extracted. Algorithms for parameter extraction from raster images were designed and were aimed mainly at typographical parameters like indentation, alignment, font size or spacing. Algorithms were tested on a set of 100 images of paragraphs or pages and provide very good results. Extracted parameters can be directly used for typographical quality evaluation.

  6. Social Media Text Classification by Enhancing Well-Formed Text Trained Model

    Directory of Open Access Journals (Sweden)

    Phat Jotikabukkana

    2016-09-01

    Full Text Available Social media are a powerful communication tool in our era of digital information. The large amount of user-generated data is a useful novel source of data, even though it is not easy to extract the treasures from this vast and noisy trove. Since classification is an important part of text mining, many techniques have been proposed to classify this kind of information. We developed an effective technique of social media text classification by semi-supervised learning utilizing an online news source consisting of well-formed text. The computer first automatically extracts news categories, well-categorized by publishers, as classes for topic classification. A bag of words taken from news articles provides the initial keywords related to each category in the form of word vectors. The principal task is to retrieve a set of new productive keywords. Term Frequency-Inverse Document Frequency weighting (TF-IDF) and the Word Article Matrix (WAM) are used as the main methods. A modification of WAM is recomputed until it becomes the most effective model for social media text classification. The key success factor was enhancing our model with effective keywords from social media. A promising result of 99.50% accuracy was achieved, with Precision, Recall, and F-measure all above 98.5% after updating the model three times.

  7. A Survey: Framework of an Information Retrieval for Malay Translated Hadith Document

    Directory of Open Access Journals (Sweden)

    Zulkefli Nurul Syeilla Syazhween

    2017-01-01

    Full Text Available This paper reviews and analyses the limitations of the existing methods used in the IR process for retrieving Malay Translated Hadith documents related to a search request. Traditional Malay Translated Hadith retrieval systems have not focused on semantic extraction from text. The bag-of-words representation ignores the conceptual similarity of information in the query text and documents, which produces unsatisfactory retrieval results. Therefore, a more efficient IR framework is needed. This paper argues that significant information extraction and subject-related information are important because the clues from this information can be used to find documents relevant to a query, while unimportant information can be discarded in representing the document content. Semantic understanding of query and document is therefore necessary to improve the effectiveness and accuracy of retrieval results for this domain of study. Further research is needed and will be carried out in future work. It is hoped that this will help users search and find information in Malay Translated Hadith documents.

  8. 2011 Addendum to the SNL/NM SWEIS Supplemental Information Source Documents

    Energy Technology Data Exchange (ETDEWEB)

    Dimmick, Ross [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2014-12-01

    This document contains updates to the Supplemental Information Sandia National Laboratories/New Mexico Site-Wide Environmental Impact Statement Source Documents that were developed in 2010. In general, this addendum provides calendar year 2010 data, along with changes or additions to text in the original documents.

  9. Sharing and Adaptation of Educational Documents in E-Learning

    Directory of Open Access Journals (Sweden)

    Chekry Abderrahman

    2012-03-01

    Full Text Available Few documents can be reused among the huge number of educational documents on the web. The exponential increase of these documents makes it almost impossible to search for relevant ones. In addition, e-learning is designed for public users who have different levels of knowledge and varied skills, so they should be given content that meets their needs. This work is about adapting learning content to learners' preferences and giving teachers the ability to reuse a given content.

  10. Cuneiform Documents from Various Dutch Collections

    NARCIS (Netherlands)

    Boer, de R.; Dercksen, J.G.; Krispijn, Th.J.H.; J.G., Dercksen e.a.

    2013-01-01

    Publication of Sumerian and Akkadian cuneiform texts in private collections from various periods: * Presargonic: letter (unknown provenance in Northern Babylonia) * Ur III: administrative document (Umma) * Old Assyrian: letters (Kaneš, Anatolia) * Old Babylonian: lexical series Ugumu (unknown provenance)

  11. Terminology extraction from medical texts in Polish.

    Science.gov (United States)

    Marciniak, Małgorzata; Mykowiecka, Agnieszka

    2014-01-01

    Hospital documents contain free text describing the most important facts relating to patients and their illnesses. These documents are written in specific language containing medical terminology related to hospital treatment. Their automatic processing can help in verifying the consistency of hospital documentation and obtaining statistical data. To perform this task we need information on the phrases we are looking for. At the moment, clinical Polish resources are sparse. The existing terminologies, such as Polish Medical Subject Headings (MeSH), do not provide sufficient coverage for clinical tasks. It would be helpful therefore if it were possible to automatically prepare, on the basis of a data sample, an initial set of terms which, after manual verification, could be used for the purpose of information extraction. Using a combination of linguistic and statistical methods for processing over 1200 children hospital discharge records, we obtained a list of single and multiword terms used in hospital discharge documents written in Polish. The phrases are ordered according to their presumed importance in domain texts measured by the frequency of use of a phrase and the variety of its contexts. The evaluation showed that the automatically identified phrases cover about 84% of terms in domain texts. At the top of the ranked list, only 4% out of 400 terms were incorrect while out of the final 200, 20% of expressions were either not domain related or syntactically incorrect. We also observed that 70% of the obtained terms are not included in the Polish MeSH. Automatic terminology extraction can give results which are of a quality high enough to be taken as a starting point for building domain related terminological dictionaries or ontologies. This approach can be useful for preparing terminological resources for very specific subdomains for which no relevant terminologies already exist. The evaluation performed showed that none of the tested ranking procedures were

  12. PHOTOGRAPHY AS DOCUMENT: OTLET AND BRIET’S CONSIDERATIONS

    Directory of Open Access Journals (Sweden)

    Izângela Maria Sansoni Tonello

    2018-04-01

    Full Text Available Introduction: The amount and variety of information conveyed in different media raise concerns, especially in relation to photographic documents, since they are currently a focus of interest of the Information Science field. In this context, this paper emphasizes the role of photographs as sources of information capable of generating knowledge, as well as an important aid for research in different areas. Objective: The main goal of this study was to research the concepts and definitions underpinning the photograph as a document in information units. Methodology: Bibliographic and documentary research. Results: Based on the meanings of the term document discussed in the literature by the authors studied, it can be affirmed that the photograph meets the assumptions necessary to ground document and photograph in the photographic document. Conclusions: This study clarifies some issues related to the photograph as a document; the proposition also prompts reflection on the importance of the production context and its essential relationship with other documents, so that the photograph is indisputably consolidated as a photographic document.

  13. Layout-aware text extraction from full-text PDF of scientific articles

    Directory of Open Access Journals (Sweden)

    Ramakrishnan Cartic

    2012-05-01

    Full Text Available Background: The Portable Document Format (PDF) is the most commonly used file format for online scientific publications. The absence of effective means to extract text from these PDF files in a layout-aware manner presents a significant challenge for developers of biomedical text mining or biocuration informatics systems that use published literature as an information source. In this paper we introduce the ‘Layout-Aware PDF Text Extraction’ (LA-PDFText) system to facilitate accurate extraction of text from PDF files of research articles for use in text mining applications. Results: Our paper describes the construction and performance of an open source system that extracts text blocks from PDF-formatted full-text research articles and classifies them into logical units based on rules that characterize specific sections. The LA-PDFText system focuses only on the textual content of the research articles and is meant as a baseline for further experiments into more advanced extraction methods that handle multi-modal content, such as images and graphs. The system works in a three-stage process: (1) detecting contiguous text blocks using spatial layout processing to locate and identify blocks of contiguous text, (2) classifying text blocks into rhetorical categories using a rule-based method, and (3) stitching classified text blocks together in the correct order, resulting in the extraction of text from section-wise grouped blocks. We show that our system can identify text blocks and classify them into rhetorical categories with Precision = 0.96, Recall = 0.89 and F1 = 0.91. We also present an evaluation of the accuracy of the block detection algorithm used in step 2. Additionally, we have compared the accuracy of the text extracted by LA-PDFText to the text from the Open Access subset of PubMed Central. We then compared this accuracy with that of the text extracted by the PDF2Text system, commonly used to extract text from PDF
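
    For comparison with the layout-aware approach, the flat baseline extraction that such systems improve on can be obtained with the third-party pdfminer.six package; a hedged one-call sketch follows (the file name is hypothetical, and reading order is not guaranteed).

        # Minimal sketch: flat PDF text extraction with pdfminer.six.
        from pdfminer.high_level import extract_text

        text = extract_text("article.pdf")   # flat text, no layout awareness
        print(text[:200])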

  14. Identifying issue frames in text.

    Directory of Open Access Journals (Sweden)

    Eyal Sagi

    Full Text Available Framing, the effect of context on cognitive processes, is a prominent topic of research in psychology and public opinion research. Research on framing has traditionally relied on controlled experiments and manually annotated document collections. In this paper we present a method that allows for quantifying the relative strengths of competing linguistic frames based on corpus analysis. This method requires little human intervention and can therefore be efficiently applied to large bodies of text. We demonstrate its effectiveness by tracking changes in the framing of terror over time and comparing the framing of abortion by Democrats and Republicans in the U.S.

  15. The Texts of the Agency's Co-operation Agreements with Regional Intergovernmental Organizations

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1961-02-07

    The texts of the Agency's agreements for co-operation with the regional inter-governmental organizations listed below, together with the respective protocols authenticating them, are reproduced in this document in the order in which the agreements entered into force, for the information of all Members of the Agency.

  16. Discrepancies in Communication Versus Documentation of Weight-Management Benchmarks

    Directory of Open Access Journals (Sweden)

    Christy B. Turer MD, MHS

    2017-02-01

    Full Text Available To examine gaps in communication versus documentation of weight-management clinical practices, communication was recorded during primary care visits with 6- to 12-year-old overweight/obese Latino children. Communication/documentation content was coded by 3 reviewers using communication transcripts and health-record documentation. Discrepancies in communication/documentation content codes were resolved through consensus. Bivariate/multivariable analyses examined factors associated with discrepancies in benchmark communication/documentation. Benchmarks were neither communicated nor documented in up to 42% of visits, and communicated but not documented or documented but not communicated in up to 20% of visits. Lowest benchmark performance rates were for laboratory studies (35%) and nutrition/weight-management referrals (42%). In multivariable analysis, overweight (vs obesity) was associated with 1.6 more discrepancies in communication versus documentation (P = .03). Many weight-management benchmarks are not met, not documented, or performed without being communicated. Enhanced communication with families and documentation in health records may promote lifestyle changes in overweight children and higher quality care for overweight children in primary care.

  17. Layout-aware text extraction from full-text PDF of scientific articles.

    Science.gov (United States)

    Ramakrishnan, Cartic; Patnia, Abhishek; Hovy, Eduard; Burns, Gully Apc

    2012-05-28

    The Portable Document Format (PDF) is the most commonly used file format for online scientific publications. The absence of effective means to extract text from these PDF files in a layout-aware manner presents a significant challenge for developers of biomedical text mining or biocuration informatics systems that use published literature as an information source. In this paper we introduce the 'Layout-Aware PDF Text Extraction' (LA-PDFText) system to facilitate accurate extraction of text from PDF files of research articles for use in text mining applications. Our paper describes the construction and performance of an open source system that extracts text blocks from PDF-formatted full-text research articles and classifies them into logical units based on rules that characterize specific sections. The LA-PDFText system focuses only on the textual content of the research articles and is meant as a baseline for further experiments into more advanced extraction methods that handle multi-modal content, such as images and graphs. The system works in a three-stage process: (1) Detecting contiguous text blocks using spatial layout processing to locate and identify blocks of contiguous text, (2) Classifying text blocks into rhetorical categories using a rule-based method and (3) Stitching classified text blocks together in the correct order resulting in the extraction of text from section-wise grouped blocks. We show that our system can identify text blocks and classify them into rhetorical categories with Precision = 0.96, Recall = 0.89 and F1 = 0.91. We also present an evaluation of the accuracy of the block detection algorithm used in step 2. Additionally, we have compared the accuracy of the text extracted by LA-PDFText to the text from the Open Access subset of PubMed Central. We then compared this accuracy with that of the text extracted by the PDF2Text system, commonly used to extract text from PDF. Finally, we discuss preliminary error analysis for

  18. Input Files and Procedures for Analysis of SMA Hybrid Composite Beams in MSC.Nastran and ABAQUS

    Science.gov (United States)

    Turner, Travis L.; Patel, Hemant D.

    2005-01-01

    A thermoelastic constitutive model for shape memory alloys (SMAs) and SMA hybrid composites (SMAHCs) was recently implemented in the commercial codes MSC.Nastran and ABAQUS. The model is implemented and supported within the core of the commercial codes, so no user subroutines or external calculations are necessary. The model and resulting structural analysis has been previously demonstrated and experimentally verified for thermoelastic, vibration and acoustic, and structural shape control applications. The commercial implementations are described in related documents cited in the references, where various results are also shown that validate the commercial implementations relative to a research code. This paper is a companion to those documents in that it provides additional detail on the actual input files and solution procedures and serves as a repository for ASCII text versions of the input files necessary for duplication of the available results.

  19. Aiding the Interpretation of Ancient Documents

    DEFF Research Database (Denmark)

    Roued-Cunliffe, Henriette

    How can Decision Support System (DSS) software aid the interpretation process involved in the reading of ancient documents? This paper discusses the development of a DSS prototype for the reading of ancient texts. In this context the term ‘ancient documents’ is used to describe mainly Greek...... tool it is important first to comprehend the interpretation process involved in reading ancient documents. This is not a linear process but rather a recursive process where the scholar moves between different levels of reading, such as ‘understanding the meaning of a character’ or ‘understanding...

  20. Stamp Detection in Color Document Images

    DEFF Research Database (Denmark)

    Micenkova, Barbora; van Beusekom, Joost

    2011-01-01

    , moreover, it can be imprinted with a variable quality and rotation. Previous methods were restricted to detection of stamps of particular shapes or colors. The method presented in the paper includes segmentation of the image by color clustering and subsequent classification of candidate solutions...... by geometrical and color-related features. The approach allows for differentiation of stamps from other color objects in the document such as logos or texts. For the purpose of evaluation, a data set of 400 document images has been collected, annotated and made public. With the proposed method, recall of 83...

  1. Where are the Search Engines for Handwritten Documents?

    NARCIS (Netherlands)

    van der Zant, Tijn; Schomaker, Lambert; Zinger, Svitlana; van Schie, Henny

    Although the problems of optical character recognition for contemporary printed text have been resolved, for historical printed and handwritten connected cursive text (i.e. western style writing), they have not. This does not mean that scanning historical documents is not useful. This article

  2. Where are the search engines for handwritten documents?

    NARCIS (Netherlands)

    Zant, T.; Schomaker, L.; Zinger, S.; Schie, H.

    2009-01-01

    Although the problems of optical character recognition for contemporary printed text have been resolved, for historical printed and handwritten connected cursive text (i.e. western style writing), they have not. This does not mean that scanning historical documents is not useful. This article

  3. Software System for Vocal Rendering of Printed Documents

    Directory of Open Access Journals (Sweden)

    Marian DARDALA

    2008-01-01

    Full Text Available The objective of this paper is to present a software system architecture developed to render printed documents in vocal form. The paper also describes the software solutions that exist as software components and are necessary for document processing, as well as for controlling the multimedia devices used by the system. The system is useful for people with visual disabilities, who can access the contents of documents without their having to be printed in the Braille system or to exist in audio form.

  4. Verifying the integrity of hardcopy document using OCR

    CSIR Research Space (South Africa)

    Mthethwa, Sthembile

    2018-03-01

    Full Text Available Verifying the Integrity... of the document to be defined. Each text in the meta-template is labelled with a unique identifier, which makes it easier for the process of validation. The meta-template consists of two types of text: normal text and validation text (important text that must...

  5. OFFICIAL DOCUMENTS RELATING TO PORTUGUESE LANGUAGE TEACHING, INTERCULTURALITY AND LITERACY POLICY

    Directory of Open Access Journals (Sweden)

    Cloris Porto Torquato

    2016-06-01

    Full Text Available The present article analyzes two documents, Parâmetros Curriculares Nacionais – Língua Portuguesa (BRASIL, 1998) and Parâmetros Curriculares Nacionais – Temas Transversais – Pluralidade Cultural (BRASIL, 1998b), conceiving these documents as constituents of language policies (RICENTO, 2006; SHOHAMY, 2006) and literacy policies, and it focuses on the intercultural dialogues/conflicts that these documents promote when they prescribe that language teaching should have the text as its main object and indicate which genres should be privileged. The text thus deals with language policies, focusing more specifically on literacy policies (drawing on the concept of literacy formulated by the New Literacy Studies (STREET, 1984, 1993, 2003; BARTON; HAMILTON, 1998; SIGNORINI, 2001)) and interculturality (JANZEN, 2005). The analysis of the documents is undertaken in the light of the Bakhtinian conception of language and mobilizes the following concepts of the Circle of Bakhtin: dialogism, utterance, and genres of speech. Furthermore, this text is based methodologically on the orientations of the authors of this Circle for the study of language (BAKHTIN/VOLOSHINOV, 1986; BAKHTIN, 2003). The analysis indicates that the official documents, in promoting literacy policies, also promote intercultural conflicts, because they privilege the dominant literacies, silencing other literacy practices. We understand that this silencing and invalidation of local literacy practices has implications for the constitution of the students' identities and for local language policies.

  6. Toward an enhanced Arabic text classification using cosine similarity and Latent Semantic

    Directory of Open Access Journals (Sweden)

    Fawaz S. Al-Anzi

    2017-04-01

    Full Text Available Cosine similarity is one of the most popular distance measures in text classification problems. In this paper, we used this important measure to investigate the performance of Arabic language text classification. For textual features, the vector space model (VSM) is generally used to represent textual information as numerical vectors. However, Latent Semantic Indexing (LSI) is a better textual representation technique as it maintains semantic information between the words. Hence, we used the singular value decomposition (SVD) method to extract textual features based on LSI. In our experiments, we conducted a comparison between some of the well-known classification methods such as Naïve Bayes, k-Nearest Neighbors, Neural Network, Random Forest, Support Vector Machine, and classification tree. We used a corpus that contains 4,000 documents of ten topics (400 documents per topic). The corpus contains 2,127,197 words with about 139,168 unique words. The testing set contains 400 documents, 40 documents for each topic. As a weighting scheme, we used Term Frequency-Inverse Document Frequency (TF-IDF). This study reveals that the classification methods that use LSI features significantly outperform the TF-IDF-based methods. It also reveals that k-Nearest Neighbors (based on the cosine measure) and Support Vector Machine are the best performing classifiers.
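
    As a rough illustration of the pipeline the abstract describes (TF-IDF weighting, LSI via singular value decomposition, and a cosine-based k-nearest-neighbors classifier), the following scikit-learn sketch strings the three steps together; the documents, labels, and dimensionality are placeholders, not the paper's corpus or settings.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.decomposition import TruncatedSVD
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.pipeline import make_pipeline

        docs = ["team wins the match", "oil prices rise", "the match ended in a win"]
        labels = ["sports", "economy", "sports"]   # toy stand-ins for the ten topics

        model = make_pipeline(
            TfidfVectorizer(),                      # TF-IDF weighting scheme
            TruncatedSVD(n_components=2),           # LSI: truncated SVD of the term-document matrix
            KNeighborsClassifier(n_neighbors=1, metric="cosine"),
        )
        model.fit(docs, labels)
        print(model.predict(["the team lost the match"]))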

  7. Does pedagogical documentation support maternal reminiscing conversations?

    Directory of Open Access Journals (Sweden)

    Bethany Fleck

    2015-12-01

    Full Text Available When parents talk with their children about lessons learned in school, they are participating in reminiscing of an unshared event. This study sought to understand if pedagogical documentation, from the Reggio Approach to early childhood education, would support and enhance the conversation. Mother–child dyads reminisced two separate times about preschool lessons, one time with documentation available to them and one time without. Transcripts were coded extracting variables indicative of high and low maternal reminiscing styles. Results indicate that mother and child conversation characteristics were more highly elaborative when documentation was present than when it was not. In addition, children added more information to the conversation supporting the notion that such conversations enhanced memory for lessons. Documentation could be used as a support tool for conversations and children’s memory about lessons learned in school.

  8. Documentation of Accounting Records in Light of Legislative Innovations

    Directory of Open Access Journals (Sweden)

    K. V. BEZVERKHIY

    2017-05-01

    Full Text Available Legislative reforms in accounting aim to simplify accounting records and the compilation of financial reports by business entities, thus improving the position of Ukraine in the global Doing Business ranking. This simplification is embodied in the changes to the Regulation on Documentation of Accounting Records, entered into force by a Resolution of the Ukrainian Ministry of Finance. The objective of the study is to analyze the legislative innovations involved. A review of the changes in documentation of accounting records is made. A comparative analysis of the changes in the Regulation on Documentation of Accounting Records is made by section: (1) General; (2) Primary documents; (3) Accounting records; (4) Correction of errors in primary documents and accounting records; (5) Organization of document circulation; (6) Storage of documents. Methods of analysis and synthesis are used to separate the differences between the editions of the Regulation on Documentation of Accounting Records. The result of the study has theoretical and practical value for the domestic business enterprise sector.

  9. GPU-Accelerated Text Mining

    International Nuclear Information System (INIS)

    Cui, X.; Mueller, F.; Zhang, Y.; Potok, Thomas E.

    2009-01-01

    Accelerating hardware devices represent a novel promise for improving the performance of many problem domains, but it is not clear which accelerators are suitable for which domains. While there is no room in general-purpose processor design to significantly increase the processor frequency, developers are instead resorting to multi-core chips duplicating conventional computing capabilities on a single die. Yet, accelerators offer more radical designs with a much higher level of parallelism and novel programming environments. This present work assesses the viability of text mining on CUDA. Text mining is one of the key concepts that has become prominent as an effective means to index the Internet, but its applications range beyond this scope and extend to providing document similarity metrics, the subject of this work. We have developed and optimized text search algorithms for GPUs to exploit their potential for massive data processing. We discuss the algorithmic challenges of parallelization for text search problems on GPUs and demonstrate the potential of these devices in experiments by reporting significant speedups. Our study may be one of the first to assess more complex text search problems for suitability for GPU devices, and it may also be one of the first to exploit and report on the atomic instructions that have recently become available in NVIDIA devices.

  10. Wilmar joint market model, Documentation

    International Nuclear Information System (INIS)

    Meibom, P.; Larsen, Helge V.; Barth, R.; Brand, H.; Weber, C.; Voll, O.

    2006-01-01

    The Wilmar Planning Tool is developed in the project Wind Power Integration in Liberalised Electricity Markets (WILMAR) supported by EU (Contract No. ENK5-CT-2002-00663). A User Shell implemented in an Excel workbook controls the Wilmar Planning Tool. All data are contained in Access databases that communicate with various sub-models through text files that are exported from or imported to the databases. The Joint Market Model (JMM) constitutes one of these sub-models. This report documents the Joint Market Model (JMM). The documentation describes: 1. The file structure of the JMM. 2. The sets, parameters and variables in the JMM. 3. The equations in the JMM. 4. The looping structure in the JMM. (au)

  11. Wilmar joint market model, Documentation

    Energy Technology Data Exchange (ETDEWEB)

    Meibom, P.; Larsen, Helge V. [Risoe National Lab. (Denmark); Barth, R.; Brand, H. [IER, Univ. of Stuttgart (Germany); Weber, C.; Voll, O. [Univ. of Duisburg-Essen (Germany)

    2006-01-15

    The Wilmar Planning Tool is developed in the project Wind Power Integration in Liberalised Electricity Markets (WILMAR) supported by EU (Contract No. ENK5-CT-2002-00663). A User Shell implemented in an Excel workbook controls the Wilmar Planning Tool. All data are contained in Access databases that communicate with various sub-models through text files that are exported from or imported to the databases. The Joint Market Model (JMM) constitutes one of these sub-models. This report documents the Joint Market Model (JMM). The documentation describes: 1. The file structure of the JMM. 2. The sets, parameters and variables in the JMM. 3. The equations in the JMM. 4. The looping structure in the JMM. (au)

  12. Pedoinformatics Approach to Soil Text Analytics

    Science.gov (United States)

    Furey, J.; Seiter, J.; Davis, A.

    2017-12-01

    The several extant schema for the classification of soils rely on differing criteria, but the major soil science taxonomies, including the United States Department of Agriculture (USDA) and the international harmonized World Reference Base for Soil Resources systems, are based principally on inferred pedogenic properties. These taxonomies largely result from compiled individual observations of soil morphologies within soil profiles, and the vast majority of this pedologic information is contained in qualitative text descriptions. We present text mining analyses of hundreds of gigabytes of parsed text and other data in the digitally available USDA soil taxonomy documentation, the Soil Survey Geographic (SSURGO) database, and the National Cooperative Soil Survey (NCSS) soil characterization database. These analyses implemented IPython calls to Gensim modules for topic modelling, with latent semantic indexing completed down to the lowest taxon level (soil series) paragraphs. Via a custom extension of the Natural Language Toolkit (NLTK), approximately one percent of the USDA soil series descriptions were used to train a classifier for the remainder of the documents, essentially by treating soil science words as comprising a novel language. While location-specific descriptors at the soil series level are amenable to geomatics methods, unsupervised clustering of the occurrence of other soil science words did not closely follow the usual hierarchy of soil taxa. We present preliminary phrasal analyses that may account for some of these effects.
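
    A minimal sketch of the Gensim topic-modelling step mentioned above might look as follows; the tokenized soil-series descriptions are invented placeholders.

        from gensim import corpora, models

        texts = [["fine", "loamy", "mixed", "mesic", "typic", "hapludalfs"],
                 ["sandy", "skeletal", "mixed", "frigid", "typic", "cryorthents"]]
        dictionary = corpora.Dictionary(texts)
        corpus = [dictionary.doc2bow(t) for t in texts]                  # bag-of-words vectors
        lsi = models.LsiModel(corpus, id2word=dictionary, num_topics=2)  # latent semantic indexing
        print(lsi.print_topics())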

  13. An Experimental Text in Transformational Geometry, Student Text; Cambridge Conference on School Mathematics Feasibility Study No. 43a.

    Science.gov (United States)

    Cambridge Conference on School Mathematics, Newton, MA.

    This is part of a student text which was written with the aim of reflecting the thinking of The Cambridge Conference on School Mathematics (CCSM) regarding the goals and objectives for mathematics. The instructional materials were developed for teaching geometry in the secondary schools. This document is chapter six and titled Motions and…

  14. Classification of forensic autopsy reports through conceptual graph-based document representation model.

    Science.gov (United States)

    Mujtaba, Ghulam; Shuib, Liyana; Raj, Ram Gopal; Rajandram, Retnagowri; Shaikh, Khairunisa; Al-Garadi, Mohammed Ali

    2018-06-01

    Text categorization has been used extensively in recent years to classify plain-text clinical reports. This study employs text categorization techniques for the classification of open narrative forensic autopsy reports. One of the key steps in text classification is document representation. In document representation, a clinical report is transformed into a format that is suitable for classification. The traditional document representation technique for text categorization is the bag-of-words (BoW) technique. In this study, the traditional BoW technique is ineffective in classifying forensic autopsy reports because it merely extracts frequent but non-discriminative features from clinical reports. Moreover, this technique fails to capture word inversion, as well as word-level synonymy and polysemy, when classifying autopsy reports. Hence, the BoW technique suffers from low accuracy and low robustness unless it is improved with contextual and application-specific information. To overcome the aforementioned limitations of the BoW technique, this research aims to develop an effective conceptual graph-based document representation (CGDR) technique to classify 1500 forensic autopsy reports from four (4) manners of death (MoD) and sixteen (16) causes of death (CoD). Term-based and Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT) based conceptual features were extracted and represented through graphs. These features were then used to train a two-level text classifier. The first level classifier was responsible for predicting MoD. In addition, the second level classifier was responsible for predicting CoD using the proposed conceptual graph-based document representation technique. To demonstrate the significance of the proposed technique, its results were compared with those of six (6) state-of-the-art document representation techniques. Lastly, this study compared the effects of one-level classification and two-level classification on the experimental results
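
    The two-level scheme described above can be sketched as two classifiers, one for manner of death and one for cause of death. The fragment below uses a plain bag-of-words representation as a stand-in for the paper's conceptual graph-based features; the reports and labels are placeholders.

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.naive_bayes import MultinomialNB

        reports = ["blunt force injury to the head ...", "occlusion of the coronary artery ..."]
        mod_labels = ["accident", "natural"]                    # manner of death
        cod_labels = ["head injury", "myocardial infarction"]   # cause of death

        vec = CountVectorizer()
        X = vec.fit_transform(reports)
        level1 = MultinomialNB().fit(X, mod_labels)   # first level predicts MoD
        level2 = MultinomialNB().fit(X, cod_labels)   # second level predicts CoD

        new = vec.transform(["narrowing and occlusion of the left coronary artery"])
        print(level1.predict(new), level2.predict(new))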

  15. Supporting the education evidence portal via text mining

    Science.gov (United States)

    Ananiadou, Sophia; Thompson, Paul; Thomas, James; Mu, Tingting; Oliver, Sandy; Rickinson, Mark; Sasaki, Yutaka; Weissenbacher, Davy; McNaught, John

    2010-01-01

    The UK Education Evidence Portal (eep) provides a single, searchable, point of access to the contents of the websites of 33 organizations relating to education, with the aim of revolutionizing work practices for the education community. Use of the portal alleviates the need to spend time searching multiple resources to find relevant information. However, the combined content of the websites of interest is still very large (over 500 000 documents and growing). This means that searches using the portal can produce very large numbers of hits. As users often have limited time, they would benefit from enhanced methods of performing searches and viewing results, allowing them to drill down to information of interest more efficiently, without having to sift through potentially long lists of irrelevant documents. The Joint Information Systems Committee (JISC)-funded ASSIST project has produced a prototype web interface to demonstrate the applicability of integrating a number of text-mining tools and methods into the eep, to facilitate an enhanced searching, browsing and document-viewing experience. New features include automatic classification of documents according to a taxonomy, automatic clustering of search results according to similar document content, and automatic identification and highlighting of key terms within documents. PMID:20643679

  16. COMPOSITIONAL AND SUBSTANTIAL STRUCTURE OF THE MEDICAL DOCUMENT: FORMATION STAGES

    Directory of Open Access Journals (Sweden)

    Romashova Olga Vladimirovna

    2015-03-01

    Full Text Available The article deals with the compositional and substantial structure of the ambulatory medical record, or "case history", which has taken shape over a long period. The author identifies three main periods in the formation of this medical document: the first period (beginning of the 19th century – 1920s) is connected with its origin and formation; the second period (1920s–1980s) is marked by the emergence of normative legal acts regulating its registration and maintenance; the third period (1980s – up to the present) is associated with the cancellation of regulations and the introduction of the new order of the Ministry of Health of the USSR that changed the document's form and name. It is determined that the composition of the case history consists of the title page and the main part. The following processes take place in the course of the ambulatory medical record's formation: increasing formalization, an increase in the number of pattern text fragments, growth in the text's volume, and the implementation of a greater number of functions. The author identifies the main (informative and cumulative, accounting) and additional (scientific, controlling, legal, financial) functions of the document. The implementation of these functions is reflected in the compositional and substantial structure of the document text and is conditioned by a number of extralinguistic factors.

  17. Development of digital library system on regulatory documents for nuclear power plants

    International Nuclear Information System (INIS)

    Lee, K. H.; Kim, K. J.; Yoon, Y. H.; Kim, M. W.; Lee, J. I.

    2001-01-01

    The main objective of this study is to establish an Internet-based nuclear regulatory document retrieval system. With the advancement of the Internet and information processing technology, information management patterns are going through a new paradigm. In keeping with this trend, it is a general tendency to transfer paper documents into electronic form through document scanning and indexing. This system consists of nuclear regulatory documents, nuclear safety documents, a digital library, and an information system with index and full text

  18. Using Electronic Systems for Document Management in Economic Entities

    Directory of Open Access Journals (Sweden)

    2007-01-01

    Full Text Available Document workflow and management, be they scanned documents, computer-generated e-documents or complex file formats, are critical elements for the success of an organization. Delivering the correct information to the right person, at the right moment, is a fundamental element of daily activity. In the Internet era, documents have a new format, and what is more important: completely new functions. Paper is replaced by electronic formats such as .html, .xml, .pdf or .doc. The price for this progress is increasing technological complexity, and with this complexity comes the need for more efficient techniques of management and organization, such as an electronic document management system. This paper aims to present document management not as a separate software category on the IT market, but as an element integrated with any software solution, maximizing its capacity for making business more efficient.

  19. Features based approach for indexation and representation of unstructured Arabic documents

    Directory of Open Access Journals (Sweden)

    Mohamed Salim El Bazzi

    2017-06-01

    Full Text Available The increase of textual information published in the Arabic language on the internet, in public libraries and administrations requires implementing effective techniques for extracting the relevant information contained in large corpora of texts. The purpose of indexing is to create a document representation that makes it easy to find and identify the relevant information in a set of documents. However, mining textual data is becoming a complicated task, especially when taking semantics into consideration. In this paper, we present an indexing system based on contextual representation that takes advantage of the semantic links given in a document. Our approach is based on the extraction of keyphrases. Each document is then represented by its relevant keyphrases instead of its simple keywords. The experimental results confirm the effectiveness of our approach.

  20. University of Virginia "virtual" reactor facility tours

    International Nuclear Information System (INIS)

    Krause, D.R.; Mulder, R.U.

    1995-01-01

    An electronic information and tour book has been constructed for the University of Virginia reactor (UVAR) facility. Utilizing the global Internet, the document resides on the University of Virginia World Wide Web (WWW or W3) server within the UVAR Homepage at http://www.virginia.edu/~reactor/. It is quickly accessible wherever an Internet connection exists. The UVAR Homepage files are accessed with the hypertext transfer protocol (http) prefix. The files are written in hypertext markup language (HTML), a very simple method of preparing ASCII text for W3 presentation. The HTML allows use of various hierarchies of headers, indentation, fonts, and the linking of words and/or pictures to other addresses (uniform resource locators). The linking of texts, pictures, sounds, and server addresses is known as hypermedia.

  1. Privacy Preserving Similarity Based Text Retrieval through Blind Storage

    Directory of Open Access Journals (Sweden)

    Pinki Kumari

    2016-09-01

    Full Text Available Cloud computing is growing rapidly because of its advantages, and more data owners choose to outsource their data to cloud storage to centralize it. As huge files are stored in the cloud, a keyword-based search process is needed for data users. At the same time, to protect privacy, sensitive data are encrypted before being outsourced to the cloud server, but searching over encrypted data is difficult. In this system we propose similarity-based text retrieval from blind storage blocks in encrypted form. Blind storage provides additional security because data are stored at random locations in cloud storage. In existing systems, the data owner cannot encrypt the document data, as encryption is done only at the server end, and anyone can access the data because no private-key mechanism is applied to maintain privacy. In our proposed system, the data owner encrypts the data using the RSA algorithm; RSA is a public-key cryptosystem widely used for storing sensitive data over the Internet. We use a text mining process to build the index files of user documents. Before encryption we also use an NLP (Natural Language Processing) technique to identify synonyms of the keywords in the data owner's documents: the text mining process examines the text word by word and collects the literal meaning beyond the word group that composes the sentence, and the words are looked up in the WordNet API so that equivalent words can be included in the index file. Our proposed system provides a more secure and authorized way of retrieving text from cloud storage with access control. Finally, our experimental results show that our system outperforms the existing one.
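
    The client-side encryption step the abstract proposes can be illustrated with the Python cryptography package; the sketch below covers only RSA-OAEP encryption of a small document chunk by the data owner, leaving the blind storage layout and the searchable index out of scope.

        from cryptography.hazmat.primitives.asymmetric import rsa, padding
        from cryptography.hazmat.primitives import hashes

        private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
        public_key = private_key.public_key()

        oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                            algorithm=hashes.SHA256(), label=None)

        # RSA can only encrypt short messages directly, so real documents would be
        # chunked or wrapped with a symmetric key; this is a toy demonstration.
        ciphertext = public_key.encrypt(b"sensitive document chunk", oaep)
        plaintext = private_key.decrypt(ciphertext, oaep)   # data owner recovers the text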

  2. A New Binarization Algorithm for Historical Documents

    Directory of Open Access Journals (Sweden)

    Marcos Almeida

    2018-01-01

    Full Text Available Monochromatic documents demand far less network bandwidth for transmission and storage space than their color or even grayscale equivalents. The binarization of historical documents is far more complex than that of recent ones, as paper aging, color, texture, translucency, stains, back-to-front interference, the kind and color of ink used in handwriting, the printing process, and the digitization process are some of the factors that affect binarization. This article presents a new binarization algorithm for historical documents. The new global filter proposed is performed in four steps: filtering the image using a bilateral filter, splitting the image into its RGB components, decision-making for each RGB channel based on an adaptive binarization method inspired by Otsu's method with a choice of the threshold level, and classification of the binarized images to decide which of the RGB components best preserved the document information in the foreground. The quantitative and qualitative assessment made against 23 binarization algorithms on three sets of "real world" documents showed very good results.
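
    The four steps can be approximated with OpenCV as below. The final selection rule here (keep the channel with the most ink) is a deliberately crude stand-in for the paper's classification step, and the filter parameters are ordinary defaults rather than the authors' values.

        import cv2

        img = cv2.imread("historical_page.png")              # BGR image
        smooth = cv2.bilateralFilter(img, d=9, sigmaColor=75, sigmaSpace=75)

        candidates = []
        for channel in cv2.split(smooth):                    # the three color components
            _, binary = cv2.threshold(channel, 0, 255,
                                      cv2.THRESH_BINARY + cv2.THRESH_OTSU)
            candidates.append(binary)

        # Crude stand-in decision: keep the binarization with the most dark (ink) pixels.
        best = min(candidates, key=lambda b: b.mean())
        cv2.imwrite("binarized.png", best)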

  3. A methodology for semiautomatic taxonomy of concepts extraction from nuclear scientific documents using text mining techniques; Metodologia para extracao semiautomatica de uma taxonomia de conceitos a partir da producao cientifica da area nuclear utilizando tecnicas de mineracao de textos

    Energy Technology Data Exchange (ETDEWEB)

    Braga, Fabiane dos Reis

    2013-07-01

    This thesis presents a text mining method for the semi-automatic extraction of a taxonomy of concepts from a textual corpus composed of scientific papers related to the nuclear area. Text classification is a natural human practice and a crucial task for work with large repositories. The document clustering technique provides a logical and understandable framework that facilitates organization, browsing and searching. Most clustering algorithms use the bag-of-words model to represent the content of a document. This model generates high-dimensional data, ignores the fact that different words can have the same meaning, and does not consider the relationship between them, assuming that words are independent of each other. The methodology combines a model for document representation by concepts with a hierarchical document clustering method based on the co-occurrence frequency of concepts, and a technique for labeling clusters with their most representative concepts, with the objective of producing a taxonomy of concepts that may reflect the structure of the knowledge domain. It is hoped that this work will contribute to the conceptual mapping of the scientific production of the nuclear area and thus support the management of research activities in this area. (author)

  4. Writing Treatment for Aphasia: A Texting Approach

    Science.gov (United States)

    Beeson, Pelagie M.; Higginson, Kristina; Rising, Kindle

    2013-01-01

    Purpose: Treatment studies have documented the therapeutic and functional value of lexical writing treatment for individuals with severe aphasia. The purpose of this study was to determine whether such retraining could be accomplished using the typing feature of a cellular telephone, with the ultimate goal of using text messaging for…

  5. Aspects of Text Mining From Computational Semiotics to Systemic Functional Hypertexts

    Directory of Open Access Journals (Sweden)

    Alexander Mehler

    2001-05-01

    Full Text Available The significance of natural language texts as the prime information structure for the management and dissemination of knowledge in organisations is still increasing. Making relevant documents available depending on varying tasks in different contexts is of primary importance for any efficient task completion. Implementing this demand requires the content-based processing of texts, which makes it possible to reconstruct or, if necessary, to explore the relationship of task, context and document. Text mining is a technology that is suitable for solving problems of this kind. In the following, semiotic aspects of text mining are investigated. Based on the primary object of text mining - natural language lexis - the specific complexity of this class of signs is outlined and requirements for the implementation of text mining procedures are derived. This is done with reference to text linkage, introduced as a special task in text mining. Text linkage refers to the exploration of implicit, content-based relations of texts (and their annotation as typed links) in corpora possibly organised as hypertexts. In this context, the term systemic functional hypertext is introduced, which distinguishes genre and register layers for the management of links in a poly-level hypertext system.

  6. Dealing with extreme data diversity: extraction and fusion from the growing types of document formats

    Science.gov (United States)

    David, Peter; Hansen, Nichole; Nolan, James J.; Alcocer, Pedro

    2015-05-01

    The growth in text data available online is accompanied by a growth in the diversity of available documents. Corpora with extreme heterogeneity in terms of file formats, document organization, page layout, text style, and content are common. The absence of meaningful metadata describing the structure of online and open-source data leads to text extraction results that contain no information about document structure and are cluttered with page headers and footers, web navigation controls, advertisements, and other items that are typically considered noise. We describe an approach to document structure and metadata recovery that uses visual analysis of documents to infer the communicative intent of the author. Our algorithm identifies the components of documents such as titles, headings, and body content, based on their appearance. Because it operates on an image of a document, our technique can be applied to any type of document, including scanned images. Our approach to document structure recovery considers a finer-grained set of component types than prior approaches. In this initial work, we show that a machine learning approach to document structure recovery using a feature set based on the geometry and appearance of images of documents achieves a 60% greater F1-score than a baseline random classifier.
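
    In the spirit of the approach above, a classifier over simple geometric and appearance features of blocks might be sketched as follows; the feature set, values, and labels are invented for illustration and do not reproduce the authors' model.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        # Each block: [x, y, width, height, mean_intensity, aspect_ratio]
        X = np.array([[50, 40, 500, 30, 0.2, 16.7],    # wide, short block near the top
                      [50, 90, 500, 600, 0.5, 0.8]])   # large block filling the page
        y = ["title", "body"]

        clf = RandomForestClassifier(n_estimators=100).fit(X, y)
        print(clf.predict([[60, 45, 480, 28, 0.25, 17.1]]))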

  7. MeSHmap: a text mining tool for MEDLINE.

    OpenAIRE

    Srinivasan, P.

    2001-01-01

    Our research goal is to explore text mining from the metadata included in MEDLINE documents. We present MeSHmap, our prototype text mining system that exploits the MeSH indexing accompanying MEDLINE records. MeSHmap supports searches via PubMed followed by user-driven exploration of the MeSH terms and subheadings in the retrieved set. The potential of the system goes beyond text retrieval. It may also be used to compare entities of the same type such as pairs of drugs or pairs of procedures et...

  8. 76 FR 14856 - Video Description: Implementation of the Twenty-First Century Communications and Video...

    Science.gov (United States)

    2011-03-18

    ... ECFS ( http://www.fcc.gov/cgb/ecfs/ ). Documents will be available electronically in ASCII, Word 97... afford better access to television programs for individuals who are blind or visually impaired, enabling...\\ Motion Picture Ass'n of America, Inc. v. Federal Communications Comm., 309 F.3d 796 (D.C. Cir. 2002). \\5...

  9. Sesame IO Library User Manual Version 8

    Energy Technology Data Exchange (ETDEWEB)

    Abhold, Hilary [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Young, Ginger Ann [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2017-05-15

    This document is a user manual for SES_IO, a low-level library for reading and writing sesame files. The purpose of the SES_IO library is to provide a simple user interface for accessing and creating sesame files that does not change across sesame format type (such as binary, ascii, and xml).

  10. Death as Insight into Life: Adolescents' Gothic Text Encounters

    Science.gov (United States)

    Del Nero, Jennifer

    2017-01-01

    This qualitative case study explores adolescents' responses to texts containing death and destruction, a seminal trope of the Gothic literary genre. Participants read both classic and popular culture texts featuring characters grappling with death in their seventh grade reading classroom. Observations, interviews, and documents were collected and…

  11. Rational kernels for Arabic Root Extraction and Text Classification

    Directory of Open Access Journals (Sweden)

    Attia Nehar

    2016-04-01

    Full Text Available In this paper, we address the problems of Arabic text classification and root extraction using transducers and rational kernels. We introduce a new root extraction approach based on the use of Arabic patterns (Pattern Based Stemmer). Transducers are used to model these patterns, and root extraction is done without relying on any dictionary. Using transducers for extracting roots, documents are transformed into finite state transducers. This document representation allows us to use and explore rational kernels as a framework for Arabic text classification. Root extraction experiments are conducted on three word collections and yield 75.6% accuracy. Classification experiments are done on the Saudi Press Agency dataset, and N-gram kernels are tested with different values of N. Accuracy and F1 reach 90.79% and 62.93%, respectively. These results show that our approach, when compared with other approaches, is promising, especially in terms of accuracy and F1.
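
    The transducer-based rational-kernel machinery is not reproduced here, but the underlying N-gram similarity idea can be approximated with an explicit character n-gram representation and a linear SVM, as in this hedged scikit-learn sketch (toy data, N = 3):

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.svm import LinearSVC
        from sklearn.pipeline import make_pipeline

        docs = ["short sports report", "short business report"]   # placeholder texts
        labels = ["sport", "economy"]

        model = make_pipeline(
            TfidfVectorizer(analyzer="char", ngram_range=(3, 3)),  # character 3-grams
            LinearSVC(),
        )
        model.fit(docs, labels)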

  12. Overview of Historical Earthquake Document Database in Japan and Future Development

    Science.gov (United States)

    Nishiyama, A.; Satake, K.

    2014-12-01

    In Japan, damage and disasters from large historical earthquakes have been documented and preserved. Compilation of historical earthquake documents started in the early 20th century, and 33 volumes of historical document source books (about 27,000 pages) have been published. However, these source books are not effectively utilized by researchers, due to the contamination of low-reliability historical records and the difficulty of keyword searching by characters and dates. To overcome these problems and to promote historical earthquake studies in Japan, construction of a text database started in the 21st century. For historical earthquakes from the beginning of the 7th century to the early 17th century, the "Online Database of Historical Documents in Japanese Earthquakes and Eruptions in the Ancient and Medieval Ages" (Ishibashi, 2009) has already been constructed. Its compilers investigated the source books or original texts of historical literature, emended the descriptions, and assigned a reliability to each historical document on the basis of its written age. Another database compiled the historical documents for seven damaging earthquakes that occurred along the Sea of Japan coast in Honshu, central Japan, in the Edo period (from the beginning of the 17th century to the middle of the 19th century) and constructed a text database and a seismic intensity database. These are now publicized on the web (written only in Japanese). However, only about 9% of the earthquake source books have been digitized so far. Therefore, we plan to digitize all of the remaining historical documents within a research program which started in 2014. The specification of the database will be similar to the previous ones. We also plan to combine this database with a liquefaction traces database, which will be constructed by another research program, by adding the location information described in the historical documents. The constructed database would be utilized to estimate the distributions of seismic intensities and tsunami

  13. DOCUMENT REPRESENTATION FOR CLUSTERING OF SCIENTIFIC ABSTRACTS

    Directory of Open Access Journals (Sweden)

    S. V. Popova

    2014-01-01

    Full Text Available The key issue of the present paper is the clustering of narrow-domain short texts, such as scientific abstracts. The work is based on observations made while improving the performance of a key phrase extraction algorithm. An extended stop-words list, built automatically for the purposes of key phrase extraction, was used and made possible a considerable quality enhancement of the phrases extracted from scientific publications. A description of the stop-words list creation procedure is given. The main objective is to investigate the possibilities of increasing the performance and/or speed of clustering by means of the above-mentioned list of stop-words as well as information about lexeme parts of speech. In the latter case, a vocabulary is applied for the document representation which contains not all the words that occurred in the collection, but only nouns and adjectives or their sequences encountered in the documents. Two base clustering algorithms are applied: k-means and hierarchical clustering (average agglomerative method). The results show that the use of an extended stop-words list and the adjective-noun document representation makes it possible to improve the performance and speed of k-means clustering. In the same setting, the average agglomerative method may show a decline in performance quality. It is shown that the use of adjective-noun sequences for document representation lowers the clustering quality for both algorithms and can be justified only when a considerable reduction of feature space dimensionality is necessary.
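
    A minimal version of the k-means setup discussed above, with an extended stop-words list passed to the vectorizer, could look like this in scikit-learn (the stop-word list and abstracts are placeholders, and the part-of-speech filtering step is omitted):

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.cluster import KMeans

        extended_stopwords = ["paper", "approach", "results", "proposed"]  # hypothetical list
        abstracts = ["this paper presents a clustering approach for short texts",
                     "we propose a keyphrase extraction method for abstracts",
                     "results of a new clustering algorithm are reported"]

        X = TfidfVectorizer(stop_words=extended_stopwords).fit_transform(abstracts)
        clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)
        print(clusters)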

  14. A document preparation system in a large network environment

    Energy Technology Data Exchange (ETDEWEB)

    Vigil, M.; Bouchier, S.; Sanders, C.; Sydoriak, S.; Wheeler, K.

    1988-01-01

    At Los Alamos National Laboratory, we have developed an integrated document preparation system that produces publication-quality documents. This system combines text formatters and computer graphics capabilities that have been adapted to meet the needs of users in a large scientific research laboratory. This paper describes the integration of document processing technology to develop a system architecture, based on a page description language, to provide network-wide capabilities in a distributed computing environment. We describe the Laboratory requirements, the integration and implementation issues, and the challenges we faced developing this system.

  15. DOES PRESENTING PATIENT'S BMI INCREASE DOCUMENTATION OF OBESITY?

    Directory of Open Access Journals (Sweden)

    Norm Clothier, MD, M. Kim Marvel, PhD, Courtney S. Cruickshank, MS

    2002-09-01

    Full Text Available Purpose: Despite the associated health consequences, obesity is infrequently documented as a problem in medical charts. The purpose of this study is to determine whether a simple intervention (routine listing of the BMI on the medical chart) will increase physician documentation of obesity in the medical record. Methods: Participants were resident physicians in a family medicine residency program. Participants were randomly assigned to either an experimental group or a control group. For experimental group physicians, the Body Mass Index was listed alongside other vital signs of patients seen in an ambulatory setting. Physician documentation of patient obesity was assessed by chart review after patient visits. Documentation was defined as inclusion of obesity on the problem list or in the progress note. Results: The intervention did not significantly increase the rate of documentation of obesity in the medical chart. Several reasons for the lack of change are explored, including the difficulty of treating obesity successfully.
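
    For reference, the BMI figure listed alongside the vital signs is weight in kilograms divided by the square of height in metres, e.g.:

        def bmi(weight_kg: float, height_m: float) -> float:
            """Body Mass Index = kg / m^2."""
            return weight_kg / height_m ** 2

        print(round(bmi(95.0, 1.75), 1))   # 31.0, above the common obesity cutoff of 30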

  16. Information Retrieval and Text Mining Technologies for Chemistry.

    Science.gov (United States)

    Krallinger, Martin; Rabal, Obdulia; Lourenço, Anália; Oyarzabal, Julen; Valencia, Alfonso

    2017-06-28

    Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.

  17. A Proposed Arabic Handwritten Text Normalization Method

    Directory of Open Access Journals (Sweden)

    Tarik Abu-Ain

    2014-11-01

    Full Text Available Text normalization is an important technique in document image analysis and recognition. It consists of many preprocessing stages, including slope correction, text padding, skew correction, and straightening of the writing line. In this regard, text normalization plays an important role in many procedures such as text segmentation, feature extraction and character recognition. In the present article, a new method for text baseline detection, straightening, and slant correction for Arabic handwritten texts is proposed. The method comprises a set of sequential steps: first, component segmentation is done, followed by component text thinning; then, the direction features of the skeletons are extracted, and the candidate baseline regions are determined. After that, the correct baseline region is selected, and finally, the baselines of all components are aligned with the writing line. The experiments are conducted on the IFN/ENIT benchmark Arabic dataset. The results show that the proposed method has a promising and encouraging performance.
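
    As a simplified stand-in for the baseline-detection step (the paper's skeleton direction features are more elaborate), the writing line of a binarized component can be approximated by the row with the maximum horizontal ink projection:

        import numpy as np

        def detect_baseline(binary: np.ndarray) -> int:
            """binary: 2-D array with ink pixels = 1; returns the baseline row index."""
            projection = binary.sum(axis=1)    # ink count per row
            return int(np.argmax(projection))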

  18. Electronic Braille Document Reader

    OpenAIRE

    Arif, Shahab; Holmes, Violeta

    2013-01-01

    This paper presents an investigation into developing a portable Braille device which would allow visually impaired individuals to read electronic documents by actuating Braille text on a finger. Braille books tend to be bulky in size due to the minimum size requirements for each Braille cell. E-books can be read in Braille using refreshable Braille displays connected to a computer. However, the refreshable Braille displays are expensive, bulky and are not portable. These factors restrict blin...

  19. Electronic Braille Document Reader

    OpenAIRE

    Arif, S.

    2012-01-01

    An investigation was conducted into developing a portable Braille device which would allow visually impaired individuals to read electronic documents by actuating Braille text on a finger. Braille books tend to be bulky in size due to the minimum size requirements for each Braille cell. E-books can be read in Braille using refreshable Braille displays connected to a computer. However, the refreshable Braille displays are expensive, bulky and are not portable. These factors restrict blind and ...

  20. Interconnectedness und digitale Texte

    Directory of Open Access Journals (Sweden)

    Detlev Doherr

    2013-04-01

    Full Text Available The multimedia information services on the Internet are becoming ever larger and more comprehensive, and even documents that exist only in printed form are being digitized by libraries and made available online. These documents can be found via online document management systems or search engines and then provided in common formats such as PDF. This article examines how the Humboldt Digital Library (HDL) works, which for more than ten years has made documents by Alexander von Humboldt freely available on the web in English translation. Unlike a conventional digital library, however, it does not merely provide digitized documents as scans or PDFs, but makes the text itself available in networked form. The system thus resembles an information system more than a digital library, which is also reflected in the available functions for finding texts in different versions and translations, comparing paragraphs of different documents, or displaying images in their context. The development of dynamic hyperlinks based on the individual text paragraphs of Humboldt's works, in the form of media assets, enables use of the Google Maps programming interface for geographic as well as content-based navigation. Going beyond the service of a digital library, the HDL offers the prototype of a multidimensional information system that works with dynamic structures and enables extensive thematic analyses and comparisons.

  1. Technical document characterization by data analysis

    International Nuclear Information System (INIS)

    Mauget, A.

    1993-05-01

    Nuclear power plants possess documents analyzing all the plant systems, which represent a vast quantity of paper. Analysis of textual data can enable a document to be classified by grouping the texts containing the same words. These methods are used on system manuals for feasibility studies. The system manual is analyzed by LEXTER and the terms it has selected are examined. We first classify according to style (sentences containing general words, technical sentences, etc.), and then according to terms. However, it will not be possible to continue in this fashion for the 100 existing system manuals, because of the lack of sufficient storage capacity. Another solution is being developed. (author)

  2. Learning High-Order Filters for Efficient Blind Deconvolution of Document Photographs

    KAUST Repository

    Xiao, Lei

    2016-09-16

    Photographs of text documents taken by hand-held cameras can be easily degraded by camera motion during exposure. In this paper, we propose a new method for blind deconvolution of document images. Observing that document images are usually dominated by small-scale high-order structures, we propose to learn a multi-scale, interleaved cascade of shrinkage fields model, which contains a series of high-order filters to facilitate joint recovery of blur kernel and latent image. With extensive experiments, we show that our method produces high quality results and is highly efficient at the same time, making it a practical choice for deblurring high resolution text images captured by modern mobile devices. © Springer International Publishing AG 2016.

  3. Conservation Documentation and the Implications of Digitisation

    Directory of Open Access Journals (Sweden)

    Michelle Moore

    2001-11-01

    Full Text Available Conservation documentation can be defined as the textual and visual records collected during the care and treatment of an object. It can include records of the object's condition, any treatment done to the object, any observations or conclusions made by the conservator as well as details on the object's past and present environment. The form of documentation is not universally agreed upon nor has it always been considered an important aspect of the conservation profession. Good documentation tells the complete story of an object thus far and should provide as much information as possible for the future researcher, curator, or conservator. The conservation profession will benefit from digitising its documentation using software such as databases and hardware like digital cameras and scanners. Digital technology will make conservation documentation more easily accessible, cost/time efficient, and will increase consistency and accuracy of the recorded data, and reduce physical storage space requirements. The major drawback to digitising conservation records is maintaining access to the information for the future; the notorious pace of technological change has serious implications for retrieving data from any machine-readable medium.

  4. Binarization and Segmentation Framework for Sundanese Ancient Documents

    Directory of Open Access Journals (Sweden)

    Erick Paulus

    2017-11-01

    Full Text Available Binarization and segmentation are the first two important steps in an optical character recognition system. For ancient document images written by hand, binarization remains a major challenge. In general, this is because the image quality is badly degraded and the non-text area contains various kinds of noise. After binarization, line-based segmentation is conducted to separate each text line from the others. We propose a novel binarization and segmentation framework that enhances the performance of the Niblack binarization method and implements a minimum-energy function to find the path of the separator line between two text lines. For the experiments, we use 22 images from the Sundanese ancient documents Kropak 18 and Kropak 22. The evaluation metrics show that our proposed binarization succeeded in improving the F-measure by 20% for Kropak 22 and 50% for Kropak 18 over the original Niblack method. We then present the influence of various input images, both true-color and binary, on text-line segmentation. In the line segmentation process, the binarized images from our proposed framework yield the same number of text lines as the number of target lines. Overall, our proposed framework produces promising results, so its output can be used as input images for the subsequent OCR process.
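
    For reference, the classic Niblack rule that the framework builds on thresholds each pixel at T = m + k*s, where m and s are the local mean and standard deviation; the window size and k below are conventional defaults, not the paper's tuned values.

        import numpy as np
        from scipy.ndimage import uniform_filter

        def niblack(gray: np.ndarray, window: int = 25, k: float = -0.2) -> np.ndarray:
            gray = gray.astype(float)
            mean = uniform_filter(gray, window)                 # local mean m
            sq_mean = uniform_filter(gray ** 2, window)
            std = np.sqrt(np.maximum(sq_mean - mean ** 2, 0))   # local std s
            threshold = mean + k * std                          # T = m + k*s
            return (gray > threshold).astype(np.uint8) * 255    # ink -> 0, background -> 255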

  5. Intelligent Bar Chart Plagiarism Detection in Documents

    Directory of Open Access Journals (Sweden)

    Mohammed Mumtaz Al-Dabbagh

    2014-01-01

    Full Text Available This paper presents a novel approach for mining features from documents that cannot be mined via optical character recognition (OCR). By identifying the intimate relationship between the text and graphical components, the proposed technique pulls out the Start, End, and Exact values for each bar. Furthermore, the word 2-gram and Euclidean distance methods are used to accurately detect and determine plagiarism in bar charts.
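
    The similarity step mentioned above, word 2-grams compared by Euclidean distance, reduces to a few lines once the chart labels have been extracted as plain text (toy data):

        from sklearn.feature_extraction.text import CountVectorizer
        from scipy.spatial.distance import euclidean

        texts = ["sales grew in the first quarter",
                 "sales grew in the last quarter"]

        X = CountVectorizer(ngram_range=(2, 2)).fit_transform(texts).toarray()
        print(euclidean(X[0], X[1]))   # a small distance suggests possible plagiarism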

  6. Using color management in color document processing

    Science.gov (United States)

    Nehab, Smadar

    1995-04-01

    Color Management Systems have been used for several years in Desktop Publishing (DTP) environments. While this development hasn't matured yet, we are already experiencing the next generation of the color imaging revolution-Device Independent Color for the small office/home office (SOHO) environment. Though there are still open technical issues with device independent color matching, they are not the focal point of this paper. This paper discusses two new and crucial aspects in using color management in color document processing: the management of color objects and their associated color rendering methods; a proposal for a precedence order and handshaking protocol among the various software components involved in color document processing. As color peripherals become affordable to the SOHO market, color management also becomes a prerequisite for common document authoring applications such as word processors. The first color management solutions were oriented towards DTP environments whose requirements were largely different. For example, DTP documents are image-centric, as opposed to SOHO documents that are text and charts centric. To achieve optimal reproduction on low-cost SOHO peripherals, it is critical that different color rendering methods are used for the different document object types. The first challenge in using color management of color document processing is the association of rendering methods with object types. As a result of an evolutionary process, color matching solutions are now available as application software, as driver embedded software and as operating system extensions. Consequently, document processing faces a new challenge, the correct selection of the color matching solution while avoiding duplicate color corrections.

  7. Automatic classification of journalistic documents on the Internet

    Directory of Open Access Journals (Sweden)

    Elias OLIVEIRA

    Full Text Available Abstract Online journalism is increasing every day. There are many news agencies, newspapers, and magazines using digital publication in the global network. Documents published online are available to users, who use search engines to find them. In order to deliver documents that are relevant to the search, they must be indexed and classified. Due to the vast number of documents published online every day, a lot of research has been carried out to find ways to facilitate automatic document classification. The objective of the present study is to describe an experimental approach for the automatic classification of journalistic documents published on the Internet using the Vector Space Model for document representation. The model was tested based on a real journalism database, using algorithms that have been widely reported in the literature. This article also describes the metrics used to assess the performance of these algorithms and their required configurations. The results obtained show the efficiency of the method used and justify further research to find ways to facilitate the automatic classification of documents.
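
    A minimal sketch of a Vector Space Model classifier of the kind the study describes, using scikit-learn's TF-IDF vectorizer; the toy corpus, category labels and the choice of Naive Bayes are illustrative assumptions, not the study's setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy stand-ins for a journalistic corpus and its editorial categories.
docs = ["election results announced today",
        "team wins the championship final",
        "new budget cuts public spending",
        "striker scores twice in derby"]
labels = ["politics", "sports", "politics", "sports"]

# Vector Space Model: each document becomes a TF-IDF weighted term vector.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(docs, labels)
print(model.predict(["the minister announced new spending"]))  # likely ['politics']
```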

  8. Health physics source document for codes of practice

    International Nuclear Information System (INIS)

    Pearson, G.W.; Meggitt, G.C.

    1989-05-01

    Personnel preparing codes of practice often require basic Health Physics information or advice relating to radiological protection problems, and this document is written primarily to supply such information. Certain technical terms used in the text are explained in the extensive glossary. Due to the pace of change in the field of radiological protection, it is difficult to produce an up-to-date document. This document was compiled during 1988, however, and therefore contains the principal changes brought about by the introduction of the Ionising Radiations Regulations (1985). The paper covers the nature of ionising radiation, its biological effects and the principles of control. It is hoped that the document will provide a useful source of information for both codes of practice and wider areas, and stimulate readers to study radiological protection issues in greater depth. (author)

  9. ARABIC TEXT CLASSIFICATION USING NEW STEMMER FOR FEATURE SELECTION AND DECISION TREES

    Directory of Open Access Journals (Sweden)

    SAID BAHASSINE

    2017-06-01

    Full Text Available Text classification is the process of assigning unclassified text to appropriate classes based on its content. The most prevalent representation for text classification is the bag-of-words vector. In this representation, the words that appear in documents often have multiple morphological structures and grammatical forms. In most cases, these morphological variants of words belong to the same category. In the first part of this paper, a new stemming algorithm is developed in which each term of a given document is represented by its root. In the second part, a comparative study is conducted of the impact of two stemming algorithms, namely Khoja’s stemmer and our new stemmer (referred to hereafter as the origin-stemmer), on Arabic text classification. This investigation was carried out using chi-square for feature selection, to reduce the dimensionality of the feature space, and a decision tree classifier. In order to evaluate the performance of the classifier, this study used a corpus that consists of 5070 documents independently classified into six categories (sport, entertainment, business, Middle East, switch and world) with the WEKA toolkit. The recall, f-measure and precision measures are used to compare the performance of the obtained models. The experimental results show that text classification using the origin-stemmer outperforms classification using Khoja’s stemmer. The f-measure was 92.9% in the sport category and 89.1% in the business category.
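
    The pipeline shape the abstract describes (stemming, chi-square feature selection, decision tree) can be sketched with scikit-learn; the root_stem stub and the k value are placeholders, since the paper's actual Arabic root stemmer is not reproduced here.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

def root_stem(token):
    # Placeholder: a real Arabic root stemmer would map each surface
    # form to its root here; the identity function only marks the slot.
    return token

pipeline = make_pipeline(
    # Bag of words over stemmed tokens.
    CountVectorizer(preprocessor=lambda doc: " ".join(root_stem(t) for t in doc.split())),
    SelectKBest(chi2, k=100),       # keep the 100 highest-scoring terms
    DecisionTreeClassifier(),
)
# Usage: pipeline.fit(train_docs, train_labels); pipeline.predict(test_docs)
```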

  10. Text mining with R a tidy approach

    CERN Document Server

    Silge, Julia

    2017-01-01

    Much of the data available today is unstructured and text-heavy, making it challenging for analysts to apply their usual data wrangling and visualization tools. With this practical book, you'll explore text-mining techniques with tidytext, a package that authors Julia Silge and David Robinson developed using the tidy principles behind R packages like ggraph and dplyr. You'll learn how tidytext and other tidy tools in R can make text analysis easier and more effective. The authors demonstrate how treating text as data frames enables you to manipulate, summarize, and visualize characteristics of text. You'll also learn how to integrate natural language processing (NLP) into effective workflows. Practical code examples and data explorations will help you generate real insights from literature, news, and social media. Learn how to apply the tidy text format to NLP Use sentiment analysis to mine the emotional content of text Identify a document's most important terms with frequency measurements E...

  11. Combining computational analyses and interactive visualization for document exploration and sensemaking in jigsaw.

    Science.gov (United States)

    Görg, Carsten; Liu, Zhicheng; Kihm, Jaeyeon; Choo, Jaegul; Park, Haesun; Stasko, John

    2013-10-01

    Investigators across many disciplines and organizations must sift through large collections of text documents to understand and piece together information. Whether they are fighting crime, curing diseases, deciding what car to buy, or researching a new field, inevitably investigators will encounter text documents. Taking a visual analytics approach, we integrate multiple text analysis algorithms with a suite of interactive visualizations to provide a flexible and powerful environment that allows analysts to explore collections of documents while sensemaking. Our particular focus is on the process of integrating automated analyses with interactive visualizations in a smooth and fluid manner. We illustrate this integration through two example scenarios: an academic researcher examining InfoVis and VAST conference papers and a consumer exploring car reviews while pondering a purchase decision. Finally, we provide lessons learned toward the design and implementation of visual analytics systems for document exploration and understanding.

  12. BURT: back up and restore tool

    Energy Technology Data Exchange (ETDEWEB)

    Karonis, N.T.

    1994-11-01

    BURT is just one of the tools in the Experimental Physics and Industrial Control System (EPICS). In this document we address the problem of backing up and restoring sets of values in databases whose values are continuously changing. In doing so, we present the Back Up and Restore Tool (BURT). In this presentation we provide a theoretical framework that defines the problem and lays the foundation for its solution. BURT is a tool designed and implemented with respect to that theoretical framework. It is not necessary for users of BURT to have an understanding of that framework; it was included in this document only for the purpose of completeness. BURT's basic purpose is to back up sets of values so that they can be later restored. Each time a back up is requested, a new ASCII file is generated. Further, the data values are stored as ASCII strings and therefore not compressed. Both of these facts conspire against BURT as a candidate for an archiver. Users who need an archiver should use a different tool, the Archiver.

  13. Light Duty Utility Arm interface control document plan

    Energy Technology Data Exchange (ETDEWEB)

    Engstrom, J.W.

    1994-12-27

    This document describes the interface control documents that will be used to identify and control interface features throughout all phases of the Light Duty Utility Arm (LDUA) development and design. After the system is built, delivered and installed in the Cold Test Facility and later at the tank farm, the Interface Control Documents can be used in maintaining the configuration control process. The Interface Control Document will consist of Interface Control Drawings and a data base directly tied to the Interface Control Drawings. The data base can be used as an index to conveniently find interface information. Design drawings and other text documents that contain interface information will appear in the database. The Interface Control Drawings will be used to document and control the data and information that define the interface boundaries between systems, subsystems and equipment. Also, the interface boundaries will define the areas of responsibility for systems and subsystems. The drawing will delineate and identify all the physical and functional interfaces that required coordination to establish and maintain compatibility between the co-functioning equipment, computer software, and the tank farm facilities. An appendix contains the Engineering interface control database system riser manual.

  14. Light Duty Utility Arm interface control document plan

    International Nuclear Information System (INIS)

    Engstrom, J.W.

    1994-01-01

    This document describes the interface control documents that will be used to identify and control interface features throughout all phases of the Light Duty Utility Arm (LDUA) development and design. After the system is built, delivered and installed in the Cold Test Facility and later at the tank farm, the Interface Control Documents can be used in maintaining the configuration control process. The Interface Control Document will consist of Interface Control Drawings and a data base directly tied to the Interface Control Drawings. The data base can be used as an index to conveniently find interface information. Design drawings and other text documents that contain interface information will appear in the database. The Interface Control Drawings will be used to document and control the data and information that define the interface boundaries between systems, subsystems and equipment. Also, the interface boundaries will define the areas of responsibility for systems and subsystems. The drawing will delineate and identify all the physical and functional interfaces that required coordination to establish and maintain compatibility between the co-functioning equipment, computer software, and the tank farm facilities. An appendix contains the Engineering interface control database system riser manual

  15. Optimization of the Document Placement in the RFID Cabinet

    Directory of Open Access Journals (Sweden)

    Kiedrowicz Maciej

    2016-01-01

    Full Text Available The study is devoted to the issue of optimizing document placement in a single RFID cabinet. It has been assumed that the optimization problem means reducing the time needed to archive the information on all documents with RFID tags. Since the explicit form of the criterion function remains unknown, the regression analysis method has been used for the purpose of its approximation. The method uses data from a computer simulation of the process of archiving data about documents. To solve the optimization problem, the modified gradient projection method has been used.

  16. Documentation of TRU biological transport model (BIOTRAN)

    Energy Technology Data Exchange (ETDEWEB)

    Gallegos, A.F.; Garcia, B.J.; Sutton, C.M.

    1980-01-01

    Inclusive of Appendices, this document describes the purpose, rationale, construction, and operation of a biological transport model (BIOTRAN). This model is used to predict the flow of transuranic elements (TRU) through specified plant and animal environments using biomass as a vector. The appendices are: (A) Flows of moisture, biomass, and TRU; (B) Intermediate variables affecting flows; (C) Mnemonic equivalents (code) for variables; (D) Variable library (code); (E) BIOTRAN code (Fortran); (F) Plants simulated; (G) BIOTRAN code documentation; (H) Operating instructions for BIOTRAN code. The main text is presented with a specific format which uses a minimum of space, yet is adequate for tracking most relationships from their first appearance to their formulation in the code. Because relationships are treated individually in this manner, and rely heavily on Appendix material for understanding, it is advised that the reader familiarize himself with these materials before proceeding with the main text.

  17. Documentation of TRU biological transport model (BIOTRAN)

    International Nuclear Information System (INIS)

    Gallegos, A.F.; Garcia, B.J.; Sutton, C.M.

    1980-01-01

    Inclusive of Appendices, this document describes the purpose, rationale, construction, and operation of a biological transport model (BIOTRAN). This model is used to predict the flow of transuranic elements (TRU) through specified plant and animal environments using biomass as a vector. The appendices are: (A) Flows of moisture, biomass, and TRU; (B) Intermediate variables affecting flows; (C) Mnemonic equivalents (code) for variables; (D) Variable library (code); (E) BIOTRAN code (Fortran); (F) Plants simulated; (G) BIOTRAN code documentation; (H) Operating instructions for BIOTRAN code. The main text is presented with a specific format which uses a minimum of space, yet is adequate for tracking most relationships from their first appearance to their formulation in the code. Because relationships are treated individually in this manner, and rely heavily on Appendix material for understanding, it is advised that the reader familiarize himself with these materials before proceeding with the main text

  18. INTERFERENCE IN THE SHORT TEXT OF BESAKIH TEMPLE

    Directory of Open Access Journals (Sweden)

    Ni Made Kajeng Martha Puspita

    2016-05-01

    Full Text Available The aim of this study is to analyze four types of interference found in the “Besakih Temple” short text: syntactic, semantic, copula, and redundancy. The data were collected through library research with the necessary note-taking and documentation. A qualitative method was used to analyze the data. The results showed that the interferences found in the text cover several linguistic aspects; they are cases of negative transfer resulting from contact with another language. The most common source of errors is the speaker's lack of knowledge of the language being used.

  19. ARCHITECTURE SOFTWARE SOLUTION TO SUPPORT AND DOCUMENT MANAGEMENT QUALITY SYSTEM

    Directory of Open Access Journals (Sweden)

    Milan Eric

    2010-12-01

    Full Text Available One of the foundations of the JUS ISO 9000 series of standards is quality system documentation. The architecture of the quality system documentation depends on the complexity of the business system. Establishing efficient management of the quality system documentation is of great importance for the business system, both in the phase of introducing the quality system and in the further stages of its improvement. The study describes the architecture and capabilities of software solutions that support and manage quality system documentation in accordance with the requirements of the standards ISO 9001:2001, ISO 14001:2005, HACCP, etc.

  20. Using Literary Texts to Teach Grammar in Foreign Language Classroom

    Science.gov (United States)

    Atmaca, Hasan; Günday, Rifat

    2016-01-01

    Today, it is argued that the use of literary texts as course material in the foreign language classroom is not obligatory but necessary, due to the close relationship between language and literature. Although literary texts are accepted as authentic documents and do not have any specific language-teaching purpose, they are indispensable sources to be…

  1. Born Broken: Fonts and Information Loss in Legacy Digital Documents

    Directory of Open Access Journals (Sweden)

    Geoffrey Brown

    2011-03-01

    Full Text Available For millions of legacy documents, correct rendering depends upon resources such as fonts that are not generally embedded within the document structure. Yet there is a significant risk of information loss due to missing or incorrectly substituted fonts. Large document collections depend on thousands of unique fonts not available on a common desktop workstation, which typically has between 100 and 200 fonts. Silent substitution of fonts, performed by applications such as Microsoft Office, can yield poorly rendered documents. In this paper we use a collection of 230,000 Word documents to assess the difficulty of matching font requirements with a database of fonts. We describe the identifying information contained in common font formats, font requirements stored in Word documents, the API provided by Windows to support font requests by applications, the documented substitution algorithms used by Windows when requested fonts are not available, and the ways in which support software might be used to control font substitution in a preservation environment.

  2. Theoretical and Practical Aspects of Logistic Quality Management System Documentation Development Process

    Directory of Open Access Journals (Sweden)

    Linas Šaulinskas

    2013-12-01

    Full Text Available This paper addresses aspects of logistics quality management system documentation development and suggests models for quality management system documentation development, documentation hierarchical systems and authorization approval. It also identifies logistic processes and a responsibilities model and a detailed document development and approval process that can be practically applied. Our results are based upon an analysis of advanced Lithuanian and foreign corporate business practices, a review of current literature and recommendations for quality management system standards.

  3. Documents and legal texts: Australia, Germany, Sweden

    International Nuclear Information System (INIS)

    Anon.

    2012-01-01

    Australia: National Radioactive Waste Management Act 2012 No. 29, 2012 (An Act to make provision in relation to the selection of a site for, and the establishment and operation of, a radioactive waste management facility, and for related purposes). Germany: Act on the Peaceful Utilisation of Atomic Energy and the Protection against its Hazards (Atomic Energy Act) of 23 December 1959, as amended and promulgated on 15 July 1985, last amendment by the Act of 8 November 2011. Sweden: The Swedish Radiation Safety Authority's regulations concerning clearance of materials, rooms, buildings and land in practices involving ionising radiation (Swedish Radiation Safety Authority Regulatory Code issued on 20 October 2011, Published on 2 November 2011); The Swedish Radiation Safety Authority's general advice on the application of the regulations concerning clearance of materials, rooms, buildings and land in practices involving ionising radiation (issued on 20 October 2011)

  4. Invisible in Thailand: documenting the need for protection

    Directory of Open Access Journals (Sweden)

    Margaret Green

    2008-04-01

    Full Text Available The International Rescue Committee (IRC) has conducted a survey to document the experiences of Burmese people living in border areas of Thailand and assess the degree to which they merit international protection as refugees.

  5. The Digital Administrative Document: an approximate path

    Directory of Open Access Journals (Sweden)

    Francesca Delneri

    2017-09-01

    Full Text Available While the road towards progressive dematerialization of the administrative document is clearly marked, the legislator does not always proceed in a coherent, clear or complete way. The reasoning needs to focus on the administration of documentary heritage, training and preservation rather than on technological issues, actively involving local administrations and taking responsibility for decisions, including the relationship between costs and benefits. The way is hard due to the lack of debate and practical directions; preservation should not be considered a commanding confirmation, but an occasion to face complex and critical issues.

  6. Provable Fair Document Exchange Protocol with Transaction Privacy for E-Commerce

    Directory of Open Access Journals (Sweden)

    Ren-Junn Hwang

    2015-04-01

    Full Text Available Transaction privacy has attracted a lot of attention in the e-commerce. This study proposes an efficient and provable fair document exchange protocol with transaction privacy. Using the proposed protocol, any untrusted parties can fairly exchange documents without the assistance of online, trusted third parties. Moreover, a notary only notarizes each document once. The authorized document owner can exchange a notarized document with different parties repeatedly without disclosing the origin of the document or the identities of transaction participants. Security and performance analyses indicate that the proposed protocol not only provides strong fairness, non-repudiation of origin, non-repudiation of receipt, and message confidentiality, but also enhances forward secrecy, transaction privacy, and authorized exchange. The proposed protocol is more efficient than other works.

  7. 文件物件模型及其在XML文件處理之應用 Document Object Model and Its Application on XML Document Processing

    Directory of Open Access Journals (Sweden)

    Sinn-cheng Lin

    2001-06-01

    Full Text Available The Document Object Model (DOM) is an application programming interface that can be applied to process XML documents. It defines the logical structure, the access interfaces and the operation methods for the document. In the DOM, an original document is mapped to a tree structure; therefore, a computer program can easily traverse the tree and manipulate the nodes in the tree. In this paper, the fundamental models, definitions and specifications of the DOM are surveyed. We then create an experimental DOM system called XML On-Line Parser. The front-end of the system is a Web-based user interface for XML document input and parsed-result output; the back-end is an ASP program which transforms the original document into a DOM tree for document manipulation. This on-line system can be used with a general-purpose web browser to check the well-formedness and the validity of XML documents.
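
    For illustration, a well-formedness check and DOM tree walk using Python's standard xml.dom.minidom rather than the paper's ASP program; note that minidom checks well-formedness only, so validating against a DTD or schema would need a validating parser.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

def check_and_walk(xml_text):
    """Parse a string into a DOM tree (a parse error means the document
    is not well-formed), then list the element tags by depth."""
    try:
        dom = parseString(xml_text)
    except ExpatError as err:
        return f"not well-formed: {err}"
    lines = []
    def walk(node, depth=0):
        if node.nodeType == node.ELEMENT_NODE:
            lines.append("  " * depth + node.tagName)
        for child in node.childNodes:
            walk(child, depth + 1)
    walk(dom.documentElement)
    return "\n".join(lines)

print(check_and_walk("<library><book><title>DOM</title></book></library>"))
```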

  8. The Texts of the Agency's Headquarters Agreement with Austria and Related Agreements; Textes de l'Accord Relatif au Siege Conclu Entre l'Agence et l'Autriche et d'Accords Connexes

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1975-12-16

    The texts of the Agreement between the International Atomic Energy Agency and the Republic of Austria that were in force on 30 September 1975 are reproduced in this document for the information of all Members of the Agency [French] Les textes de l'Accord relatif au siege conclu entre l'Agence et la Republique d'Autriche et de divers accords connexes, qui etaient en vigueur le 30 septembre 1975, sont reproduits dans le present document pour l'information de tous les Membres de l'Agence.

  9. The Medline/full-text research project.

    Science.gov (United States)

    McKinin, E J; Sievert, M; Johnson, E D; Mitchell, J A

    1991-05-01

    This project was designed to test the relative efficacy of index terms and full-text for the retrieval of documents in those MEDLINE journals for which full-text searching was also available. The full-text files used were MEDIS from Mead Data Central and CCML from BRS Information Technologies. One hundred clinical medical topics were searched in these two files as well as the MEDLINE file to accumulate the necessary data. It was found that full-text identified significantly more relevant articles than did the indexed file, MEDLINE. The full-text searches, however, lacked the precision of searches done in the indexed file. Most relevant items missed in the full-text files, but identified in MEDLINE, were missed because the searcher failed to account for some aspect of natural language, used a logical or positional operator that was too restrictive, or included a concept which was implied, but not expressed in the natural language. Very few of the unique relevant full-text citations would have been retrieved by title or abstract alone. Finally, as of July, 1990 the more current issue of a journal was just as likely to appear in MEDLINE as in one of the full-text files.

  10. Computerising documentation

    International Nuclear Information System (INIS)

    Anon.

    1992-01-01

    The nuclear power generation industry is faced with public concern and government pressures over safety, efficiency and risk. Operators throughout the industry are addressing these issues with the aid of a new technology: technical document management systems (TDMS). Used for strategic and tactical advantage, the systems enable users to scan, archive, retrieve, store, edit, distribute worldwide and manage the huge volume of documentation (paper drawings, CAD data and film-based information) generated in building, maintaining and ensuring safety in the UK's power plants. The power generation industry has recognized that the management and modification of operation-critical information is vital to the safety and efficiency of its power plants. Regulatory pressure from the Nuclear Installations Inspectorate (NII) to operate within strict safety margins or lose Site Licences has prompted the need for accurate, up-to-date documentation. A document capture and management retrieval system provides a powerful cost-effective solution, giving rapid access to documentation in a tightly controlled environment. The computerisation of documents and plans is discussed in this article. (Author)

  11. Entropy and Graph Based Modelling of Document Coherence using Discourse Entities

    DEFF Research Database (Denmark)

    Petersen, Casper; Lioma, Christina; Simonsen, Jakob Grue

    2015-01-01

    We present two novel models of document coherence and their application to information retrieval (IR). Both models approximate document coherence using discourse entities, e.g. the subject or object of a sentence. Our first model views text as a Markov process generating sequences of discourse entities (entity n-grams); we use the entropy of these entity n-grams to approximate the rate at which new information appears in text, reasoning that as more new words appear, the topic increasingly drifts and text coherence decreases. Our second model extends the work of Guinaudeau & Strube [28] that represents text as a graph of discourse entities, linked by different relations, such as their distance or adjacency in text. We use several graph topology metrics to approximate different aspects of the discourse flow that can indicate coherence, such as the average clustering or betweenness of discourse entities.
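
    A minimal sketch of the first model's core quantity, the entropy of the entity-bigram distribution over a document; entity extraction is assumed to have happened upstream, and the toy sequences below only illustrate the intended contrast.

```python
import math
from collections import Counter

def entity_bigram_entropy(entities):
    """Shannon entropy of the bigram distribution over discourse entities.

    Higher entropy: new entity transitions keep appearing (topic drift).
    Lower entropy: transitions repeat (higher coherence).
    """
    bigrams = Counter(zip(entities, entities[1:]))
    total = sum(bigrams.values())
    return -sum((c / total) * math.log2(c / total) for c in bigrams.values())

coherent = ["model", "model", "entropy", "model", "entropy", "model"]
drifting = ["model", "graph", "retrieval", "entity", "corpus", "metric"]
print(entity_bigram_entropy(coherent))  # ~1.52 bits
print(entity_bigram_entropy(drifting))  # ~2.32 bits
```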

  12. Endangered Language Documentation and Transmission

    Directory of Open Access Journals (Sweden)

    D. Victoria Rau

    2007-01-01

    Full Text Available This paper describes an on-going project on the digital archiving of Yami language documentation (http://www.hrelp.org/grants/projects/index.php?projid=60). We present a cross-disciplinary approach, involving computer science and applied linguistics, to document the Yami language and prepare teaching materials. Our discussion begins with an introduction to an integrated framework for archiving, processing and developing learning materials for Yami (Yang and Rau 2005), followed by a historical account of Yami language teaching, from a grammatical syllabus (Dong and Rau 2000b) to a communicative syllabus using a multimedia CD as a resource (Rau et al. 2005), to the development of interactive on-line learning based on the digital archiving project. We discuss the methods used and the challenges of each stage of preparing Yami teaching materials, and present a proposal for rethinking pedagogical models for e-learning.

  13. Signal Detection Framework Using Semantic Text Mining Techniques

    Science.gov (United States)

    Sudarsan, Sithu D.

    2009-01-01

    Signal detection is a challenging task for regulatory and intelligence agencies. Subject matter experts in those agencies analyze documents, generally containing narrative text in a time bound manner for signals by identification, evaluation and confirmation, leading to follow-up action e.g., recalling a defective product or public advisory for…

  14. A method for extracting design rationale knowledge based on Text Mining

    Directory of Open Access Journals (Sweden)

    Liu Jihong

    2017-01-01

    Full Text Available Capturing design rationale (DR) knowledge and presenting it to designers in a usable form is of great significance for design reuse and design innovation. Design rationale research has developed since the 1970s, and many teams have built their own design rationale systems. However, existing DR acquisition systems are not intelligent enough and still require designers to perform many operations. In addition, existing design documents contain a large amount of DR knowledge, but it has not been well mined. Therefore, a method and system are needed to better extract DR knowledge from design documents. We have proposed a DRKH (design rationale knowledge hierarchy) model for DR representation. The DRKH model has three layers: the design intent layer, the design decision layer and the design basis layer. In this paper, we use a text mining method to extract DR from design documents and construct the DR model. Finally, a welding robot design specification is taken as an example to demonstrate the system interface.

  15. Methodological Aspects of Architectural Documentation

    Directory of Open Access Journals (Sweden)

    Arivaldo Amorim

    2011-12-01

    Full Text Available This paper discusses the methodological approach that has been developed in the state of Bahia, Brazil, since 2003 for the documentation of architectural and urban sites using extensive digital technologies. Bahia has a vast territory with important architectural ensembles ranging from the sixteenth century to the present day. As part of this heritage is constructed of raw earth and wood, it is very sensitive to various deleterious agents. It is therefore critical to document this collection, which is under threat. To conduct these activities, diverse digital technologies that could be used in the documentation process are being tried out. The task is being developed as academic research, with few financial resources, by scholarship students and some volunteers. Several technologies are tested, ranging from the simplest to the more sophisticated ones, in the main stages of the documentation project: overall work planning; data acquisition; processing and management; and, finally, control and evaluation of the work. The activities that motivated this paper are being conducted in the cities of Rio de Contas and Lençóis in the Chapada Diamantina, located 420 km and 750 km from Salvador respectively, in the city of Cachoeira in the Recôncavo Baiano area, 120 km from Salvador, the capital of the state of Bahia, and in the Pelourinho neighbourhood, located in the historic capital. Part of the material produced can be consulted on the website: <www.lcad.ufba.br>.

  16. Project Documentation as a Risk for Public Projects

    Directory of Open Access Journals (Sweden)

    Vladěna Štěpánková

    2015-08-01

    Full Text Available Purpose of the article: The paper presents the different methodologies used for creating documentation and focuses on public projects and their requirements for this documentation. Since documentation is incorporated in the overall planning of the project and its duration is estimated by qualified expert judgment, any change in this documentation can lead to project delays or increase costs as a result of time-consuming administration. Documentation is therefore seen as a risk that may threaten a project, in particular a public contract that a company is trying to win, and in general any project. Methodology/methods: Several methods of obtaining information are used in this paper, mainly structured interviews combined with brainstorming; a questionnaire for companies dealing with public procurement was also used. MS Excel and basic statistical methods based on regression analysis were used for data processing. Scientific aim: The article deals with the construction market in the Czech Republic and examines the impact of changes in the project documentation of public projects on their turnover. Findings: The paper summarizes the advantages and disadvantages of having project documentation. In the case of public contracts and changes in legislation, it is necessary to focus on creating documentation in advance, follow the new requirements and try to meet them in the shortest possible time. Conclusions: The paper concludes with recommendations on how to proceed if these changes occur and how to reduce the costs that this documentation risk may cause.

  17. Identification of documented medication non-adherence in physician notes.

    Science.gov (United States)

    Turchin, Alexander; Wheeler, Holly I; Labreche, Matthew; Chu, Julia T; Pendergrass, Merri L; Einbinder, Jonathan S; Einbinder, Jonathan Seth

    2008-11-06

    Medication non-adherence is common, and the physician's awareness of it may be an important factor in clinical decision making. Few sources of data on physician awareness of medication non-adherence are available. We have designed an algorithm to identify documentation of medication non-adherence in the text of physician notes. The algorithm recognizes eight semantic classes of documentation of medication non-adherence. We evaluated the algorithm against manual ratings of 200 randomly selected notes of hypertensive patients. The algorithm detected 89% of the notes with documented medication non-adherence, with a specificity of 84.7% and a positive predictive value of 80.2%. In a larger dataset of 1,000 documents, notes that documented medication non-adherence were more likely to report significantly elevated systolic (15.3% vs. 9.0%; p = 0.002) and diastolic (4.1% vs. 1.9%; p = 0.03) blood pressure. This novel clinically validated tool expands the range of information on medication non-adherence available to researchers.
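
    The abstract does not publish the algorithm itself; a keyword/regex sketch in the same spirit is shown below, with patterns that are purely illustrative and far simpler than the validated eight-class method.

```python
import re

# Illustrative patterns only; the paper's algorithm recognizes eight
# semantic classes of non-adherence documentation.
NON_ADHERENCE_PATTERNS = [
    r"\b(?:not|stopped)\s+tak(?:ing|en)\s+(?:his|her|the)?\s*med",
    r"\bran out of\b.*\bmedication",
    r"\bmissed\s+(?:several\s+)?doses\b",
    r"\bnon-?adheren(?:t|ce)\b|\bnon-?complian(?:t|ce)\b",
]

def documents_non_adherence(note_text):
    """True if any pattern matches the (lower-cased) physician note."""
    text = note_text.lower()
    return any(re.search(p, text) for p in NON_ADHERENCE_PATTERNS)

print(documents_non_adherence(
    "Patient admits he stopped taking his meds two weeks ago."))  # True
```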

  18. Document Models

    Directory of Open Access Journals (Sweden)

    A.A. Malykh

    2017-08-01

    Full Text Available In this paper, the concept of locally simple models is considered. Locally simple models are arbitrarily complex models built from relatively simple components. A lot of practically important domains of discourse can be described as locally simple models, for example, business models of enterprises and companies. Up to now, research in human reasoning automation has been mainly concentrated around the most intellectually intensive activities, such as automated theorem proving. On the other hand, the retailer business model is formed from "jobs", and each "job" can be modelled and automated more or less easily. At the same time, the whole retailer model as an integrated system is extremely complex. In this paper, we offer a variant of the mathematical definition of a locally simple model. This definition is intended for modelling a wide range of domains. Therefore, we also must take into account the perceptual and psychological issues. Logic is elitist, and if we want to attract as many people as possible to our models, we need to hide this elitism behind some metaphor to which 'ordinary' people are accustomed. As such a metaphor, we use the concept of a document, so our locally simple models are called document models. Document models are built in the paradigm of semantic programming. This allows us to achieve another important goal: to make the document models executable. Executable models are models that can act as practical information systems in the described domain of discourse. Thus, if our model is executable, then programming becomes redundant. The direct use of a model, instead of its programming coding, brings important advantages, for example, a drastic cost reduction for development and maintenance. Moreover, since the model is sound and not dissolved within programming modules, we can directly apply AI tools, in particular, machine learning. This significantly expands the possibilities for automation and

  19. Clinical map document based on XML (cMDX): document architecture with mapping feature for reporting and analysing prostate cancer in radical prostatectomy specimens

    Directory of Open Access Journals (Sweden)

    Bettendorf Olaf

    2010-11-01

    Full Text Available Abstract Background The pathology report of radical prostatectomy specimens plays an important role in clinical decisions and the prognostic evaluation of Prostate Cancer (PCa). The anatomical schema is a helpful tool to document PCa extension for clinical and research purposes. To achieve electronic documentation and analysis, an appropriate documentation model for anatomical schemas is needed. For this purpose we developed cMDX. Methods The document architecture of cMDX was designed according to Open Packaging Conventions by separating the whole data into template data and patient data. Analogous custom XML elements were considered to harmonize the graphical representation (e.g. tumour extension) with the textual data (e.g. histological patterns). The graphical documentation was based on the four-layer visualization model that forms the interaction between different custom XML elements. Sensitive personal data were encrypted with a 256-bit cryptographic algorithm to avoid misuse. In order to assess the clinical value, we retrospectively analysed the tumour extension in 255 patients after radical prostatectomy. Results The pathology report with cMDX can represent pathological findings of the prostate in schematic styles. Such reports can be integrated into the hospital information system. cMDX documents can be converted into different data formats such as text, graphics and PDF. Supplementary tools like the cMDX Editor and an analyser tool were implemented. The graphical analysis of 255 prostatectomy specimens showed that PCa was mostly localized in the peripheral zone (mean: 73% ± 25). 54% of PCa showed a multifocal growth pattern. Conclusions cMDX can be used for routine histopathological reporting of radical prostatectomy specimens and provides data for scientific analysis.

  20. A Study of Readability of Texts in Bangla through Machine Learning Approaches

    Science.gov (United States)

    Sinha, Manjira; Basu, Anupam

    2016-01-01

    In this work, we have investigated text readability in Bangla language. Text readability is an indicator of the suitability of a given document with respect to a target reader group. Therefore, text readability has huge impact on educational content preparation. The advances in the field of natural language processing have enabled the automatic…

  1. Semantic Annotation of Unstructured Documents Using Concepts Similarity

    Directory of Open Access Journals (Sweden)

    Fernando Pech

    2017-01-01

    Full Text Available There is a large amount of information in the form of unstructured documents, which poses challenges for information storage, search, and retrieval. This situation has given rise to several information search approaches. Some proposals take into account the contextual meaning of the terms specified in the query. Semantic annotation techniques can help to retrieve and extract information from unstructured documents. We propose a semantic annotation strategy for unstructured documents as part of a semantic search engine. In this proposal, ontologies are used to determine the context of the entities specified in the query. Our strategy for extracting the context is focused on concept similarity. Each relevant term of the document is associated with an instance in the ontology. The similarity between each of the explicit relationships is measured through the combination of two types of associations: the association between each pair of concepts and the calculation of the weight of the relationships.

  2. Robust binarization of degraded document images using heuristics

    Science.gov (United States)

    Parker, Jon; Frieder, Ophir; Frieder, Gideon

    2013-12-01

    Historically significant documents are often discovered with defects that make them difficult to read and analyze. This fact is particularly troublesome if the defects prevent software from performing an automated analysis. Image enhancement methods are used to remove or minimize document defects, improve software performance, and generally make images more legible. We describe an automated, image enhancement method that is input page independent and requires no training data. The approach applies to color or greyscale images with hand written script, typewritten text, images, and mixtures thereof. We evaluated the image enhancement method against the test images provided by the 2011 Document Image Binarization Contest (DIBCO). Our method outperforms all 2011 DIBCO entrants in terms of average F1 measure - doing so with a significantly lower variance than top contest entrants. The capability of the proposed method is also illustrated using select images from a collection of historic documents stored at Yad Vashem Holocaust Memorial in Israel.
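
    DIBCO-style evaluations score binarization output with the F-measure over foreground pixels; a minimal sketch of that metric, assuming binary arrays where 1 marks ink, is shown below.

```python
import numpy as np

def f_measure(pred, truth):
    """F1 over foreground (ink) pixels; `pred` and `truth` are binary
    arrays of equal shape with 1 = ink, 0 = background."""
    tp = np.logical_and(pred == 1, truth == 1).sum()
    fp = np.logical_and(pred == 1, truth == 0).sum()
    fn = np.logical_and(pred == 0, truth == 1).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```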

  3. Cat swarm optimization based evolutionary framework for multi document summarization

    Science.gov (United States)

    Rautray, Rasmita; Balabantaray, Rakesh Chandra

    2017-07-01

    Today, the World Wide Web brings us an enormous quantity of on-line information. As a result, extracting relevant information from massive data has become a challenging issue. In the recent past, text summarization has been recognized as one solution for extracting useful information from vast amounts of documents. Based on the number of documents considered for summarization, it is categorized as single-document or multi-document summarization. Multi-document summarization is more challenging than single-document summarization, as researchers must find an accurate summary across multiple documents. Hence, in this study, a novel Cat Swarm Optimization (CSO) based multi-document summarizer is proposed to address the problem of multi-document summarization. The proposed CSO based model is also compared with two other nature-inspired summarizers: a Harmony Search (HS) based summarizer and a Particle Swarm Optimization (PSO) based summarizer. With respect to the benchmark Document Understanding Conference (DUC) datasets, the performance of all algorithms is compared in terms of different evaluation metrics, such as ROUGE score, F score, sensitivity, positive predictive value, summary accuracy, inter-sentence similarity and a readability metric, to validate the non-redundancy, cohesiveness and readability of the summaries respectively. The experimental analysis clearly reveals that the proposed approach outperforms the other summarizers included in the study.
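
    Swarm summarizers of this kind typically search for the subset of sentences maximizing a fitness function that rewards coverage and penalizes redundancy. The sketch below shows such an objective with a greedy stand-in for the CSO search; the precomputed similarity inputs and the weight alpha are assumptions, not the paper's exact formulation.

```python
import numpy as np

def fitness(selected, sim_to_centroid, pairwise_sim, alpha=0.7):
    """Reward similarity of chosen sentences to the document-set centroid
    (coverage), penalize similarity among them (redundancy)."""
    coverage = sim_to_centroid[selected].sum()
    if len(selected) < 2:
        return alpha * coverage
    redundancy = np.mean([pairwise_sim[i, j]
                          for i in selected for j in selected if i < j])
    return alpha * coverage - (1 - alpha) * redundancy

def greedy_summary(sim_to_centroid, pairwise_sim, k=3):
    """Greedy baseline over the same objective a swarm optimizer
    (CSO/PSO/HS) would search."""
    selected, candidates = [], set(range(len(sim_to_centroid)))
    while len(selected) < k and candidates:
        best = max(candidates, key=lambda i: fitness(
            selected + [i], sim_to_centroid, pairwise_sim))
        selected.append(best)
        candidates.remove(best)
    return sorted(selected)
```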

  4. Segmentation-driven compound document coding based on H.264/AVC-INTRA.

    Science.gov (United States)

    Zaghetto, Alexandre; de Queiroz, Ricardo L

    2007-07-01

    In this paper, we explore H.264/AVC operating in intraframe mode to compress a mixed image, i.e., composed of text, graphics, and pictures. Even though mixed contents (compound) documents usually require the use of multiple compressors, we apply a single compressor for both text and pictures. For that, distortion is taken into account differently between text and picture regions. Our approach is to use a segmentation-driven adaptation strategy to change the H.264/AVC quantization parameter on a macroblock by macroblock basis, i.e., we deviate bits from pictorial regions to text in order to keep text edges sharp. We show results of a segmentation driven quantizer adaptation method applied to compress documents. Our reconstructed images have better text sharpness compared to straight unadapted coding, at negligible visual losses on pictorial regions. Our results also highlight the fact that H.264/AVC-INTRA outperforms coders such as JPEG-2000 as a single coder for compound images.
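
    A minimal sketch of the segmentation-driven idea: derive one quantization parameter per 16x16 macroblock from a binary text mask, so text blocks get finer quantization. The QP values and the 0.5 threshold are illustrative; a real encoder would apply the map through its rate-control interface.

```python
import numpy as np

def per_macroblock_qp(text_mask, mb_size=16, qp_text=24, qp_picture=34):
    """Map a binary text mask (1 = text pixel) to one QP per macroblock.

    Lower QP means finer quantization and more bits, keeping text edges
    sharp; pictorial blocks get the coarser QP. Values are illustrative.
    """
    rows, cols = text_mask.shape[0] // mb_size, text_mask.shape[1] // mb_size
    qps = np.full((rows, cols), qp_picture, dtype=int)
    for by in range(rows):
        for bx in range(cols):
            block = text_mask[by * mb_size:(by + 1) * mb_size,
                              bx * mb_size:(bx + 1) * mb_size]
            if block.mean() > 0.5:  # macroblock is mostly text
                qps[by, bx] = qp_text
    return qps
```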

  5. Analysis of Documentation Speed Using Web-Based Medical Speech Recognition Technology: Randomized Controlled Trial.

    Science.gov (United States)

    Vogel, Markus; Kaisers, Wolfgang; Wassmuth, Ralf; Mayatepek, Ertan

    2015-11-03

    Clinical documentation has undergone a change due to the usage of electronic health records. The core element is to capture clinical findings and document therapy electronically. Health care personnel spend a significant portion of their time on the computer. Alternatives to self-typing, such as speech recognition, are currently believed to increase documentation efficiency and quality, as well as satisfaction of health professionals while accomplishing clinical documentation, but few studies in this area have been published to date. This study describes the effects of using a Web-based medical speech recognition system for clinical documentation in a university hospital on (1) documentation speed, (2) document length, and (3) physician satisfaction. Reports of 28 physicians were randomized to be created with (intervention) or without (control) the assistance of a Web-based system of medical automatic speech recognition (ASR) in the German language. The documentation was entered into a browser's text area, and the time to complete the documentation (including all necessary corrections), the correction effort, the number of characters, and the mood of the participant were stored in a database. The underlying time comprised text entering, text correction, and finalization of the documentation event. Participants self-assessed their moods on a scale of 1-3 (1=good, 2=moderate, 3=bad). Statistical analysis was done using permutation tests. The number of clinical reports eligible for further analysis stood at 1455. Out of 1455 reports, 718 (49.35%) were assisted by ASR and 737 (50.65%) were not assisted by ASR. Average documentation speed without ASR was 173 (SD 101) characters per minute, while it was 217 (SD 120) characters per minute using ASR. The overall increase in documentation speed through Web-based ASR assistance was 26% (P=.04). Participants documented an average of 356 (SD 388) characters per report when not assisted by ASR and 649 (SD 561) characters per report when assisted by ASR.
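
    The study's analysis relies on permutation tests; a minimal two-sample permutation test on documentation speed might look as follows, with toy samples drawn around the reported 173 and 217 characters-per-minute means.

```python
import numpy as np

def permutation_test(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = np.random.default_rng(seed)
    observed = abs(np.mean(a) - np.mean(b))
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(pooled[:len(a)].mean() - pooled[len(a):].mean())
        hits += diff >= observed
    return hits / n_perm  # p-value estimate

rng = np.random.default_rng(1)
no_asr = rng.normal(173, 101, 200)    # chars/minute without ASR (toy)
with_asr = rng.normal(217, 120, 200)  # chars/minute with ASR (toy)
print(permutation_test(no_asr, with_asr))
```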

  6. ON EXPERIENCE OF THE ELECTRONIC DOCUMENT MANAGEMENT SYSTEM IMPLEMENTATION IN THE MEDICAL UNIVERSITY

    Directory of Open Access Journals (Sweden)

    A. V. Semenets

    2015-05-01

    Full Text Available The importance of applying electronic document management in Ukrainian healthcare is shown. An overview of the electronic document management systems market is presented. An example of the usage of an open-source electronic document management system at the I. Ya. Horbachevsky Ternopil State Medical University is shown. The implementation capabilities of an electronic document management system within cloud services are shown. The electronic document management features of Microsoft Office 365 and Google Apps For Education are compared. Some results of the usage of Google Apps For Education at TSMU as an electronic document management system are presented.

  7. Digitization of Full-Text Documents Before Publishing on the Internet: A Case Study Reviewing the Latest Optical Character Recognition Technologies.

    Science.gov (United States)

    McClean, Clare M.

    1998-01-01

    Reviews strengths and weaknesses of five optical character recognition (OCR) software packages used to digitize paper documents before publishing on the Internet. Outlines options available and stages of the conversion process. Describes the learning experience of Eurotext, a United Kingdom-based electronic libraries project (eLib). (PEN)

  8. Evaluating Combinations of Ranked Lists and Visualizations of Inter-Document Similarity.

    Science.gov (United States)

    Allan, James; Leuski, Anton; Swan, Russell; Byrd, Donald

    2001-01-01

    Considers how ideas from document clustering can be used to improve retrieval accuracy of ranked lists in interactive systems and how to evaluate system effectiveness. Describes a TREC (Text Retrieval Conference) study that constructed and evaluated systems that present the user with ranked lists and a visualization of inter-document similarities.…

  9. From Medieval Philosophy to the Virtual Library: a descriptive framework for scientific knowledge and documentation as basis for document retrieval

    Directory of Open Access Journals (Sweden)

    Frances Morrissey

    2001-11-01

    Full Text Available This paper examines the conceptual basis of document retrieval systems for the Virtual Library in science and technology. It does so through analysing some cognitive models for scientific knowledge, drawing on philosophy, sociology and linguistics. It is important to consider improvements in search/retrieval functionalities for scientific documents because knowledge creation and transfer are integral to the functioning of scientific communities, and on a larger scale, science and technology are central to the knowledge economy. This paper proposes four new and innovative understandings. Firstly, it is proposed that formal scientific communication constitutes the documentation and dissemination of concepts, and that conceptualism is a useful philosophical basis for study. Second, it is proposed that the scientific document is a dyadic construct, being both the physical manifestation as an encoded medium, and also being the associated knowledge, or intangible ideation, that is carried within the document. Third, it is shown that major philosophers of science divide science into three main activities, dealing with data, derived or inferred laws, and the axioms or the paradigm. Fourth, it is demonstrated that the data, information and conceptual frameworks carried by a scientific document, as different levels of signification or semiotic systems, can each be characterised in ways assisting in search and retrieval functionalities for the Virtual Library.

  10. Utah Text Retrieval Project

    Energy Technology Data Exchange (ETDEWEB)

    Hollaar, L A

    1983-10-01

    The Utah Text Retrieval project seeks well-engineered solutions to the implementation of large, inexpensive, rapid text information retrieval systems. The project has three major components. Perhaps the best known is the work on the specialized processors, particularly search engines, necessary to achieve the desired performance and cost. The other two concern the user interface to the system and the system's internal structure. The work on user interface development is not only concentrating on the syntax and semantics of the query language, but also on the overall environment the system presents to the user. Environmental enhancements include convenient ways to browse through retrieved documents, access to other information retrieval systems through gateways supporting a common command interface, and interfaces to word processing systems. The system's internal structure is based on a high-level data communications protocol linking the user interface, index processor, search processor, and other system modules. This allows them to be easily distributed in a multi- or specialized-processor configuration. It also allows new modules, such as a knowledge-based query reformulator, to be added. 15 references.

  11. Investigating scientific literacy documents with linguistic network analysis

    DEFF Research Database (Denmark)

    Bruun, Jesper; Evans, Robert Harry; Dolin, Jens

    2009-01-01

    International discussions of scientific literacy (SL) are extensive and numerous sizeable documents on SL exist. Thus, comparing different conceptions of SL is methodologically challenging. We developed an analytical tool which couples the theory of complex networks with text analysis in order...

  12. Documentation Service

    International Nuclear Information System (INIS)

    Charnay, J.; Chosson, L.; Croize, M.; Ducloux, A.; Flores, S.; Jarroux, D.; Melka, J.; Morgue, D.; Mottin, C.

    1998-01-01

    This service ensures the treatment and diffusion of scientific information and the management of the scientific production of the institute, as well as secretariat operation for the groups and services of the institute. The report on the documentation-library section mentions: the management of the documentation collections; searches in international databases (INIS, Current Contents, Inspec); and the Pret-Inter service, which allows accessing documents through the DEMOCRITE network of IN2P3. Also mentioned as achievements are: the setup of a video and photo database; the Web home page of the institute's library; follow-up of the digitization of the document collections by integrating CD-ROMs and diskettes; electronic archiving of the scientific production, etc.

  13. Document image binarization using "multi-scale" predefined filters

    Science.gov (United States)

    Saabni, Raid M.

    2018-04-01

    Reading text or searching for key words within a historical document is a very challenging task. One of the first steps of the complete task is binarization, where we separate foreground such as text, figures and drawings from the background. Successful results of this important step can in many cases determine whether the next steps succeed or fail, so it is vital to the success of the complete task of reading and analyzing the content of a document image. Generally, historical document images are of poor quality due to their storage conditions and degradation over time, which mostly cause varying contrasts, stains, dirt and ink seeping from the reverse side. In this paper, we use banks of predefined anisotropic filters at different scales and orientations to develop a binarization method for degraded documents and manuscripts. Using the fact that handwritten strokes may follow different scales and orientations, we use predefined sets of filter banks having various scales, weights, and orientations to seek a compact set of filters and weights in order to generate different layers of foreground and background. The results of locally convolving these filters with the gray-level image are weighted and accumulated to enhance the original image. Based on the different layers, seeds of components in the gray-level image and a learning process, we present an improved binarization algorithm to separate the background from layers of foreground. Different layers of foreground, which may be caused by seeping ink, degradation or other factors, are also separated from the real foreground in a second phase. Promising experimental results were obtained on the DIBCO2011, DIBCO2013 and H-DIBCO2016 data sets and a collection of images taken from real historical documents.
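
    A minimal sketch of an oriented, multi-scale filter bank of the kind described: elongated Gaussian kernels at several scales and orientations whose responses are accumulated. The paper learns weights for its banks; this sketch simply keeps the maximum response, and every parameter below is illustrative.

```python
import numpy as np
from scipy.ndimage import convolve

def oriented_gaussian_kernel(sigma_long, sigma_short, theta, size=21):
    """Elongated 2-D Gaussian rotated by theta; it responds strongly to
    stroke-like structures aligned with its long axis."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    k = np.exp(-(xr ** 2 / (2 * sigma_long ** 2) + yr ** 2 / (2 * sigma_short ** 2)))
    return k / k.sum()

def filter_bank_response(gray, scales=(1.5, 3.0), n_orient=8):
    """Accumulate the maximum kernel response over scales and orientations."""
    response = np.zeros_like(gray, dtype=float)
    for s in scales:
        for i in range(n_orient):
            theta = i * np.pi / n_orient
            kernel = oriented_gaussian_kernel(4 * s, s, theta)
            response = np.maximum(response, convolve(gray, kernel))
    return response
```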

  14. Using anchor text, spam filtering and Wikipedia for web search and entity ranking

    NARCIS (Netherlands)

    Kamps, J.; Kaptein, R.; Koolen, M.; Voorhees, E.M.; Buckland, L.P.

    2010-01-01

    In this paper, we document our efforts in participating to the TREC 2010 Entity Ranking and Web Tracks. We had multiple aims: For the Web Track we wanted to compare the effectiveness of anchor text of the category A and B collections and the impact of global document quality measures such as

  15. The Archives of Prefectures: from dematerialization to document management

    Directory of Open Access Journals (Sweden)

    Annantonia Martorano

    2017-12-01

    Full Text Available This item analyses the key role that IT management of document workflow plays in the implementation of the administrative functions of the Prefectures. After presenting the history and organization of the Prefecture, which is the most important decentralized office of the Ministry of the Interior, the article analyses the effectiveness and economy of its administrative actions, now characterized by the wide use of IT tools and aimed at the complete dematerialization of its documents, with the abandonment of the paper form. This process, now irreversible, uses the methods of traditional archival discipline and is strengthened by new technological devices, for the proper processing and storage of digital and paper documents.

  16. Subject (of documents)

    DEFF Research Database (Denmark)

    Hjørland, Birger

    2017-01-01

    This article presents and discusses the concept “subject” or subject matter (of documents) as it has been examined in library and information science (LIS) for more than 100 years. Different theoretical positions are outlined, and it is found that the most important distinction is between document-oriented views and request-oriented views. The document-oriented view conceives of the subject as something inherent in documents, whereas the request-oriented view (or the policy-based view) understands the subject as an attribution made to documents in order to facilitate certain uses of them. Related concepts

  17. Organising Documentation in Knowledge Evolution and Communication

    Directory of Open Access Journals (Sweden)

    Cristina De Castro

    2007-06-01

    Full Text Available The knowledge of a subject evolves in time due to many factors, such as better understanding, study of additional issues within the same subject, study of related work from other themes, etc. This can be achieved by individual work, direct cooperation with other people and, in general, knowledge sharing. In this context, and in the broader context of knowledge communication, the appropriate organisation of documentation plays a fundamental role, but is often very difficult to achieve. A layered architecture is proposed here for the development of a structured repository of documentation, called a knowledge-bibliography (KB). The process of knowledge acquisition, evolution and communication is considered first; then the distributed nature of today's knowledge and the ways it is shared and transferred are taken into account. On the basis of the above considerations, a possible clustering of documentation collected by many people is defined. An LDAP-based architecture for the implementation of this structure is also discussed.

  18. Reactive documentation system

    Science.gov (United States)

    Boehnlein, Thomas R.; Kramb, Victoria

    2018-04-01

    Proper formal documentation of computer-acquired NDE experimental data generated during research is critical to the longevity and usefulness of the data. Without documentation describing how and why the data was acquired, NDE research teams lose capabilities such as the ability to generate new information from previously collected data or to provide adequate information so that their work can be replicated by others seeking to validate their research. Despite the critical nature of this issue, NDE data is still being generated in research labs without appropriate documentation. By generating documentation in series with data, equal priority is given to both activities during the research process. One way to achieve this is to use a reactive documentation system (RDS). RDS prompts an operator to document the data as it is generated rather than relying on the operator to decide when and what to document. This paper discusses how such a system can be implemented in a dynamic environment made up of in-house and third-party NDE data acquisition systems without creating additional burden on the operator. The reactive documentation approach presented here is agnostic enough that the principles can be applied to any operator-controlled, computer-based data acquisition system.

  19. The Texts of the Agency's Relationship Agreements with Specialized Agencies

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1962-04-10

    The text of the relationship agreement which the Agency has concluded with the Inter-Governmental Maritime Consultative Organization, together with the protocol authenticating it, is reproduced in this document for the information of all Members of the Agency.

  20. The Texts of the Agency's Relationship Agreements with Specialized Agencies

    International Nuclear Information System (INIS)

    1962-01-01

    The text of the relationship agreement which the Agency has concluded with the Inter-Governmental Maritime Consultative Organization, together with the protocol authenticating it, is reproduced in this document for the information of all Members of the Agency

  1. Adaptive removal of background and white space from document images using seam categorization

    Science.gov (United States)

    Fillion, Claude; Fan, Zhigang; Monga, Vishal

    2011-03-01

    Document images are obtained regularly by rasterization of document content and as scans of printed documents. Resizing via background and white space removal is often desired for better consumption of these images, whether on displays or in print. While white space and background are easy to identify in images, existing methods such as naïve removal and content-aware resizing (seam carving) each have limitations that can lead to undesirable artifacts, such as uneven spacing between lines of text or poor arrangement of content. An adaptive method based on image content is hence needed. In this paper we propose an adaptive method to intelligently remove white space and background content from document images. Document images differ from pictorial images in structure. They typically contain objects (text letters, pictures and graphics) separated by uniform background, which includes both white paper space and other uniform color backgrounds. Pixels in uniform background regions are excellent candidates for deletion when resizing is required, as they introduce less change in document content and style compared with deletion of object pixels. We propose a background deletion method that exploits both local and global context. The method aims to retain the document's structural information and image quality.
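
    As a point of reference for the adaptive method above, here is a minimal sketch of the naïve white-space removal it improves upon: near-uniform background rows are deleted, with a cap on consecutive deletions so line spacing is not collapsed entirely. The brightness threshold and retention rate are illustrative assumptions.

        import numpy as np

        def shrink_rows(gray: np.ndarray, bg_level: int = 250,
                        keep_every: int = 4) -> np.ndarray:
            """Delete background rows, retaining one of every `keep_every`."""
            is_bg_row = (gray >= bg_level).all(axis=1)  # rows of pure background
            keep = np.ones(len(gray), dtype=bool)
            run = 0
            for i, bg in enumerate(is_bg_row):
                run = run + 1 if bg else 0
                if bg and run % keep_every != 0:        # drop most background rows
                    keep[i] = False
            return gray[keep]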

  2. Application of Laser Scanning for Creating Geological Documentation

    Directory of Open Access Journals (Sweden)

    Buczek Michał

    2018-01-01

    Full Text Available Geological documentation is based on analyses obtained from boreholes, geological exposures, and geophysical methods. It consists of text and graphic documents containing drilling sections, vertical cross-sections through the deposit and various types of maps. Surveying methods (such as LIDAR) can be applied in measurements of exposed rock layers presented in appendices to the geological documentation. Laser scanning allows a complete profile of exposed surfaces to be obtained in a short time and with millimeter accuracy. The possibility of verifying an existing geological cross-section with laser scanning was tested on the example of the AGH experimental mine. The test field is built of rocks of different lithologies. Scans were taken from a single station, under favorable measuring conditions. Analysis of the signal intensity allowed the point cloud to be divided into separate geological layers. The results were compared with the geological profiles of the measured object. The same approach was applied to data from the Vietnamese hard coal open-pit mine Coc Sau. The thicknesses of exposed coal bed deposits and gangue layers were determined from the obtained data (point cloud) in combination with photographs. The results were compared with the geological cross-section.

  3. Quantification of competitive value of documents

    Directory of Open Access Journals (Sweden)

    Pavel Šimek

    2009-01-01

    Full Text Available The majority of Internet users use the global network to search for information using full-text search engines such as Google, Yahoo!, or Seznam. Web presentation operators try, with the help of different optimization techniques, to reach the top places in the results of full-text search engines. This is where Search Engine Optimization and Search Engine Marketing matter greatly, because typical users try only the links on the first few pages of full-text search engine results for given keywords, and in catalogs they primarily use the hierarchically higher-placed links in each category. Key to success is the application of optimization methods which deal with keywords, the structure and quality of content, domain names, individual pages, and the quantity and reliability of backward links. The process is demanding, long-lasting and without a guaranteed outcome. A website operator without advanced analytical tools cannot identify the contribution of the individual documents of which the entire web site consists. If web presentation operators want an overview of their documents and of the web site globally, it is appropriate to quantify these positions in a specific way, depending on specific keywords. The quantification of the competitive value of documents serves this purpose and in turn yields the global competitive value of a web site. Quantification of competitive values is performed on a specific full-text search engine; each full-text search engine can give, and often gives, different results. According to published reports by the ClickZ agency and Market Share, Google is the most widely used search engine by number of searches by English-speaking users, with a market share of more than 80%. The whole procedure for the quantification of competitive values is general; however, the initial step, the analysis of keywords, depends on the choice of full-text search engine.

  4. A Full-Text-Based Search Engine for Finding Highly Matched Documents Across Multiple Categories

    Science.gov (United States)

    Nguyen, Hung D.; Steele, Gynelle C.

    2016-01-01

    This report demonstrates a full-text-based search engine that works with any Web-based mobile application. The engine can search databases across multiple categories based on a user's queries and identify the most relevant or similar documents. The search results presented here were obtained using an Android (Google Co.) mobile device; however, the engine is also compatible with other mobile phones.

  6. Text-mining analysis of mHealth research.

    Science.gov (United States)

    Ozaydin, Bunyamin; Zengul, Ferhat; Oner, Nurettin; Delen, Dursun

    2017-01-01

    In recent years, because of the advancements in communication and networking technologies, mobile technologies have been developing at an unprecedented rate. mHealth, the use of mobile technologies in medicine, and the related research have also surged in parallel with these technological advancements. Although there have been several attempts to review mHealth research through manual processes such as systematic reviews, the sheer magnitude of the number of studies published in recent years makes this task very challenging. The most recent developments in machine learning and text mining offer some potential solutions to address this challenge by allowing analyses of large volumes of text through semi-automated processes. The objective of this study is to analyze the evolution of mHealth research by utilizing text-mining and natural language processing (NLP) analyses. The study sample included abstracts of 5,644 mHealth research articles, which were gathered from five academic search engines by using search terms such as mobile health and mHealth. The analysis used the Text Explorer module of JMP Pro 13 and an iterative semi-automated process involving tokenizing, phrasing, and terming. After developing the document-term matrix (DTM), analyses such as singular value decomposition (SVD), topic clustering, and hierarchical document clustering were performed, along with a topic-informed document clustering approach. The results were presented in the form of word clouds and trend analyses. There were several major findings regarding research clusters and trends. First, our results confirmed the time-dependent nature of terminology use in mHealth research. For example, in earlier versus recent years the use of terminology changed from "mobile phone" to "smartphone" and from "applications" to "apps". Second, ten clusters for mHealth research were identified, including (I) Clinical Research on Lifestyle Management, (II) Community Health, (III) Literature Review, (IV) Medical Interventions
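
    A rough sketch of this analysis pipeline (document-term matrix, then SVD, then document clustering) can be reproduced with scikit-learn standing in for JMP Pro's Text Explorer; the toy abstracts, component count, and cluster count below are placeholders, not the study's data or settings.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.decomposition import TruncatedSVD
        from sklearn.cluster import KMeans

        abstracts = ["mobile health app for diabetes self management",
                     "smartphone intervention for smoking cessation",
                     "mobile phone text messaging for medication adherence",
                     "apps supporting community health workers"]  # placeholders

        dtm = TfidfVectorizer(stop_words="english").fit_transform(abstracts)
        reduced = TruncatedSVD(n_components=2).fit_transform(dtm)   # SVD step
        clusters = KMeans(n_clusters=2, n_init=10).fit_predict(reduced)
        print(clusters)                  # cluster id per abstract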

  7. Text Categorization on Hadith Sahih Al-Bukhari using Random Forest

    Science.gov (United States)

    Fauzan Afianto, Muhammad; Adiwijaya; Al-Faraby, Said

    2018-03-01

    Al-Hadith is the collection of words, deeds, provisions, and approvals of Rasulullah Shallallahu Alaihi wa Salam, and it is the second fundamental source of Islamic law after Al-Qur’an. As fundamentals of Islam, Muslims must learn, memorize, and practice Al-Qur’an and Al-Hadith. One venerable Imam who was also a narrator of Al-Hadith is Imam Bukhari. He spent over 16 years compiling about 2602 Hadith (without repetition) and over 7000 Hadith with repetition. Automatic text categorization is the task of developing software tools able to classify text or hypertext documents under pre-defined categories or subject codes [1]. The algorithm used is Random Forest, a development of the Decision Tree. In this final project research, the author built a system able to categorize text documents containing Hadith narrated by Imam Bukhari under several categories such as suggestion, prohibition, and information. As the evaluation method, K-fold cross-validation with F1-score is used, and the result is 90%.
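
    A minimal sketch of this setup with scikit-learn is shown below: TF-IDF text features, a random forest classifier, and K-fold cross-validation scored with F1. The texts, labels, and parameters are placeholders rather than the study's data.

        from sklearn.ensemble import RandomForestClassifier
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline

        texts = ["text of a hadith advising charity",
                 "text of a hadith forbidding fraud"] * 10        # placeholders
        labels = ["suggestion", "prohibition"] * 10               # placeholders

        model = make_pipeline(TfidfVectorizer(),
                              RandomForestClassifier(n_estimators=100))
        scores = cross_val_score(model, texts, labels, cv=5, scoring="f1_macro")
        print(scores.mean())             # mean F1 across the 5 folds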

  8. Starlink Document Styles

    Science.gov (United States)

    Lawden, M. D.

    This document describes the various styles which are recommended for Starlink documents. It also explains how to use the templates which are provided by Starlink to help authors create documents in a standard style. This paper is concerned mainly with conveying the "look and feel" of the various styles of Starlink document rather than describing the technical details of how to produce them. Other Starlink papers give recommendations for the detailed aspects of document production, design, layout, and typography. The only style that is likely to be used by most Starlink authors is the Standard style.

  9. Subject Retrieval from Full-Text Databases in the Humanities

    Science.gov (United States)

    East, John W.

    2007-01-01

    This paper examines the problems involved in subject retrieval from full-text databases of secondary materials in the humanities. Ten such databases were studied and their search functionality evaluated, focusing on factors such as Boolean operators, document surrogates, limiting by subject area, proximity operators, phrase searching, wildcards,…

  10. Upon e-Documents: Evolution or Involution on GED Market

    Directory of Open Access Journals (Sweden)

    Mircea GEORGESCU

    2006-01-01

    Full Text Available In many organizations, vital information is trapped within individual desktops and fragmented in server silos across the enterprise. Manual, ad-hoc processes create inefficiencies, confusion and delays as employees waste time searching for important information. Organizations must find safer and easier ways to access, manage and share their content. Document and file management solutions designed to be used across the organization can help achieve these goals and reduce the total cost of managing content throughout the organization. If you decide to use an EDMS, your selection requires a careful, considered balance between your legal requirements and your technological options. The decision to use an EDMS requires significant planning and analysis. Managing documents more effectively, controlling costs associated with documents and document processes, and using resources more efficiently have become and will continue to be increasingly important to businesses and IT organizations.

  11. Document understanding for a broad class of documents

    NARCIS (Netherlands)

    Aiello, Marco; Monz, Christof; Todoran, Leon; Worring, Marcel

    2002-01-01

    We present a document analysis system able to assign logical labels and extract the reading order in a broad set of documents. All information sources, from geometric features and spatial relations to the textual features and content are employed in the analysis. To deal effectively with these

  12. Is there still an unknown Freud? A note on the publications of Freud's texts and on unpublished documents.

    Science.gov (United States)

    Falzeder, Ernst

    2007-01-01

    This article presents an overview of the existing editions of what Freud wrote (works, letters, manuscripts and drafts, diaries and calendar notes, dedications and margin notes in books, case notes, and patient calendars) and what he is recorded as having said (minutes of meetings, interviews, memoirs of and interviews with patients, family members, and followers, and other quotes). There follows a short overview of biographies of Freud and other documentation on his life. It is concluded that a wealth of material is now available to Freud scholars, although more often than not this information is used in a biased and partisan way.

  13. Perspectivas de desarrollo para el documentalismo, el documental en soporte digital

    Directory of Open Access Journals (Sweden)

    Lic. Manuela Penafria

    1999-01-01

    Full Text Available The documentary has a recent history. Contrary to what is generally claimed, we understand that the documentary was not born at the same time as cinema. The first experiments with moving images were intended only to record events from the everyday life of people and animals. Thus, the contribution of the pioneers of cinema to the documentary was to show that the base working material for the documentary is images gathered in the places where events occur. In other words, it is the in loco record found at the beginnings of cinema that constitutes the root (the base principle) on which documentary production rests.

  14. Documents hipertextuals per a entorns virtuals d'aprenentatge

    Directory of Open Access Journals (Sweden)

    Cristòfol Rovira

    1999-11-01

    Full Text Available This article shows the new opportunities that the Web has generated in the field of creating hypertext documents. Based on node size, it analyses the essential characteristics of hypertexts from before the appearance of the Web and compares them with Internet pages. It also discusses the educational advantages that this type of document can offer for virtual learning environments and, finally, presents a proposal for writing hypertexts based on node size.

  15. What Can Pictures Tell Us About Web Pages? Improving Document Search Using Images.

    Science.gov (United States)

    Rodriguez-Vaamonde, Sergio; Torresani, Lorenzo; Fitzgibbon, Andrew W

    2015-06-01

    Traditional Web search engines do not use the images in the HTML pages to find relevant documents for a given query. Instead, they typically operate by computing a measure of agreement between the keywords provided by the user and only the text portion of each page. In this paper we study whether the content of the pictures appearing in a Web page can be used to enrich the semantic description of an HTML document and consequently boost the performance of a keyword-based search engine. We present a Web-scalable system that exploits a pure text-based search engine to find an initial set of candidate documents for a given query. Then, the candidate set is reranked using visual information extracted from the images contained in the pages. The resulting system retains the computational efficiency of traditional text-based search engines with only a small additional storage cost needed to encode the visual information. We test our approach on one of the TREC Million Query Track benchmarks where we show that the exploitation of visual content yields improvement in accuracies for two distinct text-based search engines, including the system with the best reported performance on this benchmark. We further validate our approach by collecting document relevance judgements on our search results using Amazon Mechanical Turk. The results of this experiment confirm the improvement in accuracy produced by our image-based reranker over a pure text-based system.
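
    The two-stage design described here can be summarized in a small sketch: a text engine proposes candidates, and a reranker mixes the text score with a visual-relevance score computed from the page's images. Both scoring functions, their names, and the fusion weight are placeholders for illustration, not the paper's model.

        def rerank(candidates, text_score, visual_score, alpha=0.7):
            """candidates: page ids; *_score: page id -> relevance in [0, 1]."""
            fused = {p: alpha * text_score(p) + (1 - alpha) * visual_score(p)
                     for p in candidates}
            return sorted(candidates, key=fused.get, reverse=True)

        # e.g. ranked = rerank(top_pages, bm25_score, image_match_score)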

  16. A Survey in Indexing and Searching XML Documents.

    Science.gov (United States)

    Luk, Robert W. P.; Leong, H. V.; Dillon, Tharam S.; Chan, Alvin T. S.; Croft, W. Bruce; Allan, James

    2002-01-01

    Discussion of XML focuses on indexing techniques for XML documents, grouping them into flat-file, semistructured, and structured indexing paradigms. Highlights include searching techniques, including full text search and multistage search; search result presentations; database and information retrieval system integration; XML query languages; and…

  17. Performance evaluation methodology for historical document image binarization.

    Science.gov (United States)

    Ntirogiannis, Konstantinos; Gatos, Basilis; Pratikakis, Ioannis

    2013-02-01

    Document image binarization is of great importance in the document image analysis and recognition pipeline since it affects further stages of the recognition process. The evaluation of a binarization method aids in studying its algorithmic behavior, as well as verifying its effectiveness, by providing qualitative and quantitative indication of its performance. This paper addresses a pixel-based binarization evaluation methodology for historical handwritten/machine-printed document images. In the proposed evaluation scheme, the recall and precision evaluation measures are properly modified using a weighting scheme that diminishes any potential evaluation bias. Additional performance metrics of the proposed evaluation scheme consist of the percentage rates of broken and missed text, false alarms, background noise, character enlargement, and merging. Several experiments conducted in comparison with other pixel-based evaluation measures demonstrate the validity of the proposed evaluation scheme.
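
    For reference, the plain, unweighted pixel-based measures that the paper's scheme modifies can be computed as below (foreground pixels encoded as True); the bias-diminishing weighting itself is not reproduced here.

        import numpy as np

        def pixel_metrics(result: np.ndarray, truth: np.ndarray):
            """Plain pixel-level recall, precision and F-measure."""
            tp = np.logical_and(result, truth).sum()
            recall = tp / truth.sum()
            precision = tp / result.sum()
            f_measure = 2 * recall * precision / (recall + precision)
            return recall, precision, f_measure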

  18. PCS a code system for generating production cross section libraries

    International Nuclear Information System (INIS)

    Cox, L.J.

    1997-01-01

    This document outlines the use of the PCS Code System. It summarizes the execution process for generating FORMAT2000 production cross section files from FORMAT2000 reaction cross section files. It also describes the process of assembling the ASCII versions of the high energy production files made from ENDL and Mark Chadwick's calculations. Descriptions of the function of each code, along with its input, output and use, are given. This document is under construction. Please submit entries, suggestions, questions, and corrections to ljc@llnl.gov.

  19. Criteria Document for B-plant's Surveillance and Maintenance Phase Safety Basis Document

    International Nuclear Information System (INIS)

    SCHWEHR, B.A.

    1999-01-01

    This document is required by the Project Hanford Managing Contractor (PHMC) procedure, HNF-PRO-705, Safety Basis Planning, Documentation, Review, and Approval. This document specifies the criteria that shall be in the B Plant surveillance and maintenance phase safety basis in order to obtain approval of the DOE-RL. This CD describes the criteria to be addressed in the S and M Phase safety basis for the deactivated Waste Fractionization Facility (B Plant) on the Hanford Site in Washington state. This criteria document describes: the document type and format that will be used for the S and M Phase safety basis, the requirements documents that will be invoked for the document development, the deactivated condition of the B Plant facility, and the scope of issues to be addressed in the S and M Phase safety basis document

  20. Tagline: Information Extraction for Semi-Structured Text Elements in Medical Progress Notes

    Science.gov (United States)

    Finch, Dezon Kile

    2012-01-01

    Text analysis has become an important research activity in the Department of Veterans Affairs (VA). Statistical text mining and natural language processing have been shown to be very effective for extracting useful information from medical documents. However, neither of these techniques is effective at extracting the information stored in…

  1. Quality control of the documentation process in electronic economic activities

    Directory of Open Access Journals (Sweden)

    Krutova A.S.

    2017-06-01

    Full Text Available It is argued that the main tool for providing adequate information resources for electronic economic activities in social and economic relations is quality control of documentation processes, as the basis of the global information space. Two problem directions are identified for evaluating information resources in the documentation process: the development of tools to assess the efficiency of system components (qualitative assessment), and the development of mathematical modeling tools (quantitative assessment). A qualitative assessment of the electronic documentation of economic activity covers performance, efficiency of communication, document management efficiency, effectiveness of flow control operations, and relationship management effectiveness. The concept of quality control of electronically documented economic activity comprises components that include the level of workflow, the adequacy of information forms, the consumer quality of documents, quality attributes, the type of incoming data, the condition of monitoring systems, the organizational level of the documentation process, and the type of management system. The components of the control system for electronic documents of economic entities are grounded, and the components of an IT-audit management system for economic activity are identified: compliance audit, audit of internal control, detailed multilevel analysis, and a corporate risk assessment methodology. The stages and methods of processing electronic transactions in economic activity during condition monitoring of electronic economic activity are described.

  2. DOCUMENTING LIVING MONUMENTS IN INDONESIA: METHODOLOGY FOR SUSTAINABLE UTILITY

    Directory of Open Access Journals (Sweden)

    F. Suryaningsih

    2013-07-01

    Full Text Available The systematic documentation of cultural heritage in Indonesia developed after the establishment of the Bataviaasch Genootschap van Kunsten en Wetenschappen (1778) and De Oudheidkundige Dienst (1913) by the Netherlands Indies government. After Indonesian independence, the task of cultural heritage documentation was taken over by the Ministry of Culture (now the Ministry of Education and Culture), with a focus on ancient and classical heritage, so-called dead monuments. The need for comprehensive documentation of cultural heritage became a significant issue once government and the private sector began paying attention to the preservation of heritage buildings in urban sites, so-called living monuments. The archives of original drawing plans often do not fit the existing condition, while a conservation plan demands documents such as as-built drawing plans to work from. The technology, methodology and systems for providing such comprehensive documents of heritage buildings and sites have therefore become important for producing good conservation plans and for the regular maintenance of heritage buildings, so that the products have sustainable and varied utility values. The Documentation Centre for Architecture – Indonesia (PDA) was established in 1994 to meet the need for comprehensive data on heritage buildings (living monuments), to be utilized as basic documents for conservation planning. It provides not only digital drawings such as site plans, plans, elevations, sections and details of architectural elements, but also documents of historical research and material analysis, completed with diagnosis and mapping of building damage. This manuscript is about PDA's field experience working on this subject.

  3. A Database of Herbaceous Vegetation Responses to Elevated Atmospheric CO2

    Energy Technology Data Exchange (ETDEWEB)

    Jones, M.H.

    1999-11-24

    To perform a statistically rigorous meta-analysis of research results on the response by herbaceous vegetation to increased atmospheric CO2 levels, a multiparameter database of responses was compiled from the published literature. Seventy-eight independent CO2-enrichment studies, covering 53 species and 26 response parameters, reported mean response, sample size, and variance of the response (either as standard deviation or standard error). An additional 43 studies, covering 25 species and 6 response parameters, did not report variances. This numeric data package accompanies the Carbon Dioxide Information Analysis Center's (CDIAC's) NDP-072, which provides similar information for woody vegetation. This numeric data package contains a 30-field data set of CO2-exposure experiment responses by herbaceous plants (as both a flat ASCII file and a spreadsheet file), files listing the references to the CO2-exposure experiments and specific comments relevant to the data in the data sets, and this documentation file (which includes SAS® and Fortran codes to read the ASCII data file). The data files and this documentation are available without charge on a variety of media and via the Internet from CDIAC.

  4. PROJECT ENGINEERING DATA MANAGEMENT AT AUTOMATED PREPARATION OF DESIGN DOCUMENTATION

    Directory of Open Access Journals (Sweden)

    A. V. Guryanov

    2017-01-01

    Full Text Available We have developed and implemented instrumental means for the automated support of the end-to-end design process for product design documentation at the programming level. The proposed solution is based on the processing of engineering project data contained in interdependent design documents: the tactical and technical characteristics of products, data on the valuable metals contained in them, the list of components used in a product, and others. Processing of engineering data is based on converting it into the form required by industry standards for the preparation of design documentation. A general graph of the design documentation developed for a product is provided, and the developed software product is described. The automated preparation of interdependent design documents is shown using the example of preparing a list of purchased products. The results of this work can be used in research and development activities for the creation of prospective samples of ADP equipment.

  5. INFORMATION DOCUMENTS – PRIMORDIAL INSTRUMENTS IN TOURIST COMMUNICATION

    Directory of Open Access Journals (Sweden)

    Denisa PARPANDEL

    2010-01-01

    Full Text Available Tourist information has proved to have an important influence on the choice of holiday destinations. An important category of promotional means used by tourism as a source of information is tourist information documents, in which graphical advertising is of great importance. In a harmonious combination of informative text and suggestive pictures, their different forms (flyers, brochures, catalogs, guides and tourist maps, posters and billboards, advertisements in the press) visualize products of interest. This article highlights the importance of tourist information documents in the selection of a destination, the requirements and recommendations for their design, and the need to arrange advertisements so as to increase their impact on potential tourists. Tour operators, in cooperation with an advertising agency, choose the means of communication and the advertising medium itself to prepare an advertising campaign according to the market research conducted, production capacity or the area of interest, the level of tariffs and the type of benefits offered, the type of tourism product offered, and the target market segment.

  6. Factors that affect the accuracy of text-based language identification

    CSIR Research Space (South Africa)

    Botha, GR

    2007-11-01

    Full Text Available Besides its excellent accuracy, another significant advantage of the NB classifier is that new language documents can simply be merged into an existing classifier by adding the n-gram statistics of these documents to the current language model...
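
    The merging property noted here is easy to illustrate: in a character n-gram Naive Bayes identifier, adding a language's documents amounts to adding their n-gram counts to that language's model. The sketch below assumes add-one smoothing against a fixed vocabulary constant; both are simplifications, not the paper's exact model.

        import math
        from collections import Counter, defaultdict

        class NGramLanguageID:
            def __init__(self, n: int = 3):
                self.n = n
                self.counts = defaultdict(Counter)   # language -> n-gram counts

            def _ngrams(self, text: str):
                return [text[i:i + self.n] for i in range(len(text) - self.n + 1)]

            def add_documents(self, language: str, docs):
                for doc in docs:                     # merging = adding counts
                    self.counts[language].update(self._ngrams(doc))

            def classify(self, text: str) -> str:
                def log_prob(lang: str) -> float:
                    total = sum(self.counts[lang].values())
                    return sum(math.log((self.counts[lang][g] + 1) / (total + 1e6))
                               for g in self._ngrams(text))
                return max(self.counts, key=log_prob)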

  7. Generic safety documentation model

    International Nuclear Information System (INIS)

    Mahn, J.A.

    1994-04-01

    This document is intended to be a resource for preparers of safety documentation for Sandia National Laboratories, New Mexico facilities. It provides standardized discussions of some topics that are generic to most, if not all, Sandia/NM facilities safety documents. The material provides a "core" upon which to develop facility-specific safety documentation. The use of the information in this document will reduce the cost of safety document preparation and improve consistency of information.

  8. The number of scholarly documents on the public web.

    Directory of Open Access Journals (Sweden)

    Madian Khabsa

    Full Text Available The number of scholarly documents available on the web is estimated using capture/recapture methods by studying the coverage of two major academic search engines: Google Scholar and Microsoft Academic Search. Our estimates show that at least 114 million English-language scholarly documents are accessible on the web, of which Google Scholar has nearly 100 million. Of these, we estimate that at least 27 million (24% are freely available since they do not require a subscription or payment of any kind. In addition, at a finer scale, we also estimate the number of scholarly documents on the web for fifteen fields: Agricultural Science, Arts and Humanities, Biology, Chemistry, Computer Science, Economics and Business, Engineering, Environmental Sciences, Geosciences, Material Science, Mathematics, Medicine, Physics, Social Sciences, and Multidisciplinary, as defined by Microsoft Academic Search. In addition, we show that among these fields the percentage of documents defined as freely available varies significantly, i.e., from 12 to 50%.

  9. WIPP documentation plan

    International Nuclear Information System (INIS)

    Plung, D.L.; Montgomery, T.T.; Glasstetter, S.R.

    1986-01-01

    In support of the programs at the Waste Isolation Pilot Plant (WIPP), the Publications and Procedures Section developed a documentation plan that provides an integrated document hierarchy; further, this plan affords several unique features: 1) the format for procedures minimizes the writing responsibilities of the technical staff and maximizes use of the writing and editing staff; 2) review cycles have been structured to expedite the processing of documents; and 3) the numbers of documents needed to support the program have been appreciably reduced

  10. Observation of Arctic island barren-ground caribou (Rangifer tarandus groenlandicus) migratory movement delay due to human-induced sea-ice breaking

    Directory of Open Access Journals (Sweden)

    Mathieu Dumond

    2013-06-01

    Full Text Available The seasonal migration of the Dolphin and Union caribou (Rangifer tarandus groenlandicus) herd between Victoria Island and the mainland (Nunavut/Northwest Territories, Canada) relies on the formation of sea-ice that connects the Island to the mainland from late October to early June. During an aerial survey of the Dolphin and Union caribou herd in October 2007 on southern Victoria Island, Nunavut, Canada, we documented the short-term effects of the artificial maintenance of an open water channel in the sea-ice on caribou migratory movements during staging along the coast.

  11. 2002 reference document; Document de reference 2002

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2002-07-01

    This 2002 reference document of the Areva group provides information on the company. Organized in seven chapters, it presents the persons responsible for the reference document and for auditing the financial statements; information pertaining to the transaction; general information on the company and share capital; information on company operations, changes and future prospects; assets, financial position and financial performance; information on company management and the executive and supervisory boards; and recent developments and future prospects. (A.L.B.)

  12. Document representations for classification of short web-page descriptions

    Directory of Open Access Journals (Sweden)

    Radovanović Miloš

    2008-01-01

    Full Text Available Motivated by applying Text Categorization to the classification of Web search results, this paper describes an extensive experimental study of the impact of bag-of-words document representations on the performance of five major classifiers - Naïve Bayes, SVM, Voted Perceptron, kNN and C4.5. The texts, representing short Web-page descriptions sorted into a large hierarchy of topics, are taken from the dmoz Open Directory Web-page ontology, and classifiers are trained to automatically determine the topics which may be relevant to a previously unseen Web-page. Different transformations of input data: stemming, normalization, logtf and idf, together with dimensionality reduction, are found to have a statistically significant improving or degrading effect on classification performance measured by classical metrics - accuracy, precision, recall, F1 and F2. The emphasis of the study is not on determining the best document representation which corresponds to each classifier, but rather on describing the effects of every individual transformation on classification, together with their mutual relationships.
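
    The transformations studied in this paper are simple to state on a raw term-frequency matrix; the sketch below applies logtf, idf, and L2 normalization in one illustrative composition order (the study evaluates them separately and in combination, so this ordering is an assumption).

        import numpy as np

        def transform(tf: np.ndarray) -> np.ndarray:
            """tf: documents x terms raw counts."""
            logtf = np.log1p(tf)                            # logtf = log(1 + tf)
            df = np.count_nonzero(tf, axis=0)               # document frequency
            idf = np.log(tf.shape[0] / np.maximum(df, 1))   # idf = log(N / df)
            weighted = logtf * idf
            norms = np.linalg.norm(weighted, axis=1, keepdims=True)
            return weighted / np.maximum(norms, 1e-12)      # L2 normalization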

  13. Electronic Document Management Systems: Where Are They Today?

    Science.gov (United States)

    Koulopoulos, Thomas M.; Frappaolo, Carl

    1993-01-01

    Discusses developments in document management systems based on a survey of over 400 corporations and government agencies. Text retrieval and imaging markets, architecture and integration, purchasing plans, and vendor market leaders are covered. Five graphs present data on user preferences for improvements. A sidebar article reviews the development…

  14. A Similarity-Based Approach for Audiovisual Document Classification Using Temporal Relation Analysis

    Directory of Open Access Journals (Sweden)

    Ferrane Isabelle

    2011-01-01

    Full Text Available We propose a novel approach for video classification based on the analysis of the temporal relationships between the basic events in audiovisual documents. Starting from basic segmentation results, we define a new representation method called the Temporal Relation Matrix (TRM). Each document is then described by a set of TRMs, the analysis of which makes higher-level events stand out. This representation was first designed to analyze any audiovisual document in order to find events that may well characterize its content and structure. The aim of this work is to use this representation to compute a similarity measure between two documents. Approaches for audiovisual document classification are presented and discussed. Experiments are conducted on a set of 242 video documents, and the results show the efficiency of our proposals.

  15. Use of Solr and Xapian in the Invenio document repository software

    CERN Document Server

    Glauner, Patrick; Le Meur, Jean-Yves; Simko, Tibor

    2013-01-01

    Invenio is a free comprehensive web-based document repository and digital library software suite originally developed at CERN. It can serve a variety of use cases from an institutional repository or digital library to a web journal. In order to fully use full-text documents for efficient search and ranking, Solr was integrated into Invenio through a generic bridge. Solr indexes extracted full-texts and most relevant metadata. Consequently, Invenio takes advantage of Solr’s efficient search and word similarity ranking capabilities. In this paper, we first give an overview of Invenio, its capabilities and features. We then present our open source Solr integration as well as scalability challenges that arose for an Invenio-based multi-million record repository: the CERN Document Server. We also compare our Solr adapter to an alternative Xapian adapter using the same generic bridge. Both integrations are distributed with the Invenio package and ready to be used by the institutions using or adopting Invenio.

  16. Documenting Employee Conduct

    Science.gov (United States)

    Dalton, Jason

    2009-01-01

    One of the best ways for a child care program to lose an employment-related lawsuit is failure to document the performance of its employees. Documentation of an employee's performance can provide evidence of an employment-related decision such as discipline, promotion, or discharge. When properly implemented, documentation of employee performance…

  17. Non-Local Sparse Image Inpainting for Document Bleed-Through Removal

    Directory of Open Access Journals (Sweden)

    Muhammad Hanif

    2018-05-01

    Full Text Available Bleed-through is a frequent, pervasive degradation in ancient manuscripts, which is caused by ink seeped from the opposite side of the sheet. Bleed-through, appearing as an extra interfering text, hinders document readability and makes it difficult to decipher the information contents. Digital image restoration techniques have been successfully employed to remove or significantly reduce this distortion. This paper proposes a two-step restoration method for documents affected by bleed-through, exploiting information from the recto and verso images. First, the bleed-through pixels are identified, based on a non-stationary, linear model of the two texts overlapped in the recto-verso pair. In the second step, a dictionary learning-based sparse image inpainting technique, with non-local patch grouping, is used to reconstruct the bleed-through-contaminated image information. An overcomplete sparse dictionary is learned from the bleed-through-free image patches, which is then used to estimate a befitting fill-in for the identified bleed-through pixels. The non-local patch similarity is employed in the sparse reconstruction of each patch, to enforce the local similarity. Thanks to the intrinsic image sparsity and non-local patch similarity, the natural texture of the background is well reproduced in the bleed-through areas, and even a possible overestimation of the bleed-through pixels is effectively corrected, so that the original appearance of the document is preserved. We evaluate the performance of the proposed method on the images of a popular database of ancient documents, and the results validate the performance of the proposed method compared to the state of the art.

  18. Health physics documentation

    International Nuclear Information System (INIS)

    Stablein, G.

    1980-01-01

    When dealing with radioactive material the health physicist receives innumerable papers and documents within the fields of researching, prosecuting, organizing and justifying radiation protection. Some of these papers are requested by the health physicist and some are required by law. The scope, quantity and deposit periods of the health physics documentation at the Karlsruhe Nuclear Research Center are presented and rationalizing methods discussed. The aim of this documentation should be the application of physics to accident prevention, i.e. documentation should protect those concerned and not the health physicist. (H.K.)

  19. On the use of the singular value decomposition for text retrieval

    Energy Technology Data Exchange (ETDEWEB)

    Husbands, P.; Simon, H.D.; Ding, C.

    2000-12-04

    The use of the Singular Value Decomposition (SVD) has been proposed for text retrieval in several recent works. This technique uses the SVD to project very high dimensional document and query vectors into a low dimensional space. In this new space it is hoped that the underlying structure of the collection is revealed thus enhancing retrieval performance. Theoretical results have provided some evidence for this claim and to some extent experiments have confirmed this. However, these studies have mostly used small test collections and simplified document models. In this work we investigate the use of the SVD on large document collections. We show that, if interpreted as a mechanism for representing the terms of the collection, this technique alone is insufficient for dealing with the variability in term occurrence. Section 2 introduces the text retrieval concepts necessary for our work. A short description of our experimental architecture is presented in Section 3. Section 4 describes how term occurrence variability affects the SVD and then shows how the decomposition influences retrieval performance. A possible way of improving SVD-based techniques is presented in Section 5 and concluded in Section 6.
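
    The projection described here is the standard latent semantic indexing construction: documents live in the rank-k factors of the term-document matrix, and a query is folded into the same space before cosine comparison. The sketch below uses toy sizes and random data as placeholders for a real collection.

        import numpy as np

        A = np.random.rand(1000, 50)              # terms x documents (placeholder)
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        k = 10
        Uk, sk = U[:, :k], s[:k]
        Dk = Vt[:k, :].T * sk                     # documents in the latent space

        def query_scores(q: np.ndarray) -> np.ndarray:
            qk = (q @ Uk) / sk                    # fold query into latent space
            return Dk @ qk / (np.linalg.norm(Dk, axis=1) * np.linalg.norm(qk))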

  20. RELIABLE COGNITIVE DIMENSIONAL DOCUMENT RANKING BY WEIGHTED STANDARD CAUCHY DISTRIBUTION

    Directory of Open Access Journals (Sweden)

    S Florence Vijila

    2017-04-01

    Full Text Available Categorization of cognitively uniform and consistent documents, such as university question papers, is in demand by e-learners. The literature indicates that the standard Cauchy distribution and values derived from it are extensively used for checking the uniformity and consistency of documents. This paper attempts to apply this technique to categorizing question papers according to four selected cognitive dimensions. For this purpose, keyword sets for these four cognitive-dimension categories (also termed portrayal concepts) are assumed, and an automatic procedure is developed to quantify these dimensions in question papers. The categorization is relatively accurate when checked against manual methods. The simple and well-established term frequency/inverse document frequency (tf-idf) technique is therefore used to automate the categorization process. After the documents are categorized, the standard Cauchy formula is applied to rank the documents that have the least differences among their Cauchy values (according to the Cauchy theorem), so as to obtain consistent and uniform documents in ranked order. For the experiments and a social survey, seven question papers (documents) were designed with various consistencies. To validate the proposed technique, a social survey was administered to selected samples of e-learners in Tamil Nadu, India. The results are encouraging, and the conclusions drawn from the experiments will be useful to researchers in concept mining and in categorizing documents according to concepts. The findings also contribute utility value to designers of e-learning systems.

  1. Device of Definition of Hand-Written Documents Belonging to One Executor

    Directory of Open Access Journals (Sweden)

    S. D. Kulik

    2012-03-01

    Full Text Available This paper presents the results of developing a device for determining whether handwritten documents in Russian belong to the same writer. The device is intended to automate the work of forensic experts and helps solve problems of information security and the search for criminals.

  2. Pesquisa documental: pistas teóricas e metodológicas

    Directory of Open Access Journals (Sweden)

    Jackson Ronie Sá-Silva

    2015-05-01

    Full Text Available The objective of this article is to present some theoretical and methodological notes on documentary research. By making this public exposition, in the form of a bibliographic essay, we wish to provoke debate on the use of this procedure in the everyday research of students, teachers and researchers. First, we define documentary research, presenting the similarities and differences between it and bibliographic research, and then discuss the concept of the document. Next, we address the methodological criteria for the pre-analysis of written documents and, finally, we present the stages of documentary analysis.

  3. Document Categorization with Modified Statistical Language Models for Agglutinative Languages

    Directory of Open Access Journals (Sweden)

    Tantug

    2010-11-01

    Full Text Available In this paper, we investigate the document categorization task with statistical language models. Our study mainly focuses on categorization of documents in agglutinative languages. Due to the productive morphology of agglutinative languages, the number of word forms encountered in naturally occurring text is very large. From the language modeling perspective, a large vocabulary results in serious data sparseness problems. In order to cope with this drawback, previous studies in various application areas suggest modified language models based on different morphological units. It is reported that performance improvements can be achieved with these modified language models. In our document categorization experiments, we use standard word form based language models as well as other modified language models based on root words, root words and part-of-speech information, truncated word forms and character sequences. Additionally, to find an optimum parameter set, multiple tests are carried out with different language model orders and smoothing methods. Similar to previous studies on other tasks, our experimental results on categorization of Turkish documents reveal that applying linguistic preprocessing steps for language modeling provides improvements over standard language models to some extent. However, it is also observed that similar level of performance improvements can also be acquired by simpler character level or truncated word form models which are language independent.

  4. Improving imbalanced scientific text classification using sampling strategies and dictionaries

    Directory of Open Access Journals (Sweden)

    Borrajo L.

    2011-12-01

    Full Text Available Many real applications have the imbalanced class distribution problem, where one of the classes is represented by a very small number of cases compared to the other classes. Among the systems affected are those related to the recovery and classification of scientific documentation.

  5. The Texts of the Agency's Relationship Agreements with Specialized Agencies

    International Nuclear Information System (INIS)

    1960-01-01

    The texts of the relationship agreements which the Agency has concluded with the specialized agencies listed below, together with the respective protocols authenticating them, are reproduced in this document in the order which the agreements entered into force, for the information of all Members of the Agency

  6. Ahmad's NPRT system: A practical innovation for documenting male pattern baldness

    Directory of Open Access Journals (Sweden)

    Muhammad Ahmad

    2016-01-01

    Full Text Available Various classifications for male pattern baldness are mentioned in the literature. Norwood's classification is the most commonly used, but it has certain limitations. The new system includes three extra features which were not mentioned in any other classification. It provides an opportunity to document the full and correct picture when documenting male pattern baldness. It also aids in assessing the treatment for various degrees of baldness.

  7. Confirm Content Validity and Sender Authenticity for Text Messages by Using QR Code

    Directory of Open Access Journals (Sweden)

    Firas Mohammed Aswad

    2018-05-01

    Full Text Available In light of the information revolution taking place in the modern world, it has become necessary and important to secure electronic messages. We offer a technique to ensure the integrity of message content and the authenticity of the sender over communication networks by converting the message's symbols to numbers. Each symbol (letter, digit or other character) is converted into three values: the first is the ASCII code of the symbol, the second is the frequency of the symbol in the message (the number of times it appears), and the third is the sum of the symbol's positions in the message (counting from the first symbol, with blanks included). The sender's digital signature is converted to numbers in the same way, and these numbers are aggregated to produce just three numbers, which are combined with the numbers of the message's symbols. The final numbers are converted to a QR code, which is placed with the message and sent to the recipient. The recipient repeats the sender's steps (producing a QR code from the received message) and compares it with the received QR code. If they match, the recipient is assured that the content is intact and that the sender is authentic.
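
    A sketch of the encoding step is given below: each distinct symbol becomes a triple of (ASCII code, frequency in the message, sum of its 1-based positions, blanks included), and the triples are serialized into a QR code with the common `qrcode` library. The serialization format and the handling of the signature aggregation are simplified assumptions, not the paper's exact scheme.

        import qrcode

        def symbol_triples(message: str):
            """(ASCII code, frequency, sum of 1-based positions) per symbol."""
            triples = []
            for ch in sorted(set(message)):
                positions = [i + 1 for i, c in enumerate(message) if c == ch]
                triples.append((ord(ch), len(positions), sum(positions)))
            return triples

        message = "PAY 100"
        payload = ";".join("%d,%d,%d" % t for t in symbol_triples(message))
        qrcode.make(payload).save("signature.png")  # sent alongside the message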

  8. Incorporating other texts: Intertextuality in Malaysian CSR reports

    Directory of Open Access Journals (Sweden)

    Kumaran Rajandran

    2016-11-01

    Full Text Available In Malaysia, corporate social responsibility (CSR is relatively new but corporations have been required to engage in and disclose their CSR. A typical genre for disclosure is CSR reports and these reports often refer to other texts. The article investigates the act of referencing to other texts or intertextuality in Malaysian CSR reports. It creates an archive of CEO Statements and Environment Sections in CSR reports and studies the archive for keywords, which can identify the incorporated texts. The function of these texts is examined in relation to Malaysia’s corporate context. CSR reports contain explicit references to documents (policies, regulations, reports, research, standards and to individuals/groups (CEOs, stakeholders, expert organizations. The incorporated texts display variation in corporate control, which organizes these texts along an intertextual cline. The cline helps to identify corporate and non-corporate sources among the texts. The selection of incorporated texts may reflect government and stock exchange demands. The texts are not standardized and are relevant for the CSR domain and corporations, where these texts monitor and justify CSR performance. Yet, the incorporated texts may perpetuate inexact reporting because corporations select the texts and the parts of texts to refer to. Since these texts have been employed to scrutinize initiatives and results, CSR reports can claim to represent the “truth” about a corporation’s CSR. Hence, intertextuality serves corporate interests.

  9. Methods for Mining and Summarizing Text Conversations

    CERN Document Server

    Carenini, Giuseppe; Murray, Gabriel

    2011-01-01

    Due to the Internet Revolution, human conversational data -- in written forms -- are accumulating at a phenomenal rate. At the same time, improvements in speech technology enable many spoken conversations to be transcribed. Individuals and organizations engage in email exchanges, face-to-face meetings, blogging, texting and other social media activities. The advances in natural language processing provide ample opportunities for these "informal documents" to be analyzed and mined, thus creating numerous new and valuable applications. This book presents a set of computational methods

  10. Methodological Demonstration of a Text Analytics Approach to Country Logistics System Assessments

    DEFF Research Database (Denmark)

    Kinra, Aseem; Mukkamala, Raghava Rao; Vatrapu, Ravi

    2017-01-01

    The purpose of this study is to develop and demonstrate a semi-automated text analytics approach for the identification and categorization of information that can be used for country logistics assessments. In this paper, we develop the methodology on a set of documents for 21 countries using...... and the text analyst. Implications are discussed and future work is outlined....

  11. A method for integrating and ranking the evidence for biochemical pathways by mining reactions from text

    Science.gov (United States)

    Miwa, Makoto; Ohta, Tomoko; Rak, Rafal; Rowley, Andrew; Kell, Douglas B.; Pyysalo, Sampo; Ananiadou, Sophia

    2013-01-01

    Motivation: To create, verify and maintain pathway models, curators must discover and assess knowledge distributed over the vast body of biological literature. Methods supporting these tasks must understand both the pathway model representations and the natural language in the literature. These methods should identify and order documents by relevance to any given pathway reaction. No existing system has addressed all aspects of this challenge. Method: We present novel methods for associating pathway model reactions with relevant publications. Our approach extracts the reactions directly from the models and then turns them into queries for three text mining-based MEDLINE literature search systems. These queries are executed, and the resulting documents are combined and ranked according to their relevance to the reactions of interest. We manually annotate document-reaction pairs with the relevance of the document to the reaction and use this annotation to study several ranking methods, using various heuristic and machine-learning approaches. Results: Our evaluation shows that the annotated document-reaction pairs can be used to create a rule-based document ranking system, and that machine learning can be used to rank documents by their relevance to pathway reactions. We find that a Support Vector Machine-based system outperforms several baselines and matches the performance of the rule-based system. The successful query extraction and ranking methods have been used to update our existing pathway search system, PathText. Availability: An online demonstration of PathText 2 and the annotated corpus are available for research purposes at http://www.nactem.ac.uk/pathtext2/. Contact: makoto.miwa@manchester.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23813008
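
    A minimal sketch of the machine-learning ranking step, assuming scikit-learn and purely illustrative features (the paper's actual features, corpus, and annotations are not reproduced here):

```python
from sklearn.svm import SVC
import numpy as np

# Hypothetical feature vectors for document-reaction pairs, e.g.
# [query-term overlap, retrieval score, count of reaction-entity mentions].
X_train = np.array([[0.9, 1.2, 3], [0.1, 0.3, 0], [0.7, 0.8, 2], [0.2, 0.1, 1]])
y_train = np.array([1, 0, 1, 0])  # 1 = relevant to the reaction, 0 = not

clf = SVC(probability=True).fit(X_train, y_train)

# Rank unseen documents for a reaction by decreasing relevance probability.
X_new = np.array([[0.8, 1.0, 2], [0.3, 0.2, 0]])
ranking = np.argsort(-clf.predict_proba(X_new)[:, 1])
print(ranking)  # indices of documents, most relevant first
```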

  12. Beyond a Box of Documents: The Collaborative Partnership Behind the Oregon Chinese Disinterment Documents Collection

    Directory of Open Access Journals (Sweden)

    Natalia M. Fernández

    2013-06-01

    Full Text Available This article is a case study of a collaboration between the Oregon Multicultural Archives of Oregon State University, Portland State University Library's Special Collections, the Chinese Consolidated Benevolent Association (CCBA), and the Northwest News Network to preserve and make accessible a recovered box of Oregon Chinese disinterment documents. By examining what influenced and engaged each partner, this case study offers an opportunity to better understand the motivations of diverse stakeholders in a "post-custodial era" project that challenges traditional practices of custody, control, and access.

  13. Relating interesting quantitative time series patterns with text events and text features

    Science.gov (United States)

    Wanner, Franz; Schreck, Tobias; Jentner, Wolfgang; Sharalieva, Lyubka; Keim, Daniel A.

    2013-12-01

    In many application areas, the key to successful data analysis is the integrated analysis of heterogeneous data. One example is the financial domain, where time-dependent and highly frequent quantitative data (e.g., trading volume and price information) and textual data (e.g., economic and political news reports) need to be considered jointly. Data analysis tools need to support an integrated analysis, which allows studying the relationships between textual news documents and quantitative properties of the stock market price series. In this paper, we describe a workflow and tool that allows a flexible formation of hypotheses about text features and their combinations, which reflect quantitative phenomena observed in stock data. To support such an analysis, we combine the analysis steps of frequent quantitative and text-oriented data using an existing a-priori method. First, based on heuristics we extract interesting intervals and patterns in large time series data. The visual analysis supports the analyst in exploring parameter combinations and their results. The identified time series patterns are then input for the second analysis step, in which all identified intervals of interest are analyzed for frequent patterns co-occurring with financial news. An a-priori method supports the discovery of such sequential temporal patterns. Then, various text features like the degree of sentence nesting, noun phrase complexity, the vocabulary richness, etc. are extracted from the news to obtain meta patterns. Meta patterns are defined by a specific combination of text features which significantly differ from the text features of the remaining news data. Our approach combines a portfolio of visualization and analysis techniques, including time-, cluster- and sequence visualization and analysis functionality. We provide two case studies, showing the effectiveness of our combined quantitative and textual analysis work flow. The workflow can also be generalized to other
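
    The interval-extraction step can be pictured with a small sketch like the following (illustrative heuristic and toy data only; the paper's actual parameters, news corpus, and a-priori pattern mining are not shown):

```python
import numpy as np

def interesting_intervals(prices: np.ndarray, window: int = 5, thresh: float = 0.03):
    """Flag windows whose relative price change exceeds a threshold."""
    hits = []
    for start in range(len(prices) - window):
        change = abs(prices[start + window] - prices[start]) / prices[start]
        if change > thresh:
            hits.append((start, start + window))
    return hits

# Toy series; news_days maps a day index to the news published that day.
prices = np.array([100, 101, 99, 104, 110, 109, 108, 101, 100, 102], dtype=float)
news_days = {3: ["earnings report"], 4: ["merger rumour"], 7: ["rate decision"]}

for start, end in interesting_intervals(prices):
    co_occurring = [n for d, ns in news_days.items() if start <= d <= end for n in ns]
    print((start, end), co_occurring)  # intervals of interest and co-occurring news
```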

  14. Standardization Documents

    Science.gov (United States)

    2011-08-01

    Specifications and Standards; Guide Specifications; CIDs; and NGSs. Federal Specifications; Commercial ... national or international standardization document developed by a private sector association, organization, or technical society that plans ... Maintain lessons learned. Examples: guidance for application of a technology; lists of options ... Defense Handbook

  15. Term Familiarity to indicate Perceived and Actual Difficulty of Text in Medical Digital Libraries.

    Science.gov (United States)

    Leroy, Gondy; Endicott, James E

    2011-10-01

    With increasing text digitization, digital libraries can personalize materials for individuals with different education levels and language skills. To this end, documents need meta-information describing their difficulty level. Previous attempts at such labeling used readability formulas, but the formulas have not been validated with modern texts and their outcome is seldom associated with actual difficulty. We focus on medical texts and are developing new, evidence-based meta-tags that are associated with perceived and actual text difficulty. This work describes a first tag, term familiarity, which is based on term frequency in the Google corpus. We evaluated its feasibility to serve as a tag by looking at a document corpus (N=1,073) and found that terms in blogs or journal articles displayed unexpected but significantly different scores. Term familiarity was then applied to texts and results from a previous user study (N=86) and could better explain differences in perceived and actual difficulty.
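
    A toy sketch of how such a familiarity tag might be computed (the frequency table is hypothetical; the study's actual Google-corpus counts and scoring are not reproduced):

```python
import math

# Hypothetical corpus frequencies (occurrences per billion tokens).
FREQ = {"pain": 90000, "analgesic": 1200, "dyspnea": 150, "breath": 40000}

def term_familiarity(text: str) -> float:
    """Average log-frequency of the text's terms; higher = more familiar."""
    terms = [t.strip(".,").lower() for t in text.split()]
    scores = [math.log10(FREQ.get(t, 1)) for t in terms]  # unknown terms -> 0
    return sum(scores) / len(scores)

print(term_familiarity("analgesic for pain"))  # relatively familiar wording
print(term_familiarity("dyspnea"))             # rare term, low familiarity
```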

  16. VizieR Online Data Catalog: FADO code (Gomes+, 2017)

    Science.gov (United States)

    Gomes, J. M.; Papaderos, P.

    2017-03-01

    FADO comes from the Latin word "fatum" that means fate or destiny. It is also a well-known genre of Portuguese music, and by choosing this acronym for this spectral synthesis tool we would like to pay tribute to Portugal. The main goal of FADO is to explore the star-formation and chemical enrichment history (the "Fado") of galaxies based on two hitherto unique elements in spectral fitting models: a) self-consistency between the best-fitting star formation history (SFH) and the nebular characteristics of a galaxy (e.g., hydrogen Balmer-line luminosities and equivalent widths; shape of the nebular continuum, including the Balmer and Paschen discontinuity) and b) genetic optimization and artificial intelligence algorithms. This document is part of the FADO v.1 distribution package, which contains two different ascii files, ReadMe and Read_F, and one tarball archive FADOv1.tar.gz. FADOv1.tar.gz contains the binary (executable) compiled in both OpenSuSE 13.2 64bit LINUX (FADO) and MAC OS X (FADO_MACOSX). The former is compatible with most LINUX distributions, while the latter was only tested for Yosemite 10.10.3. The tarball also contains the configuration files for running FADO: FADO.config and PLOT.config, as well as the "Simple Stellar Population" (SSP) base library with the base file list Base.BC03.L, the FADO v.1 short manual Read_F and this file (in the ReadMe directory) and, for testing purposes, three characteristic de-redshifted spectra from SDSS-DR7 in ascii format, corresponding to a star-forming (spec1.txt), composite (spec2.txt) and LINER (spec3.txt) galaxy. Auxiliary files needed for execution of FADO (.HIfboundem.ascii, .HeIIfbound.ascii, .HeIfboundem.ascii, grfont.dat and grfont.txt) are also included in the tarball. By decompressing the tarball the following six directories are created: input, output, plots, ReadMe, SSPs and tables (see below for a brief explanation). (2 data files).

  17. Utopia documents: linking scholarly literature with research data.

    Science.gov (United States)

    Attwood, T K; Kell, D B; McDermott, P; Marsh, J; Pettifer, S R; Thorne, D

    2010-09-15

    In recent years, the gulf between the mass of accumulating research data and the massive literature describing and analyzing those data has widened. The need for intelligent tools to bridge this gap, to rescue the knowledge being systematically isolated in literature and data silos, is now widely acknowledged. To this end, we have developed Utopia Documents, a novel PDF reader that semantically integrates visualization and data-analysis tools with published research articles. In a successful pilot with editors of the Biochemical Journal (BJ), the system has been used to transform static document features into objects that can be linked, annotated, visualized and analyzed interactively (http://www.biochemj.org/bj/424/3/). Utopia Documents is now used routinely by BJ editors to mark up article content prior to publication. Recent additions include integration of various text-mining and biodatabase plugins, demonstrating the system's ability to seamlessly integrate on-line content with PDF articles. http://getutopia.com.

  18. Public census data on CD-ROM at Lawrence Berkeley Laboratory

    Energy Technology Data Exchange (ETDEWEB)

    Merrill, D.W.

    1992-10-01

    The Comprehensive Epidemiologic Data Resource (CEDR) and Populations at Risk to Environmental Pollution (PAREP) projects, of the Information and Computing Sciences Division (ICSD) at Lawrence Berkeley Laboratory (LBL), are using public socio-economic and geographic data files which are available to CEDR and PAREP collaborators via LBL's computing network. At this time 70 CD-ROM diskettes (approximately 36 gigabytes) are on line via the Unix file server cedrcd.lbl.gov. Most of the files are from the US Bureau of the Census, and most pertain to the 1990 Census of Population and Housing. All the CD-ROM diskettes contain documentation in the form of ASCII text files. Printed documentation for most files is available for inspection at University of California Data and Technical Assistance (UC DATA), or the UC Documents Library. Many of the CD-ROM diskettes distributed by the Census Bureau contain software for PC compatible computers, for easily accessing the data. Shared access to the data is maintained through a collaboration among the CEDR and PAREP projects at LBL, and UC DATA, and the UC Documents Library. Via the Sun Network File System (NFS), these data can be exported to Internet computers for direct access by the user's application program(s).

  20. The Texts of the Agency's Relationship Agreements with Specialized Agencies

    International Nuclear Information System (INIS)

    1960-01-01

    The texts of the relationship agreements which the Agency has concluded with the specialized agencies listed below, together with the respective protocols authenticating them, are reproduced in this document, in the order in which the agreements entered into force, for the information of all Members of the Agency [es

  1. The Texts of the Agency's Relationship Agreements with Specialized Agencies

    International Nuclear Information System (INIS)

    1960-01-01

    The texts of the relationship agreements which the Agency has concluded with the specialized agencies listed below, together with the respective protocols authenticating them, are reproduced in this document, in the order in which the agreements entered into force, for the information of all Members of the Agency [fr

  2. Transcription of Spanish Historical Handwritten Documents with Deep Neural Networks

    Directory of Open Access Journals (Sweden)

    Emilio Granell

    2018-01-01

    Full Text Available The digitization of historical handwritten document images is important for the preservation of cultural heritage. Moreover, the transcription of text images obtained from digitization is necessary to provide efficient information access to the content of these documents. Handwritten Text Recognition (HTR) has become an important research topic in the areas of image and computational language processing that allows us to obtain transcriptions from text images. State-of-the-art HTR systems are, however, far from perfect. One difficulty is that they have to cope with image noise and handwriting variability. Another difficulty is the presence of a large amount of Out-Of-Vocabulary (OOV) words in ancient historical texts. A solution to this problem is to use external lexical resources, but such resources might be scarce or unavailable given the nature and the age of such documents. This work proposes a solution to avoid this limitation. It consists of associating a powerful optical recognition system that will cope with image noise and variability, with a language model based on sub-lexical units that will model OOV words. Such a language modeling approach reduces the size of the lexicon while increasing the lexicon coverage. Experiments are first conducted on the publicly available Rodrigo dataset, which contains the digitization of an ancient Spanish manuscript, with a recognizer based on Hidden Markov Models (HMMs). They show that sub-lexical units outperform word units in terms of Word Error Rate (WER), Character Error Rate (CER) and OOV word accuracy rate. This approach is then applied to deep net classifiers, namely Bi-directional Long-Short Term Memory (BLSTMs) and Convolutional Recurrent Neural Nets (CRNNs). Results show that CRNNs outperform HMMs and BLSTMs, reaching the lowest WER and CER for this image dataset and significantly improving OOV recognition.
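
    The OOV-reduction idea can be pictured with a small sketch in which overlapping character bigrams stand in for the paper's sub-lexical units (toy words, illustrative only; the actual systems model sub-word sequences inside the language model):

```python
def subword_units(word: str, n: int = 2):
    """Overlapping character n-grams used as sub-lexical units."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]

train_words = {"quando", "tierra", "sennor"}   # forms seen in training data
test_words = ["quando", "sennora", "tierras"]  # includes unseen inflections

# Word-level lexicon: every unseen full form is out-of-vocabulary (OOV).
oov_words = [w for w in test_words if w not in train_words]

# Sub-lexical lexicon built from the same training words.
unit_lexicon = {u for w in train_words for u in subword_units(w)}
oov_subword = [w for w in test_words
               if any(u not in unit_lexicon for u in subword_units(w))]

print(len(oov_words) / len(test_words))    # ~0.67 word-level OOV rate
print(len(oov_subword) / len(test_words))  # ~0.33 sub-lexical OOV rate
```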

  3. The phenomenon of soccer in some literary texts: Classical and contemporary

    Directory of Open Access Journals (Sweden)

    Victor Gil Castañeda

    2009-11-01

    Full Text Available This article discusses how, throughout literary history, many authors have shown a profound interest in describing the phenomenon of football (soccer), one of the most popular sports on earth. This interest can be seen in pre-Hispanic texts such as the Popol Vuh, as well as in modern intellectuals such as Eduardo Galeano (Uruguayan), in his book El fútbol a sol y sombra. The article also mentions other literary texts whose prominent figures and narrative atmospheres engage in the aesthetic description of football.

  4. Benchmarking infrastructure for mutation text mining.

    Science.gov (United States)

    Klein, Artjom; Riazanov, Alexandre; Hindle, Matthew M; Baker, Christopher Jo

    2014-02-25

    Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption.
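
    Because annotations are RDF and metrics are SPARQL queries, an evaluation can be run with little task-specific programming, roughly as in this sketch (rdflib assumed; the file name, namespace, and predicates are invented for illustration, not the project's actual OWL schema):

```python
from rdflib import Graph

g = Graph()
g.parse("annotations.ttl", format="turtle")  # gold + system annotations together

# Illustrative metric query: count system annotations that agree with a gold
# annotation on the same document span and the same mutation.
q = """
PREFIX ex: <http://example.org/anno#>
SELECT (COUNT(?s) AS ?tp) WHERE {
  ?s a ex:SystemAnnotation ; ex:span ?span ; ex:mutation ?m .
  ?g a ex:GoldAnnotation   ; ex:span ?span ; ex:mutation ?m .
}
"""
for row in g.query(q):
    print("true positives:", row.tp)
```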

  5. Benchmarking infrastructure for mutation text mining

    Science.gov (United States)

    2014-01-01

    Background Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. Results We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. Conclusion We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption. PMID:24568600

  6. Intelligent bar chart plagiarism detection in documents.

    Science.gov (United States)

    Al-Dabbagh, Mohammed Mumtaz; Salim, Naomie; Rehman, Amjad; Alkawaz, Mohammed Hazim; Saba, Tanzila; Al-Rodhaan, Mznah; Al-Dhelaan, Abdullah

    2014-01-01

    This paper presents a novel feature-mining approach for documents that cannot be mined via optical character recognition (OCR). By identifying the intimate relationship between the text and graphical components, the proposed technique pulls out the Start, End, and Exact values for each bar. Furthermore, the word 2-gram and Euclidean distance methods are used to accurately detect and determine plagiarism in bar charts.
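
    The two comparison ingredients named in the abstract can be sketched as follows (toy data; the paper's actual bar-value extraction and decision thresholds are not reproduced):

```python
import math

def word_2grams(text: str):
    """Set of adjacent word pairs, for the textual part of a chart."""
    words = text.lower().split()
    return {(words[i], words[i + 1]) for i in range(len(words) - 1)}

def ngram_overlap(a: str, b: str) -> float:
    """Jaccard overlap of word 2-grams between two chart texts."""
    ga, gb = word_2grams(a), word_2grams(b)
    return len(ga & gb) / len(ga | gb)

def bar_distance(bars_a, bars_b) -> float:
    """Euclidean distance between extracted bar-value vectors."""
    return math.dist(bars_a, bars_b)

title_a, bars_a = "annual sales by region", [12.0, 19.5, 7.2]
title_b, bars_b = "annual sales by region", [12.0, 19.4, 7.2]
print(ngram_overlap(title_a, title_b), bar_distance(bars_a, bars_b))
```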

  7. Intelligent Bar Chart Plagiarism Detection in Documents

    Science.gov (United States)

    Al-Dabbagh, Mohammed Mumtaz; Salim, Naomie; Alkawaz, Mohammed Hazim; Saba, Tanzila; Al-Rodhaan, Mznah; Al-Dhelaan, Abdullah

    2014-01-01

    This paper presents a novel feature-mining approach for documents that cannot be mined via optical character recognition (OCR). By identifying the intimate relationship between the text and graphical components, the proposed technique pulls out the Start, End, and Exact values for each bar. Furthermore, the word 2-gram and Euclidean distance methods are used to accurately detect and determine plagiarism in bar charts. PMID:25309952

  8. MANAGING HUMAN FACTORS IN IMPLEMENTING ELECTRONIC DOCUMENT SYSTEM IN THE PUBLIC SECTOR

    Directory of Open Access Journals (Sweden)

    TOMS LEIKUMS

    2012-05-01

    Full Text Available Document management underlies the activities of almost every organization. Correctly managed correspondence and organized document circulation characterize successful performance, particularly in public sector organizations. Even though the production of documents is not the main task of governmental institutions, document creation and processing are crucial to the provision of basic functions in the public sector. In the 21st century it is increasingly important to use the new possibilities offered by modern technologies, including electronic document management. The public sector is a heavy bureaucratic apparatus that needs elasticity and the ability to change its working processes and habits in order to gradually switch to the digital environment. Western European countries have already turned to electronic document management, whilst most Eastern European countries, including Latvia, have only recently started a gradual electronization of document circulation. When electronic document management systems are implemented in public sector organizations, staff often resist them, unwilling to change their accustomed method of work: paper-based document circulation. Both lower-level staff and higher-level managers put up obstacles to electronic document management. In this article, the author inspects cases of successful practice and analyses possible action mechanisms that could convince public sector personnel of the advantages of electronic document circulation and prepare them to switch to working with digital documents.

  9. (Co-)constructing specialized knowledge - Internet texts as a case in point

    DEFF Research Database (Denmark)

    Kampf, Constance

    Working from Bazerman's definition of writing as social action, and combining it with Wenger's definition for communities of practice, which relies on participation and reification occurring in conjunction with written documents, we can understand Internet texts as reifications in an ongoing social action...

  10. The archival function of records appraisal in the Nuxeo open-source document management software

    Directory of Open Access Journals (Sweden)

    Sérgio Renato Lampert

    2017-04-01

    Full Text Available This paper presents a study of the open-source document management software Nuxeo with regard to the implementation of the archival function of records appraisal. The analysis of the tool covered the installation procedure, pointing out difficulties and barriers for information professionals who wish to install the solution. Starting from the theoretical assumptions about records appraisal, we analyzed its applicability in the tool in order to validate the application of the three-ages theory. The examination of Nuxeo's functionality showed that the software does not apply the appraisal function in an automated way. Although it is not an archival solution, we conclude that Nuxeo can be used for the management of digital records, since its structure includes metadata for records appraisal. Analyzing document management software from an archival perspective brings the archivist closer to information technology and helps guarantee future access to information in digital form.

  11. Document Examination: Applications of Image Processing Systems.

    Science.gov (United States)

    Kopainsky, B

    1989-12-01

    Dealing with images is a familiar business for an expert in questioned documents: microscopic, photographic, infrared, and other optical techniques generate images containing the information he or she is looking for. A recent method for extracting most of this information is digital image processing, ranging from simple contrast and contour enhancement to the advanced restoration of blurred texts. When combined with a sophisticated physical imaging system, an image processing system has proven to be a powerful and fast tool for routine non-destructive scanning of suspect documents. This article reviews frequent applications, comprising techniques to increase legibility, two-dimensional spectroscopy (ink discrimination, alterations, erased entries, etc.), comparison techniques (stamps, typescript letters, photo substitution), and densitometry. Computerized comparison of handwriting is not included. Copyright © 1989 Central Police University.
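
    As a flavour of the simplest technique mentioned, contrast enhancement, here is a minimal numpy sketch (a linear percentile stretch; real forensic workflows rely on calibrated imaging hardware and far more sophisticated restoration):

```python
import numpy as np

def stretch_contrast(img: np.ndarray, low_pct=2, high_pct=98) -> np.ndarray:
    """Linear contrast stretch between two intensity percentiles."""
    lo, hi = np.percentile(img, [low_pct, high_pct])
    out = (img.astype(float) - lo) / max(hi - lo, 1e-9)
    return (np.clip(out, 0, 1) * 255).astype(np.uint8)

# Toy 'scanned document' patch: faint writing on a bright background.
patch = np.array([[200, 205, 198], [201, 180, 199], [202, 203, 197]], dtype=np.uint8)
print(stretch_contrast(patch))  # the faint stroke (180) now stands out
```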

  12. A two-sided academic landscape: snapshot of highly-cited documents in Google Scholar (1950-2013)

    Directory of Open Access Journals (Sweden)

    Alberto Martín-Martín

    2016-12-01

    Full Text Available The main objective of this paper is to identify and define the core characteristics of the set of highly-cited documents in Google Scholar (document types, language, free availability, sources, and number of versions), on the hypothesis that the wide coverage of this search engine may provide a different portrait of these documents with respect to that offered by traditional bibliographic databases. To do this, a query per year was carried out from 1950 to 2013 identifying the top 1,000 documents retrieved from Google Scholar and obtaining a final sample of 64,000 documents, of which 40% provided a free link to full-text. The results obtained show that the average highly-cited document is a journal or book article (62% of the top 1% most cited documents of the sample), written in English (92.5% of all documents) and available online in PDF format (86.0% of all documents). Yet, the existence of errors should be noted, especially when detecting duplicates and linking citations properly. Nonetheless, the fact that the study focused on highly cited papers minimizes the effects of these limitations. Given the high presence of books and, to a lesser extent, of other document types (such as proceedings or reports), the present research concludes that the Google Scholar data offer an original and different vision of the most influential academic documents (measured from the perspective of their citation count), a set composed not only of strictly scientific material (journal articles) but also of academic material in its broadest sense.

  13. Information Technology Act 2000 in India - Authentication of E-Documents

    Directory of Open Access Journals (Sweden)

    R. G. Pawar

    2007-06-01

    Full Text Available The Information Technology Act 2000 was enacted in India on 9 June 2000. The Act makes provision for the authentication of electronic documents, a provision the Indian legal system needed at the time, especially for electronic commerce and electronic governance. Electronic commerce involves the use of alternatives to paper-based methods of communication and information storage, and conducting it requires the authentication of particular documents. On the Internet, documents travel as bits from one destination to another through various media, such as coaxial cable, fiber optics, and satellites. While a document is in transit, the probability that a third party alters it is high, and a document may also be changed by noise or disturbance in the communication media. The Act was required to provide legal recognition to transactions carried out by means of electronic data interchange and other means of electronic communication. In this paper, the researchers study the technological aspects of the Information Technology Act 2000, such as hash functions, encryption, decryption, public keys, and private keys, and the processes behind them. The paper describes the certifying authority in detail. There should be a mechanism that takes care of a document so that whatever document is received is the authentic one and has not been changed in any manner for any cause.
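
    A toy sketch of the hash-then-sign mechanism behind such authentication (textbook RSA with tiny illustrative numbers, never usable as real keys; the Act itself does not prescribe this particular algorithm):

```python
import hashlib

# Toy RSA keypair (illustrative only; real keys are thousands of bits).
n, e, d = 3233, 17, 2753  # n = 61 * 53

def digest(document: bytes) -> int:
    """SHA-256 hash of the document, reduced mod n to fit the toy keys."""
    return int.from_bytes(hashlib.sha256(document).digest(), "big") % n

def sign(document: bytes) -> int:
    return pow(digest(document), d, n)  # signer uses the private key d

def verify(document: bytes, signature: int) -> bool:
    return pow(signature, e, n) == digest(document)  # anyone checks with e

doc = b"transfer 100 rupees to account 42"
sig = sign(doc)
print(verify(doc, sig))         # True: authentic and unaltered
print(verify(doc + b"0", sig))  # False: content was tampered with
```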

  14. Synthesis document on the long time behavior of packages: operational document ''bituminous'' 2204

    International Nuclear Information System (INIS)

    Tiffreau, C.

    2004-09-01

    This document was produced in the framework of the 1991 law on radioactive waste management. The 2004 synthesis document on the long-time behavior of bituminized sludge packages consists of two documents, the reference document and the operational document. This paper presents the operational model describing the water-driven alteration of the packages and the associated release of radioelements, as well as the gas source term and the swelling associated with self-irradiation and radiolysis of the bitumen. (A.L.B.)

  15. DCHAIN: A user-friendly computer program for radioactive decay and reaction chain calculations

    International Nuclear Information System (INIS)

    East, L.V.

    1994-05-01

    A computer program for calculating the time-dependent daughter populations in radioactive decay and nuclear reaction chains is described. Chain members can have non-zero initial populations and be produced from the preceding chain member as the result of radioactive decay, a nuclear reaction, or both. As presently implemented, chains can contain up to 15 members. Program input can be supplied interactively or read from ASCII data files. Time units for half-lives, etc. can be specified during data entry. Input values are verified and can be modified if necessary, before being used in calculations. Output results can be saved in ASCII files in a format suitable for including in reports or other documents. The calculational method, described in some detail, utilizes a generalized form of the Bateman equations. The program is written in the C language in conformance with current ANSI standards and can be used on multiple hardware platforms
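
    The generalized Bateman solution that such a program evaluates can be sketched for the simplest case, a pure decay chain with no production terms (illustrative only; DCHAIN itself also handles reaction production and non-zero initial populations):

```python
import numpy as np

def bateman(t: float, lambdas):
    """Populations of a pure decay chain A1 -> A2 -> ... -> An at time t,
    for N1(0) = 1 and empty daughters (decay constants must be distinct)."""
    pops = []
    for k in range(len(lambdas)):
        lam = lambdas[:k + 1]
        s = sum(np.exp(-lam[i] * t) /
                np.prod([lam[j] - lam[i] for j in range(k + 1) if j != i])
                for i in range(k + 1))
        pops.append(np.prod(lam[:k]) * s)
    return pops

# Two-member chain with half-lives of 1 h and 3 h (decay constants in 1/h).
lams = [np.log(2) / 1.0, np.log(2) / 3.0]
print(bateman(2.0, lams))  # parent and daughter populations after 2 hours
```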

  16. The Janus Head Article - On Quality in the Documentation Process

    Directory of Open Access Journals (Sweden)

    Henrik Andersen

    2006-03-01

    Full Text Available The god Janus in Roman mythology was a two-faced god; each face had its own view of the world. Our idea behind the Janus Head article is to give you two different and maybe even contradicting views on a certain topic. In this issue the topic is quality in the documentation process. In the first half of this issue’s Janus Head Article translators from the international company Grundfos give us their view of quality and how quality is managed in the documentation process at Grundfos. In the second half of the Janus Head Article scholars from the University of Southern Denmark describe and discuss quality in the documentation process at Grundfos from a researcher’s point of view.

  17. The Janus Head Article - On Quality in the Documentation Process

    Directory of Open Access Journals (Sweden)

    Henrik Andersen

    2012-08-01

    Full Text Available The god Janus in Roman mythology was a two-faced god; each face had its own view of the world. Our idea behind the Janus Head article is to give you two different and maybe even contradicting views on a certain topic. In this issue the topic is quality in the documentation process. In the first half of this issue’s Janus Head Article translators from the international company Grundfos give us their view of quality and how quality is managed in the documentation process at Grundfos. In the second half of the Janus Head Article scholars from the University of Southern Denmark describe and discuss quality in the documentation process at Grundfos from a researcher’s point of view.

  18. A survey of text clustering techniques used for web mining

    Directory of Open Access Journals (Sweden)

    Dan MUNTEANU

    2005-12-01

    Full Text Available This paper contains an overview of basic formulations and approaches to clustering. Then it presents two important clustering paradigms: a bottom-up agglomerative technique, which collects similar documents into larger and larger groups, and a top-down partitioning technique, which divides a corpus into topic-oriented partitions.
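
    The two paradigms can be contrasted in a few lines (scikit-learn assumed, with a toy corpus; k-means here stands in for the partitioning family):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering, KMeans

docs = ["nuclear reactor safety", "reactor decay heat",
        "soccer world cup", "world cup final match"]
X = TfidfVectorizer().fit_transform(docs).toarray()

# Bottom-up: merge the most similar documents into larger and larger groups.
print(AgglomerativeClustering(n_clusters=2).fit_predict(X))

# Top-down style partitioning of the corpus into topic-oriented groups.
print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X))
```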

  19. Quality improvement in clinical documentation: does clinical governance work?

    Directory of Open Access Journals (Sweden)

    Dehghan M

    2013-12-01

    Full Text Available Mahlegha Dehghan,1 Dorsa Dehghan,2 Akbar Sheikhrabori,3 Masoume Sadeghi,4 Mehrdad Jalalian5 1Department of Medical Surgical Nursing, School of Nursing and Midwifery, Kerman University of Medical Sciences, Kerman, 2Department of Pediatric Nursing, School of Nursing and Midwifery, Islamic Azad University Kerman Branch, Kerman, 3Department of Medical Surgical Nursing, School of Nursing and Midwifery, Kerman University of Medical Sciences, Kerman, 4Research Center for Modeling in Health, Institute of Futures Studies in Health, Kerman University of Medical Sciences, Kerman, 5Electronic Physician Journal, Mashhad, Iran Introduction: The quality of nursing documentation is still a challenge in the nursing profession and, thus, in the health care industry. One major quality improvement program is clinical governance, whose mission is to continuously improve the quality of patient care and overcome service quality problems. The aim of this study was to identify whether clinical governance improves the quality of nursing documentation. Methods: A quasi-experimental method was used to show nursing documentation quality improvement after a 2-year clinical governance implementation. Two hundred twenty random nursing documents were assessed structurally and by content using a valid and reliable researcher-made checklist. Results: There were no differences in the nurses' demographic data before and after 2 years (P>0.05), and the nursing documentation score did not improve after the 2-year clinical governance program. Conclusion: Although some efforts were made to improve nursing documentation through clinical governance, these were not sufficient and more attempts are needed. Keywords: nursing documentation, clinical governance, quality improvement, nursing record

  20. Documenting costs and yield of crops of organic origin

    Directory of Open Access Journals (Sweden)

    J.P. Melnychuk

    2016-06-01

    Full Text Available The article focuses on the study of primary cost accounting and the output of organic crop production. It also addresses the key issues that arise in the primary accounting of organic crop production. For the survey we used such general scientific methods as induction and deduction, dialectic, historical and systematic methods, and some specific methods of accounting, which include documentation, inventory, assessment, calculation, accounting records, double entry, balance sheet and financial statements. As for documenting the costs and yield of crops of organic origin, it should be noted that documentation is an important method of accounting, as it is the basis of the initial observation of commercial operations and a prerequisite for their reflection in accounting. The article highlights the features of documenting the posting of production costs and crop production of organic origin, and also studies the procedure for registering land under operating lease for the production of organic products. The author submits suggestions for improving the documentation of costs and yields of organic crop production in order to develop reliable information about the costs of production and the grown crop of organic origin for management decision-making.

  1. A Sample Typology of Texts in Corporate Discourse

    Directory of Open Access Journals (Sweden)

    Jacek Kołata

    2009-11-01

    Full Text Available The subject matter of this article is to present a working typology of the different texts existing in corporate discourse. The data for the following analysis are drawn from various groups of documents existing in the Nestle Corporation. The division into categories was possible after highlighting the most discriminative features of the texts under investigation. Moreover, it allows me to reveal how texts are shaped by the contexts in which they exist. Bearing the above in mind, we must not forget that written utterances are always influenced by different but closely related parameters, such as a sender, a recipient, a particular incident and an aim of the conversation; to be more precise, they cannot exist independently. This paper attempts to point out the weaknesses and merits of the corporate discourse communication system in the described company and, by doing so, to facilitate the flow of information among all departments, employees and factories.

  2. Information Types in Nonmimetic Documents: A Review of Biddle's Wipe-Clean Slate (Understanding Documents).

    Science.gov (United States)

    Mosenthal, Peter B.; Kirsch, Irwin S.

    1991-01-01

    Describes how the 16 permanent lists used by a first grade reading teacher (and mother of 6) to manage the household represent the whole range of documents covered in the 3 major types of documents: matrix documents, graphic documents, and locative documents. Suggests class activities to clarify students' understanding of the information in…

  3. METHOD OF RARE TERM CONTRASTIVE EXTRACTION FROM NATURAL LANGUAGE TEXTS

    Directory of Open Access Journals (Sweden)

    I. A. Bessmertny

    2017-01-01

    Full Text Available The paper considers the problem of automatic domain term extraction from a document corpus by means of a contrast collection. Existing contrastive methods successfully extract frequently used terms but mishandle rare terms, which can impoverish the resulting thesaurus. Assessment of point-wise mutual information is one of the known statistical methods of term extraction, and it finds rare terms successfully, although it also extracts many false terms. The proposed approach consists of applying point-wise mutual information for rare term extraction and filtering the candidates by the criterion of joint occurrence with the other candidates. We build a “documents-by-terms” matrix that is subjected to singular value decomposition to eliminate noise and reveal strong interconnections. We then pass to the resulting “terms-by-terms” matrix, which reproduces the strength of the interconnections between words. This approach was evaluated on a document collection from the “Geology” domain, with contrast documents from such topics as “Politics”, “Culture”, “Economics” and “Accidents” on some Internet resources. The experimental results demonstrate the operability of this method for rare term extraction.
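
    A compact sketch of the two ingredients, point-wise mutual information against a contrast collection and SVD denoising of the term-term structure (toy counts; the paper's corpora, smoothing, and thresholds are not reproduced):

```python
import numpy as np

# Toy counts: how often each word occurs in the domain and contrast corpora.
domain = {"basalt": 7, "stratum": 5, "said": 40, "magma": 6}
contrast = {"basalt": 0, "stratum": 1, "said": 40, "magma": 1}
N_dom, N_con = sum(domain.values()), sum(contrast.values())

def pmi(word: str) -> float:
    """Point-wise mutual information of a word with the domain corpus."""
    p_joint = domain[word] / (N_dom + N_con)
    p_word = (domain[word] + contrast[word]) / (N_dom + N_con)
    p_dom = N_dom / (N_dom + N_con)
    return np.log2(p_joint / (p_word * p_dom))

print(sorted(domain, key=pmi, reverse=True))  # rare domain terms beat 'said'

# Denoise a documents-by-terms matrix with truncated SVD, then inspect the
# terms-by-terms matrix: strong off-diagonal entries indicate candidates
# that jointly occur with other candidates.
A = np.array([[2, 1, 0, 1], [1, 2, 1, 2], [0, 0, 3, 0]], dtype=float)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]
print(np.round(A_k.T @ A_k, 2))
```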

  4. Technical approach document

    International Nuclear Information System (INIS)

    1988-04-01

    This document describes the general technical approaches and design criteria adopted by the US Department of Energy (DOE) in order to implement Remedial Action Plans (RAPs) and final designs that comply with EPA standards. This document is a revision of the original document. Major revisions were made to the sections on riprap selection and sizing and on ground water; only minor revisions were made to the remainder of the document. The US Nuclear Regulatory Commission (NRC) has prepared a Standard Review Plan (NRC-SRP) which describes the factors to be considered by the NRC in approving the RAP. Sections 3.0, 4.0, 5.0, and 7.0 of this document are arranged under the same headings as those used in the NRC-SRP. This approach is adopted in order to facilitate joint use of the documents. Section 2.0 (not included in the NRC-SRP) discusses design considerations; Section 3.0 describes surface-water hydrology and erosion control; Section 4.0 describes geotechnical aspects of pile design; Section 5.0 discusses the Alternate Site Selection Process; Section 6.0 deals with radiological issues (in particular, the design of the radon barrier); Section 7.0 discusses protection of groundwater resources; and Section 8.0 discusses site design criteria for the RAC.

  5. Availability of Ada and C++ Compilers, Tools, Education and Training

    Science.gov (United States)

    1991-07-01

    executable mini-specs, to support import of existing code. Automated database population/change propagation. 9. Documentation generation: via FrameMaker. 10. ... formats. 12. Links to other tools: i. Atherton's Software Backplane. ii. 4GLs. iii. Interleaf and FrameMaker publishing. 13. Output formats: PostScript ... by end ... 11. Output formats: ASCII, PostScript, Interleaf, HPGL, troff, nroff, FrameMaker, WordPerfect. 12. User interface: Menu and mouse

  6. How Are Researching and Reading Interweaved during Retrieval from Hierarchically Structured Documents?

    DEFF Research Database (Denmark)

    Hertzum, Morten; Lalmas, M.; Frøkjær, Erik

    2001-01-01

    Effective use of information retrieval systems requires that users know when to – temporarily – cease searching to do some reading and where to start reading. In hierarchically structured documents, users can to some extent interchange searching and reading by entering the text at different levels...... information retrieval systems could exploit document structure to return the best points to support reading, rather than merely hits...

  7. The Use of Speech Technology to Protect the Document Turnover

    Directory of Open Access Journals (Sweden)

    Alexandr M. Alyushin

    2017-06-01

    Full Text Available The wide practical use of paper documents in current workflows is shown. The basic aspects of document protection, relating to the protection of both content and legal components, are underlined. The contextual component concerns the semantic information aspect of the document; the legal component concerns the facts and conditions of the creation, approval, and negotiation of the document attributed to specific persons. The importance of the document protection problem is shown in connection with possible terrorist threats. The efficiency of document protection is shown to depend strongly on the time needed to detect a fraud. The fraud detection time requirements for documents of different natures (financial, legal, management) are analyzed. Documents used for the operational management of dangerous objects are pointed out as the most sensitive to falsification: their deliberate falsification can lead to accidents, technogenic catastrophes and human casualties. A comparative analysis of currently used document protection methods is presented, distinguishing biometric and non-biometric methods and analyzing their shortcomings. The conclusion is drawn that document protection based on voice signature technology is promising. The basic steps of voice information processing in the implementation of this technology are analyzed. Software that implements this new protection technology against document counterfeiting is proposed. The technology is based on placing an audiomarker at the end of the document, containing general information about it. The technology is applicable to a wide range of documents, such as financial and valuable papers, contracts, etc. One of its most important advantages is that no change can be made to the document without the author, because the audiomarker keeps the biometric data of the person.

  8. DOCUMENT IMAGE REGISTRATION FOR IMPOSED LAYER EXTRACTION

    Directory of Open Access Journals (Sweden)

    Surabhi Narayan

    2017-02-01

    Full Text Available Extraction of filled-in information from document images in the presence of a template poses challenges due to geometrical distortion. A filled-in document image consists of a null background, a general-information foreground and a vital-information imposed layer. A template document image consists of a null background and a general-information foreground layer. In this paper a novel document image registration technique is proposed to extract the imposed layer from an input document image. A convex polygon is constructed around the content of the input and the template image using the convex hull. The vertices of the convex polygons of input and template are paired based on minimum Euclidean distance. Each vertex of the input convex polygon is subjected to transformation for the permutable combinations of rotation and scaling. Translation is handled by a tight crop. For every transformation of the input vertices, the Minimum Hausdorff Distance (MHD) is computed. The minimum Hausdorff distance identifies the rotation and scaling values by which the input image should be transformed to align it to the template. Since transformation is an estimation process, the components in the input image do not overlay exactly on the components in the template; therefore a connected-component technique is applied to extract contour boxes at the word level to identify partially overlapping components. Geometrical features such as density, area and degree of overlap are extracted and compared between partially overlapping components to identify and eliminate components common to the input image and the template image. The residue constitutes the imposed layer. Experimental results indicate the efficacy of the proposed model and its computational complexity. Experiments have been conducted on a variety of filled-in forms, applications and bank cheques. Data sets have been generated as test sets for comparative analysis.
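
    A skeletal version of the geometric core (convex hull, candidate rotations and scalings, minimum Hausdorff distance) might look like this sketch (scipy assumed; point sets, step sizes, and search ranges are purely illustrative):

```python
import numpy as np
from scipy.spatial import ConvexHull
from scipy.spatial.distance import directed_hausdorff

def hull_points(pts: np.ndarray) -> np.ndarray:
    """Vertices of the convex polygon around a page's content points."""
    return pts[ConvexHull(pts).vertices]

def best_alignment(inp: np.ndarray, tmpl: np.ndarray):
    """Search rotation/scale combinations that minimize the Hausdorff distance."""
    in_hull, t_hull = hull_points(inp), hull_points(tmpl)
    best = (np.inf, None)
    for deg in range(-10, 11, 2):      # candidate rotations, in degrees
        th = np.radians(deg)
        R = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
        for scale in (0.9, 1.0, 1.1):  # candidate scalings
            moved = in_hull @ R.T * scale
            d = max(directed_hausdorff(moved, t_hull)[0],
                    directed_hausdorff(t_hull, moved)[0])
            if d < best[0]:
                best = (d, (deg, scale))
    return best  # (distance, (rotation degrees, scale))

rng = np.random.default_rng(0)
template = rng.random((60, 2)) * 100            # stand-in content points
print(best_alignment(template * 1.1, template))  # expect rotation ~0, scale 0.9
```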

  9. Distributed and Conditional Documents: Conceptualizing Bibliographical Alterities

    Directory of Open Access Journals (Sweden)

    Johanna Drucker

    2014-11-01

    Full Text Available To conceptualize a future history of the book we have to recognize that our understanding of the bibliographical object of the past is challenged by the ontologically unbound, distributed, digital, and networked conditions of the present. As we draw on rich intellectual traditions, we must keep in view the need to let go of the object-centered approach that is at the heart of book history. My argument begins, therefore, with a few assertions. First, that we have much to learn from the scholarship on Old and New World contact that touches on bibliography, document studies, and book history for formulating a non-object centered conception of what a book is. Second, that the insights from these studies can be usefully combined with a theory of the “conditional” document to develop the model of the kinds of distributed artifacts we encounter on a daily basis in the networked conditions of current practices. Finally, I would suggest that this model provides a different conception of artifacts (books, documents, works of textual or graphic art, one in which reception is production and therefore all materiality is subject to performative engagement within varied, and specific, conditions of encounter.

  10. Sourcing in Professional Education: Do Text Factors Make Any Difference?

    Science.gov (United States)

    Bråten, Ivar; Strømsø, Helge I.; Andreassen, Rune

    2016-01-01

    The present study investigated the extent to which the text factors of source salience and emphasis on risk might influence readers' attention to and use of source information when reading single documents to make behavioral decisions on controversial health-related issues. Participants (n = 259), who were attending different bachelor-level…

  11. Enterprise Document Management

    Data.gov (United States)

    US Agency for International Development — The function of the operation is to provide e-Signature and document management support for Acquisition and Assistance (A&A) documents including vouchers in...

  12. A review of technology and trends in document delivery services

    Energy Technology Data Exchange (ETDEWEB)

    Bourne, C P [DIALOG Information Services, Inc., Palo Alto, CA (United States)

    1990-05-01

    This paper reviews the major lines of technical development being pursued to extend or replace traditional inter-library loan and photocopy service and to facilitate the delivery of source documents to individual end users. Examples of technical approaches discussed are: (1) the inclusion of full text and image data in central online systems; (2) image workstations such as the ADONIS and UMI systems; and (3) the use of electronic networks for document ordering and delivery. Some consideration is given to the policy implications for libraries and information systems. (author). 11 tabs.

  13. A review of technology and trends in document delivery services

    International Nuclear Information System (INIS)

    Bourne, C.P.

    1990-05-01

    This paper reviews the major lines of technical development being pursued to extend or replace traditional inter-library loan and photocopy service and to facilitate the delivery of source documents to individual end users. Examples of technical approaches discussed are: 1) the inclusion of full text and image data in central online systems; 2) image workstations such as the ADONIS and UMI systems; and 3) the use of electronic networks for document ordering and delivery. Some consideration is given to the policy implications for libraries and information systems. (author). 11 tabs

  14. A DOCUMENT PAUL AUSTER’S NEW YORK TRILOGY

    Directory of Open Access Journals (Sweden)

    Natalia N. Smirnova

    2017-03-01

    Full Text Available The article deals with the way a literary work “creates” a document out of itself, taking Paul Auster’s novels as its example. A document here is the report of a character, a private detective who is watching another character (a writer), but also the book of a fictional writer who is writing a story of the detective who is watching him, and eventually the book about this whole story. In this case, the search for the other, watching him, is inevitably associated with the search for oneself, self-observation. Biography becomes autobiography, i.e. a document rather than a narrative based on a document. This story becomes projected onto the story of Don Quixote (of which “some” Paul Auster, a fictional writer, is writing an essay). The Other is a landmark in the vast desert of fictional worlds where Paul Auster’s Don Quixote wanders alongside other characters of the trilogy. The author may not return from his endless journey through imaginary worlds; his life does not belong to either real life or fiction. He gives life to his characters while remaining invisible himself. Paul Auster’s The New York Trilogy explores such an existential situation, where the only evidence of the author’s life is a document left by his character. The author leaves a documentary record of a kind about his own existence. In this sense, literature is a document of life and of the endless search for a reason for the existence of an individual who, being not equal to him- or herself, is always the other and never a type or a template.

  15. CRITICISM, ADAPTATION AND ORGANIZATION IN THE COLLABORATIVE CONSTRUCTION OF DOCUMENTS IN THE CLOUDS

    Directory of Open Access Journals (Sweden)

    Raquel Franco Santos

    2016-07-01

    Full Text Available Working with text in the digital age brings several challenges for researchers in Computing, Education, and Linguistics, such as collaborative writing on the Web. This article presents aspects related to thinking and doing within this context, working with humanistic issues related to Adorno (Criticism) and Piaget (Constructivism), the vision of the paragraph as a unit of text, and technologies for producing Web documents in collaborative learning environments (Service-Oriented Architecture in the Cloud). Based on research related to collaborative writing on the Web, a service-oriented model for the collaborative construction of documents in the clouds is proposed. This paper also presents a tool (CCDC-TEO) that implements the proposed model and an example of its application. The results demonstrate the validity of this model.

  16. Computer-assisted documentation: One device to keep your nose above the water

    International Nuclear Information System (INIS)

    Church, L.B.

    1980-01-01

    Because of the large number of student operators and trainees at the Reed College Reactor Facility, there is a large demand for access to our documentation. In the past the standard mimeograph approach has been used to make available Tech Specs, Administrative Procedures, Emergency Plans, etc. However, the frequency of change in these documents (often relatively minor in nature) causes an entire document to become outdated. To provide easier student access, to help keep the documentation up to date, and to do so at a minimum cost, we have started using the text editor on our computer. On the whole the experiment has been very well received; some of the more important pros and cons will be discussed. (author)

  17. The Texts of the Agency's Co-operation Agreements with Regional Intergovernmental Organizations

    International Nuclear Information System (INIS)

    1969-01-01

    The text of the Agency's agreement for co-operation with the Organization of African Unity (OAU) is reproduced in this document for the information of all Members. The agreement entered into force on 26 March 1969

  18. INFORMATION SYSTEM OF AUTOMATION OF PREPARATION EDUCATIONAL PROCESS DOCUMENTS

    Directory of Open Access Journals (Sweden)

    V. A. Matyushenko

    2016-01-01

    Full Text Available Information technology is rapidly conquering the world, permeating all spheres of human activity. Education is no exception. An important direction in the informatization of education is the development of university management systems. Modern information systems improve and facilitate the management of all types of activities of an institution. The purpose of this paper is the development of a system that automates the preparation of educational process documents. The article describes the problem of preparing educational process documents. We decided to design and create the information system in the Microsoft Access environment. The result is four types of reports obtained by using the developed system. The use of this system allows the process to be automated and reduces the effort required to prepare accounting documents. All reports were implemented in Microsoft Excel and can be used for further analysis and processing.

  19. Tank Monitoring and Document control System (TMACS) As Built Software Design Document

    International Nuclear Information System (INIS)

    GLASSCOCK, J.A.

    2000-01-01

    This document describes the software design for the Tank Monitor and Control System (TMACS). This document captures the existing as-built design of TMACS as of November 1999. It will be used as a reference document by the system maintainers who will be maintaining and modifying the TMACS functions as necessary. The heart of the TMACS system is the ''point-processing'' functionality, where a sample value is received from the field sensors and the value is analyzed, logged, or alarmed as required. This Software Design Document focuses on the point-processing functions.
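
    The point-processing behaviour described above (receive a sample, then analyze, log, or alarm it) follows a common pattern in monitoring systems. The following minimal Python sketch illustrates that generic pattern only; the point names, thresholds, and handler logic are hypothetical and are not taken from the TMACS design.

        import logging
        from dataclasses import dataclass

        logging.basicConfig(level=logging.INFO)

        @dataclass
        class PointConfig:
            name: str
            low_alarm: float   # alarm if the sample falls below this
            high_alarm: float  # alarm if the sample rises above this

        def process_point(cfg: PointConfig, value: float) -> None:
            """Process one sample: log it, and raise an alarm when out of range."""
            logging.info("point=%s value=%.2f", cfg.name, value)
            if value < cfg.low_alarm or value > cfg.high_alarm:
                logging.warning("ALARM point=%s value=%.2f outside [%s, %s]",
                                cfg.name, value, cfg.low_alarm, cfg.high_alarm)

        # Example: a hypothetical tank-level sensor reading.
        process_point(PointConfig("tank_101_level", 10.0, 90.0), 95.3)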

  20. A database for TMT interface control documents

    Science.gov (United States)

    Gillies, Kim; Roberts, Scott; Brighton, Allan; Rogers, John

    2016-08-01

    The TMT Software System consists of software components that interact with one another through a software infrastructure called TMT Common Software (CSW). CSW consists of software services and library code that is used by developers to create the subsystems and components that participate in the software system. CSW also defines the types of components that can be constructed and their roles. The use of common component types and shared middleware services allows standardized software interfaces for the components. A software system called the TMT Interface Database System was constructed to support the documentation of the interfaces for components based on CSW. The programmer describes a subsystem and each of its components using JSON-style text files. A command interface file describes each command a component can receive and any commands a component sends. The event interface files describe status, alarms, and events a component publishes and status and events subscribed to by a component. A web application was created to provide a user interface for the required features. Files are ingested into the software system's database. The user interface allows browsing subsystem interfaces, publishing versions of subsystem interfaces, and constructing and publishing interface control documents that consist of the intersection of two subsystem interfaces. All published subsystem interfaces and interface control documents are versioned for configuration control and follow the standard TMT change control processes. Subsystem interfaces and interface control documents can be visualized in the browser or exported as PDF files.
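
    A command interface file of the kind described (one JSON-style description per component) might look like the following sketch. The subsystem, component, and field names are invented for illustration and do not reproduce the actual TMT CSW schema.

        import json

        # Hypothetical description of one component's command interface.
        command_interface = {
            "subsystem": "DEMO",
            "component": "filterWheel",
            "receives": [
                {"name": "setPosition",
                 "description": "Move the wheel to a named slot.",
                 "parameters": [{"name": "slot", "type": "string"}]},
            ],
            "sends": [
                {"subsystem": "DEMO", "component": "motorController",
                 "name": "rotate"},
            ],
        }

        # Serialize to the kind of JSON-style text file a database could ingest.
        print(json.dumps(command_interface, indent=2))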

  1. INTEGRATION OF COMPUTER TECHNOLOGIES SMK: AUTOMATION OF THE PRODUCTION CERTIFICATION PROCEDURE AND FORMING OF SHIPPING DOCUMENTS

    Directory of Open Access Journals (Sweden)

    S. A. Pavlenko

    2009-01-01

    Full Text Available The integration of computer information technologies made it possible to reorganize and optimize several processes by reducing document circulation, unifying documentation forms, and other measures.

  2. ERRORS AND DIFFICULTIES IN TRANSLATING LEGAL TEXTS

    Directory of Open Access Journals (Sweden)

    Camelia, CHIRILA

    2014-11-01

    Full Text Available Nowadays the accurate translation of legal texts has become highly important as the mistranslation of a passage in a contract, for example, could lead to lawsuits and loss of money. Consequently, the translation of legal texts to other languages faces many difficulties and only professional translators specialised in legal translation should deal with the translation of legal documents and scholarly writings. The purpose of this paper is to analyze translation from three perspectives: translation quality, errors and difficulties encountered in translating legal texts and consequences of such errors in professional translation. First of all, the paper points out the importance of performing a good and correct translation, which is one of the most important elements to be considered when discussing translation. Furthermore, the paper presents an overview of the errors and difficulties in translating texts and of the consequences of errors in professional translation, with applications to the field of law. The paper is also an approach to the differences between languages (English and Romanian that can hinder comprehension for those who have embarked upon the difficult task of translation. The research method that I have used to achieve the objectives of the paper was the content analysis of various Romanian and foreign authors' works.

  3. Management of technical documents: a projection at the University of Zulia

    Directory of Open Access Journals (Sweden)

    Ana Judith Paredes Chacin

    2015-11-01

    Full Text Available Objective. This paper analyzes comprehensively and systematically the principles of organization and technical procedure which support document management based on the use of information technologies. Method. We developed a study based on the descriptive documentary method in the context of the Dirección de Infraestructura (Dinfra) of the Universidad del Zulia. Results. We find evidence of efficiency in the processes that support the management of technical documents: plans, metrics, and reports generated by the Dinfra. Conclusion. The conceptual bases of organizational and technical archives contribute to the systematization, safekeeping, and preservation of documents, and ensure the timely retrieval of technical information for the management of the Universidad del Zulia.

  4. The Texts of the Agency's Relationship Agreements with Specialized Agencies

    International Nuclear Information System (INIS)

    1988-03-01

    The text of the relationship agreement which the Agency has concluded with the United Nations Industrial Development Organization, together with the protocol regarding its entry into force, is reproduced in this document for the information of all Members of the Agency. The agreement entered into force on 9 October 1987 pursuant to Article 10.

  5. Multi-font printed Mongolian document recognition system

    Science.gov (United States)

    Peng, Liangrui; Liu, Changsong; Ding, Xiaoqing; Wang, Hua; Jin, Jianming

    2009-01-01

    Mongolian is one of the major ethnic languages in China. Large amounts of printed Mongolian documents need to be digitized for digital libraries and various other applications. Traditional Mongolian script has a unique writing style and multi-font-type variations, which bring challenges to Mongolian OCR research. As traditional Mongolian script has some special characteristics, for example, one character may be part of another character, we define the character set for recognition according to the segmented components, and the components are combined into characters by a rule-based post-processing module. For character recognition, a method based on visual directional features and multi-level classifiers is presented. For character segmentation, a scheme is used to find the segmentation points by analyzing the properties of projections and connected components. As Mongolian has different font-types, which are categorized into two major groups, the segmentation parameters are adjusted for each group. A font-type classification method for the two font-type groups is introduced. For recognition of Mongolian text mixed with Chinese and English, language identification and the relevant character recognition kernels are integrated. Experiments show that the presented methods are effective. The text recognition rate is 96.9% on test samples from practical documents with multiple font-types and mixed scripts.
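
    Projection-based segmentation of the kind mentioned here is easy to illustrate. The sketch below is a generic, simplified version of the idea (cut a binary text image at columns whose vertical projection is empty); it is not the paper's actual algorithm, which also analyzes connected components and adjusts parameters per font group.

        import numpy as np

        def segment_by_projection(binary_img: np.ndarray) -> list[tuple[int, int]]:
            """Return (start, end) column ranges of glyphs in a binary image.

            binary_img: 2-D array, nonzero where ink is present.
            """
            projection = (binary_img != 0).sum(axis=0)  # ink pixels per column
            segments, start = [], None
            for col, count in enumerate(projection):
                if count > 0 and start is None:
                    start = col                      # a glyph begins
                elif count == 0 and start is not None:
                    segments.append((start, col))    # a glyph ends at an empty column
                    start = None
            if start is not None:
                segments.append((start, len(projection)))
            return segments

        # Toy example: two "glyphs" separated by blank columns.
        img = np.zeros((5, 10), dtype=int)
        img[:, 1:3] = 1
        img[:, 6:9] = 1
        print(segment_by_projection(img))  # [(1, 3), (6, 9)]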

  6. Greenland 5 km DEM, Ice Thickness, and Bedrock Elevation Grids

    Data.gov (United States)

    National Aeronautics and Space Administration — A Digital Elevation Model (DEM), ice thickness grid, and bedrock elevation grid of Greenland acquired as part of the PARCA program are available in ASCII text format...
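
    Grids distributed as ASCII text are straightforward to load programmatically. Below is a minimal sketch assuming an ESRI-style ASCII grid with a six-line header; the file name and header layout are assumptions for illustration, not a description of this specific data set.

        import numpy as np

        # Assumed ESRI ASCII grid layout: six header lines
        # (ncols, nrows, xllcorner, yllcorner, cellsize, NODATA_value)
        # followed by whitespace-delimited elevation values.
        def read_ascii_grid(path: str) -> np.ndarray:
            return np.loadtxt(path, skiprows=6)

        # dem = read_ascii_grid("greenland_5km_dem.asc")  # hypothetical file name
        # print(dem.shape, dem.min(), dem.max())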

  7. Document management in engineering construction

    International Nuclear Information System (INIS)

    Liao Bing

    2008-01-01

    Document management is one important part of systematic quality management, which is one of the key factors ensuring construction quality. In engineering construction, quality management and document management must interwork at all times to ensure construction quality. Quality management ensures that documents are correctly generated and adopted, so that the completeness, accuracy, and systematic character of the documents satisfy the filing requirements. Document management ensures that documents are correctly transferred during construction, and that various forms of evidence such as files and records are kept for the engineering construction and its quality management. This paper addresses document management in engineering construction based on the interworking of quality management and document management. (author)

  8. Review of Collection of Documents “Krasnoyarsk Region during the Great Patriotic War. 1941-1945 (On the documents by Archive Agency of Krasnoyarsk Region, 2010. 497 p.”

    Directory of Open Access Journals (Sweden)

    Dmitrii A. Malyutin

    2013-09-01

    Full Text Available The paper presents a review of the collection of documents, including data on the social and economic situation in Krasnoyarsk Region during the Great Patriotic War, the activity of party and Soviet authorities, and the deeds of Krasnoyarsk natives at the front line and their labor achievements in the rear. The collection contains documents describing daily life in wartime, the public mood, living conditions, social security, and the status of disabled veterans. The presented data concerning the patriotic activity of the Orthodox Church, the camps of the People's Commissariat for Internal Affairs, and facts of desertion, speculation, and criminality in the region demonstrate the thoroughness and objectivity of the compilers' approach to document selection.

  9. Document Level Assessment of Document Retrieval Systems in a Pairwise System Evaluation

    Science.gov (United States)

    Rajagopal, Prabha; Ravana, Sri Devi

    2017-01-01

    Introduction: The use of averaged topic-level scores can result in the loss of valuable data and can cause misinterpretation of the effectiveness of system performance. This study aims to use the scores of each document to evaluate document retrieval systems in a pairwise system evaluation. Method: The chosen evaluation metrics are document-level…

  10. Indexing it all the subject in the age of documentation, information, and data

    CERN Document Server

    Day, Ronald E

    2014-01-01

    In this book, Ronald Day offers a critical history of the modern tradition of documentation. Focusing on the documentary index (understood as a mode of social positioning), and drawing on the work of the French documentalist Suzanne Briet, Day explores the understanding and uses of indexicality. He examines the transition as indexes went from being explicit professional structures that mediated users and documents to being implicit infrastructural devices used in everyday information and communication acts. Doing so, he also traces three epistemic eras in the representation of individuals and groups, first in the forms of documents, then information, then data. Day investigates five cases from the modern tradition of documentation. He considers the socio-technical instrumentalism of Paul Otlet, "the father of European documentation" (contrasting it to the hermeneutic perspective of Martin Heidegger); the shift from documentation to information science and the accompanying transformation of persons and texts i...

  11. Shoulder dystocia documentation: an evaluation of a documentation training intervention.

    Science.gov (United States)

    LeRiche, Tammy; Oppenheimer, Lawrence; Caughey, Sharon; Fell, Deshayne; Walker, Mark

    2015-03-01

    To evaluate the quality and content of nurse and physician shoulder dystocia delivery documentation before and after MORE training in shoulder dystocia management skills and documentation. Approximately 384 charts at the Ottawa Hospital General Campus involving a diagnosis of shoulder dystocia between the years of 2000 and 2006 excluding the training year of 2003 were identified. The charts were evaluated for 14 key components derived from a validated instrument. The delivery notes were then scored based on these components by 2 separate investigators who were blinded to delivery note author, date, and patient identification to further quantify delivery record quality. Approximately 346 charts were reviewed for physician and nurse delivery documentation. The average score for physician notes was 6 (maximum possible score of 14) both before and after the training intervention. The nurses' average score was 5 before and after the training intervention. Negligible improvement was observed in the content and quality of shoulder dystocia documentation before and after nurse and physician training.

  12. Documentary archives in advertising: Centro Documental para la Conservación del Patrimonio Publicitario Español (Publidocnet)

    Directory of Open Access Journals (Sweden)

    Marcos Recio, Juan Carlos

    2015-06-01

    Full Text Available A documentary model for advertising has to promote and conserve advertising heritage with all the instruments at its disposal. Publidocnet, the Centro Documental para la Conservación del Patrimonio Publicitario Español, is an active documentation centre that offers students, researchers and anyone interested in advertising an analytical view of campaigns and related information: users can watch the television commercials, listen to the radio spots, view the campaign images and study the technical data sheets; in short, it is a way of understanding and getting to know advertising through the creations of advertising professionals. Its ultimate aim is the conservation of this heritage, under a tacit agreement with the advertising agencies, which donate the material for study by students and for preservation. Publidocnet is thus a multimedia, graphic and textual documentation centre for Spanish advertising.

  13. SGML-Based Markup for Literary Texts: Two Problems and Some Solutions.

    Science.gov (United States)

    Barnard, David; And Others

    1988-01-01

    Identifies the Standard Generalized Markup Language (SGML) as the best basis for a markup standard for encoding literary texts. Outlines solutions to problems using SGML and discusses the problem of maintaining multiple views of a document. Examines several ways of reducing the burden of markups. (GEA)

  14. Connecting Knowledge for Text Construction through the Use of Graphic Organizers

    OpenAIRE

    Reyes, Elsy Camila

    2011-01-01

    This study analyzed how basic level students comprehend short descriptive texts and rewrite their texts through the use of graphic organizers (GOs). The research was built upon the qualitative research paradigm with the inclusion of descriptive and introspective approaches. The study was carried out at a prestigious private school in Bogotá, Colombia, with basic English level II sixth graders. Data was gathered through focus groups, GOs, and students' documents. The results of the study demon...

  15. Towards a Pattern Language Approach to Document Description

    Directory of Open Access Journals (Sweden)

    Robert Waller

    2012-07-01

    Full Text Available Pattern libraries, originating in architecture, are a common way to share design solutions in interaction design and software engineering. Our aim in this paper is to consider patterns as a way of describing commonly-occurring document design solutions to particular problems, from two points of view. First, we are interested in their use as exemplars for designers to follow, and second, we suggest them as a means of understanding linguistic and graphical data for their organization into corpora that will facilitate descriptive work. We discuss the use of patterns across a range of disciplines before suggesting the need to place patterns in the context of genres, with each potentially belonging to a “home genre” in which it originates and to which it makes an implicit intertextual reference intended to produce a particular reader response in the form of a reading strategy or interpretative stance. We consider some conceptual and technical issues involved in the descriptive study of patterns in naturally-occurring documents, including the challenges involved in building a document corpus.

  16. Text-Mining Applications for Creation of Biofilm Literature Database

    Directory of Open Access Journals (Sweden)

    Kanika Gupta

    2017-10-01

    In the present research, a corpus of 34,306 published documents on biofilm was collected from the PubMed database, along with non-indexed resources such as books, conference proceedings, newspaper articles, etc., and these were divided into five categories: classification, growth and development, physiology, drug effects, and radiation effects. Each of these five categories was further divided into three parts, i.e. Journal Title, Abstract Title, and Abstract Text, to make indexing highly specific. Text processing was done using the software RapidMiner_v5.3, which tokenizes the entire text into words and provides the frequency of each word within the document. The obtained words were normalized using the stop-word removal and stemming commands of RapidMiner_v5.3. The words were stored in MS-Excel 2007 and sorted in decreasing order of frequency using the Sort & Filter command of MS-Excel 2007, and they were visualized as networks using Cytoscape_v2.7.0. The words so obtained are highly specific for biofilms, generating a controlled biofilm vocabulary, and this vocabulary could be used for indexing articles on biofilm (similar to the MeSH database, which indexes articles for PubMed). The obtained keyword information was stored in a relational database locally hosted using the WAMP_v2.4 (Windows, Apache, MySQL, PHP) server. The resulting biofilm vocabulary will be significant for researchers studying the biofilm literature, making their searches easy and efficient.
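
    The tokenize / remove stop words / stem / count pipeline described above can be sketched in a few lines of plain Python. This is a generic illustration of the same steps, not a reproduction of the RapidMiner workflow; the tiny stop-word list and the naive suffix-stripping stemmer are stand-ins for real linguistic resources.

        import re
        from collections import Counter

        STOP_WORDS = {"the", "a", "an", "of", "and", "in", "is", "are", "on", "to"}

        def naive_stem(word: str) -> str:
            """Crude suffix stripping; a real pipeline would use a proper stemmer."""
            for suffix in ("ing", "ed", "s"):
                if word.endswith(suffix) and len(word) > len(suffix) + 2:
                    return word[: -len(suffix)]
            return word

        def term_frequencies(text: str) -> Counter:
            tokens = re.findall(r"[a-z]+", text.lower())       # tokenize
            kept = (t for t in tokens if t not in STOP_WORDS)  # remove stop words
            return Counter(naive_stem(t) for t in kept)        # stem and count

        doc = "Biofilms are structured communities of cells growing on surfaces."
        for term, freq in term_frequencies(doc).most_common(5):
            print(term, freq)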

  17. Tank Monitoring and Document control System (TMACS) As Built Software Design Document

    Energy Technology Data Exchange (ETDEWEB)

    GLASSCOCK, J.A.

    2000-01-27

    This document describes the software design for the Tank Monitor and Control System (TMACS). This document captures the existing as-built design of TMACS as of November 1999. It will be used as a reference document by the system maintainers who will be maintaining and modifying the TMACS functions as necessary. The heart of the TMACS system is the ''point-processing'' functionality, where a sample value is received from the field sensors and the value is analyzed, logged, or alarmed as required. This Software Design Document focuses on the point-processing functions.

  18. Toward Documentation of Program Evolution

    DEFF Research Database (Denmark)

    Vestdam, Thomas; Nørmark, Kurt

    2005-01-01

    The documentation of a program often falls behind the evolution of the program source files. When this happens it may be attractive to shift the documentation mode from updating the documentation to documenting the evolution of the program. This paper describes tools that support the documentation of program evolution. It is concluded that our approach can help revitalize older documentation, and that discovery of the fine-grained program evolution steps helps the programmer in documenting the evolution of the program.

  19. Quantity of documentation of maltreatment risk factors in injury-related paediatric hospitalisations

    Directory of Open Access Journals (Sweden)

    McKenzie Kirsten

    2012-07-01

    Full Text Available Background While child maltreatment is recognised as a global problem, solid epidemiological data on the prevalence of child maltreatment and the risk factors associated with child maltreatment are lacking in Australia and internationally. There have been recent calls for action to improve the evidence base capturing and describing child abuse, particularly data captured within the health sector. This paper describes the quantity of documentation of maltreatment risk factors in injury-related paediatric hospitalisations in Queensland, Australia. Methods This study involved a retrospective medical record review, text extraction and coding methodology to assess the quantity of documentation of risk factors and the subsequent utility of data in hospital records for describing child maltreatment and data linkage to the Child Protection Service (CPS). Results There were 433 children in the maltreatment group and 462 in the unintentional injury group for whom medical records could be reviewed. Almost 93% of the maltreatment code sample, but only 11% of the unintentional injury sample, had documentation identified indicating the presence of any of 20 risk factors. In the maltreatment group the most commonly documented risk factor was history of abuse (41%). In those with an unintentional injury, the most commonly documented risk factor was alcohol abuse of the child or family (3%). More than 93% of the maltreatment sample also linked to a child protection record. Of concern are the 16% of those children who linked to child protection who did not have documented risk factors in the medical record. Conclusion Given the importance of the medical record as a source of information about children presenting to hospital for treatment and as a potential source of evidence for legal action, the lack of documentation is of concern. The details surrounding the injury admission and consideration of any maltreatment related risk factors, both identifying their

  20. Web document engineering

    International Nuclear Information System (INIS)

    White, B.

    1996-05-01

    This tutorial provides an overview of several document engineering techniques which are applicable to the authoring of World Wide Web documents. It illustrates how pre-WWW hypertext research is applicable to the development of WWW information resources

  1. Building a comprehensive syntactic and semantic corpus of Chinese clinical texts.

    Science.gov (United States)

    He, Bin; Dong, Bin; Guan, Yi; Yang, Jinfeng; Jiang, Zhipeng; Yu, Qiubin; Cheng, Jianyi; Qu, Chunyan

    2017-05-01

    To build a comprehensive corpus covering syntactic and semantic annotations of Chinese clinical texts with corresponding annotation guidelines and methods as well as to develop tools trained on the annotated corpus, which supplies baselines for research on Chinese texts in the clinical domain. An iterative annotation method was proposed to train annotators and to develop annotation guidelines. Then, by using annotation quality assurance measures, a comprehensive corpus was built, containing annotations of part-of-speech (POS) tags, syntactic tags, entities, assertions, and relations. Inter-annotator agreement (IAA) was calculated to evaluate the annotation quality and a Chinese clinical text processing and information extraction system (CCTPIES) was developed based on our annotated corpus. The syntactic corpus consists of 138 Chinese clinical documents with 47,426 tokens and 2612 full parsing trees, while the semantic corpus includes 992 documents that annotated 39,511 entities with their assertions and 7693 relations. IAA evaluation shows that this comprehensive corpus is of good quality, and the system modules are effective. The annotated corpus makes a considerable contribution to natural language processing (NLP) research into Chinese texts in the clinical domain. However, this corpus has a number of limitations. Some additional types of clinical text should be introduced to improve corpus coverage and active learning methods should be utilized to promote annotation efficiency. In this study, several annotation guidelines and an annotation method for Chinese clinical texts were proposed, and a comprehensive corpus with its NLP modules were constructed, providing a foundation for further study of applying NLP techniques to Chinese texts in the clinical domain. Copyright © 2017. Published by Elsevier Inc.
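
    Inter-annotator agreement of the kind reported here is commonly quantified with Cohen's kappa. The following self-contained sketch computes kappa for two annotators over the same items; it is a generic illustration with invented labels, not the paper's evaluation code.

        from collections import Counter

        def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
            """Cohen's kappa for two annotators labelling the same items."""
            assert len(labels_a) == len(labels_b)
            n = len(labels_a)
            observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
            freq_a, freq_b = Counter(labels_a), Counter(labels_b)
            expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
            return (observed - expected) / (1 - expected)

        a = ["entity", "entity", "none", "entity", "none"]
        b = ["entity", "none",  "none", "entity", "none"]
        print(round(cohens_kappa(a, b), 3))  # 0.615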

  2. The Texts of the Agency's Headquarters Agreement with Austria and Related Agreements

    International Nuclear Information System (INIS)

    1975-01-01

    The texts of the Agreement between the International Atomic Energy Agency and the Republic of Austria that were in force on 30 September 1975 are reproduced in this document for the information of all Members of the Agency

  3. The Texts of the Agency's Headquarters Agreement with Austria and Related Agreements

    International Nuclear Information System (INIS)

    1975-01-01

    The texts of the Agreement between the International Atomic Energy Agency and the Republic of Austria that were in force on 30 September 1975 are reproduced in this document for the information of all Members of the Agency

  4. The Texts of the Agency's Headquarters Agreement with Austria and Related Agreements

    International Nuclear Information System (INIS)

    1975-01-01

    The texts of the Agreement between the International Atomic Energy Agency and the Republic of Austria that were in force on 30 September 1975 are reproduced in this document for the information of all Members of the Agency

  5. KNOWLEDGE AND VALORIZATION OF HISTORICAL SITES THROUGH 3D DOCUMENTATION AND MODELING

    Directory of Open Access Journals (Sweden)

    E. Farella

    2016-06-01

    Full Text Available The paper presents the first results of an interdisciplinary project related to the 3D documentation, dissemination, valorization and digital access of archaeological sites. Besides the 3D documentation aim itself, the project has two goals: (i) to easily explore and share via web the references and results of the interdisciplinary work, including the interpretative process and the final reconstruction of the remains; (ii) to promote and valorize archaeological areas using reality-based 3D data and Virtual Reality devices. The method has been verified on the ruins of the archaeological site of Pausilypon, a maritime villa of the Roman period (Naples, Italy). Using Unity3D, the virtual tour of the heritage site was integrated and enriched with the surveyed 3D data, text documents, CAAD reconstruction hypotheses, drawings, photos, etc. In this way, starting from the actual appearance of the ruins (panoramic images), passing through the 3D digital surveying models and several other pieces of historical information, the user is able to access virtual contents and reconstructed scenarios, all in a single virtual, interactive and immersive environment. These contents and scenarios allow the user to derive documentation and geometrical information, understand the site, perform analyses, see interpretative processes, communicate historical information and valorize the heritage location.

  6. NPP Grassland: Canas, Costa Rica, 1969-1970, R1

    Data.gov (United States)

    National Aeronautics and Space Administration — This data set contains two ASCII text files: one providing above-ground biomass, productivity, and bioelement concentration data for a derived savanna at Cañas (10.4...

  7. Publications | Page 278 | IDRC - International Development ...

    International Development Research Centre (IDRC) Digital Library (Canada)

    Multi-level participation for building adaptive capacity : formal ... to adaptive capacity helps formulate a theory of participation based on resilience thinking. ... kinds of ASCII text files for further analysis, with several pre-processing options.

  8. Massively Scalable Near Duplicate Detection in Streams of Documents using MDSH

    Energy Technology Data Exchange (ETDEWEB)

    Bogen, Paul Logasa [ORNL; Symons, Christopher T [ORNL; McKenzie, Amber T [ORNL; Patton, Robert M [ORNL; Gillen, Rob [ORNL

    2013-01-01

    In a world where large-scale text collections are not only becoming ubiquitous but also are growing at increasing rates, near duplicate documents are becoming a growing concern that has the potential to hinder many different information filtering tasks. While others have tried to address this problem, prior techniques have only been used on limited collection sizes and static cases. We will briefly describe the problem in the context of Open Source Intelligence (OSINT) along with our additional constraints for performance. In this work we propose two variations on Multi-dimensional Spectral Hash (MDSH) tailored for working on extremely large, growing sets of text documents. We analyze the memory and runtime characteristics of our techniques and provide an informal analysis of the quality of the near-duplicate clusters produced by our techniques.
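
    Hash-based near-duplicate detection is easiest to see with MinHash, a related locality-sensitive technique. The sketch below estimates Jaccard similarity between shingled documents using MinHash signatures; it illustrates the general family of methods, not the MDSH variants proposed in the paper.

        import hashlib

        def shingles(text: str, k: int = 4) -> set[str]:
            """Overlapping character k-grams of the normalized text."""
            text = " ".join(text.lower().split())
            return {text[i:i + k] for i in range(len(text) - k + 1)}

        def minhash_signature(doc: set[str], num_hashes: int = 64) -> list[int]:
            """One min-hash per seeded hash function."""
            return [min(int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16)
                        for s in doc)
                    for seed in range(num_hashes)]

        def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
            return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

        d1 = shingles("Large-scale text collections are growing at increasing rates.")
        d2 = shingles("Large scale text collections are growing at increasing rates!")
        print(estimated_jaccard(minhash_signature(d1), minhash_signature(d2)))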

  9. SALES DOCUMENTS IN PURCHASE AND SALE TRANSACTIONS OF STEAM COAL IN POLAND

    Directory of Open Access Journals (Sweden)

    Anna Galik

    2015-06-01

    Full Text Available This article describes the sales documents used in purchase and sale transactions of steam coal in Poland. Following the introduction of the excise tax on steam coal at the beginning of 2012, additional documentation requirements appeared during the sale of goods. The seller is now obliged to issue various documents depending on the type of buyer and the destination of the goods. The article presents the coal sales documents for households, for companies with no tax payment, and for companies with tax payment. The purpose of this article is to present the complicated and time-consuming procedures involved in the sale of goods as a result of the current excise tax on steam coal. In conclusion the author identifies new solutions that are beneficial for the seller and the buyer.

  10. Evaluating current automatic de-identification methods with Veteran’s health administration clinical documents

    Directory of Open Access Journals (Sweden)

    Ferrández Oscar

    2012-07-01

    Full Text Available Background The increased use and adoption of Electronic Health Records (EHR) causes a tremendous growth in digital information useful for clinicians, researchers and many other operational purposes. However, this information is rich in Protected Health Information (PHI), which severely restricts its access and possible uses. A number of investigators have developed methods for automatically de-identifying EHR documents by removing PHI, as specified in the Health Insurance Portability and Accountability Act “Safe Harbor” method. This study focuses on the evaluation of existing automated text de-identification methods and tools, as applied to Veterans Health Administration (VHA) clinical documents, to assess which methods perform better with each category of PHI found in our clinical notes, and when new methods are needed to improve performance. Methods We installed and evaluated five text de-identification systems “out-of-the-box” using a corpus of VHA clinical documents. The systems based on machine learning methods were trained with the 2006 i2b2 de-identification corpora and evaluated with our VHA corpus, and also evaluated with a ten-fold cross-validation experiment using our VHA corpus. We counted exact, partial, and fully contained matches with reference annotations, considering each PHI type separately, or only one unique ‘PHI’ category. Performance of the systems was assessed using recall (equivalent to sensitivity) and precision (equivalent to positive predictive value) metrics, as well as the F2-measure. Results Overall, systems based on rules and pattern matching achieved better recall, and precision was always better with systems based on machine learning approaches. The highest “out-of-the-box” F2-measure was 67% for partial matches; the best precision and recall were 95% and 78%, respectively. Finally, the ten-fold cross validation experiment allowed for an increase of the F2-measure to 79% with partial matches
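
    The F2-measure used above is the F-beta score with beta = 2, which weights recall more heavily than precision. A minimal sketch of the computation (the match counts are invented for illustration):

        def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
            """F-beta score; beta > 1 favours recall over precision."""
            b2 = beta * beta
            return (1 + b2) * precision * recall / (b2 * precision + recall)

        # Hypothetical counts from a de-identification run.
        tp, fp, fn = 780, 40, 220
        precision = tp / (tp + fp)   # ~0.95
        recall = tp / (tp + fn)      # 0.78
        print(round(f_beta(precision, recall), 3))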

  11. Documents preparation and review

    International Nuclear Information System (INIS)

    1999-01-01

    The Ignalina Safety Analysis Group takes an active role in assisting the regulatory body VATESI to prepare various regulatory documents and in reviewing safety reports and other documentation presented by the Ignalina NPP in the process of licensing unit 1. The list of the main documents prepared and reviewed is presented

  12. News from the Library: Access to CERN Council documents

    CERN Document Server

    CERN Library

    2011-01-01

    Records of the CERN Council and its Committees are now more easily available thanks to a digitisation and cataloguing project carried out by the CERN Archive team, following the CERN Council's decision in September 2008 to have all paper copies of their past documents scanned and made available electronically. Over 12,000 official documents, most of them available in both English and French, are now accessible in the CERN Document Server (CDS).   Optical character recognition means that the full texts, as well as the cataloguing information (metadata), are searchable - just place the prefix "fulltext:" before your search term in CDS, e.g. "fulltext:Austria". Searching the metadata for "Austria" would find just four records, while the fulltext search finds 3,320. Various combinations of metadata and fulltext searching are possible to make your search as precise as you wish; for more details see the CDS Search Guide. First meeting of the CERN Council in 1955. Documents fro...

  13. Software simulator for property investigation of document management system with RFID tags

    Directory of Open Access Journals (Sweden)

    Kiedrowicz Maciej

    2016-01-01

    Full Text Available The study outlines a method for examining the properties of an RFID-tagged document management system. The system is composed of computers on which the software supporting the processing of RFID-tagged documents was installed. Furthermore, the system cooperates with many other elements of the secret office (cabinets, sluices, photocopiers, desks). The examination of the properties of the RFID-tagged document management system is, in this case, complex due to the number of possible examination scenarios. A simulation method for examining the system properties is proposed. It allows the examination of the properties to be conducted in a short period of time for numerous testing scenarios.

  14. Documentation: Records and Reports.

    Science.gov (United States)

    Akers, Michael J

    2017-01-01

    This article deals with documentation, including the beginnings of documentation, the requirements for Good Manufacturing Practice reports and records, and the steps that can be taken to minimize Good Manufacturing Practice documentation problems. It is important to remember that documentation for 503a compounding involves the Formulation Record, Compounding Record, Standard Operating Procedures, Safety Data Sheets, etc. For 503b outsourcing facilities, compliance with Current Good Manufacturing Practices is required, so this article is applicable to them. For 503a pharmacies, one can see the development and modification of Good Manufacturing Practice, observe changes as they occur in 503a documentation requirements, and anticipate that changes will probably continue to occur. Copyright© by International Journal of Pharmaceutical Compounding, Inc.

  15. ARCHAEOLOGICAL DOCUMENTATION OF A DEFUNCT IRAQI TOWN

    Directory of Open Access Journals (Sweden)

    J. Šedina

    2016-06-01

    Full Text Available The subject of this article is the possibilities for documenting a defunct town dating from the pre-Islamic to the early Islamic period. The town is located near the town of Makhmur in Iraq. The Czech archaeological mission has worked at this dig site. This cultural heritage site is threatened by war because positions of ISIS are in the vicinity. For security reasons, the applicability of Pleiades satellite data has been tested; moreover, the area is a no-fly zone. However, the DTM created from the stereo-images was insufficient for the desired archaeological application. The subject of this paper is therefore the testing of the usability of RPAS technology and terrestrial photogrammetry for documentation of the remains of buildings. RPAS is a fast-growing technology that combines the advantages of aerial and terrestrial photogrammetry. A probable defunct church serves as the sample object.

  16. Instructions for submittal and control of FFTF design documents and design related documentation

    International Nuclear Information System (INIS)

    Grush, R.E.

    1976-10-01

    This document provides the system and requirements for management of FFTF technical data prepared by Westinghouse Hanford (HEDL), and design contractors, the construction contractor and lower tier equipment suppliers. Included in this document are provisions for the review, approval, release, change control, and accounting of FFTF design disclosure and base documentation. Also included are provisions for submittal of other design related documents for review and approval consistent with applicable requirements of RDT-Standard F 2-2, ''Quality Assurance Program Requirements.''

  17. Clinical map document based on XML (cMDX): document architecture with mapping feature for reporting and analysing prostate cancer in radical prostatectomy specimens.

    Science.gov (United States)

    Eminaga, Okyaz; Hinkelammert, Reemt; Semjonow, Axel; Neumann, Joerg; Abbas, Mahmoud; Koepke, Thomas; Bettendorf, Olaf; Eltze, Elke; Dugas, Martin

    2010-11-15

    The pathology report of radical prostatectomy specimens plays an important role in clinical decisions and the prognostic evaluation in Prostate Cancer (PCa). The anatomical schema is a helpful tool to document PCa extension for clinical and research purposes. To achieve electronic documentation and analysis, an appropriate documentation model for anatomical schemas is needed. For this purpose we developed cMDX. The document architecture of cMDX was designed according to Open Packaging Conventions by separating the whole data into template data and patient data. Analogous custom XML elements were considered to harmonize the graphical representation (e.g. tumour extension) with the textual data (e.g. histological patterns). The graphical documentation was based on the four-layer visualization model that forms the interaction between different custom XML elements. Sensitive personal data were encrypted with a 256-bit cryptographic algorithm to avoid misuse. In order to assess the clinical value, we retrospectively analysed the tumour extension in 255 patients after radical prostatectomy. The pathology report with cMDX can represent pathological findings of the prostate in schematic styles. Such reports can be integrated into the hospital information system. "cMDX" documents can be converted into different data formats like text, graphics and PDF. Supplementary tools like the cMDX Editor and an analyser tool were implemented. The graphical analysis of 255 prostatectomy specimens showed that PCa was mostly localized in the peripheral zone (mean: 73% ± 25). 54% of PCa showed a multifocal growth pattern. cMDX can be used for routine histopathological reporting of radical prostatectomy specimens and provides data for scientific analysis.
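
    A document that separates template data from patient data and ties graphics to text, as cMDX does, can be sketched with Python's standard XML tooling. The element and attribute names below are invented for illustration and do not reproduce the actual cMDX schema.

        import xml.etree.ElementTree as ET

        # Hypothetical structure: template data (the anatomical schema)
        # kept separate from patient data (findings drawn on the schema).
        root = ET.Element("clinicalMapDocument")
        template = ET.SubElement(root, "template", name="prostate-schema")
        ET.SubElement(template, "layer", id="background")

        patient = ET.SubElement(root, "patientData", id="anonymous-001")
        finding = ET.SubElement(patient, "tumourExtension", zone="peripheral")
        ET.SubElement(finding, "polygon", points="12,40 18,44 15,52")
        ET.SubElement(finding, "histology", pattern="Gleason 3+4")

        print(ET.tostring(root, encoding="unicode"))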

  18. Extended Subject Access to Hypertext Online Documentation. Part III: The Document-Boundaries Problem.

    Science.gov (United States)

    Girill, T. R.

    1991-01-01

    This article continues the description of DFT (Document, Find, Theseus), an online documentation system that provides computer-managed on-demand printing of software manuals as well as the interactive retrieval of reference passages. Document boundaries in the hypertext database are discussed, search vocabulary complexities are described, and text…

  19. CORPORATE DOCUMENTS AND FILES: FROM THE DAY-BY-DAY ORGANIZATION TO THE EFFECTIVE MANAGEMENT

    Directory of Open Access Journals (Sweden)

    Elisabeth Adriana Dudziak

    2010-09-01

    Full Text Available The aim of this paper is to present and discuss the process of planning, organizing and managing business archives and documents from the standpoint of the executive secretary. It begins with an introduction to the current scenario, focusing on global dynamics, service demands and the essential role of the executive secretary in the company. It presents a theoretical review of key concepts and internationally adopted standards, and sets out the need to go beyond the simple organization of archives in the management of documents and archives. Based on this premise, we present the conceptual bases for the implementation of a Documentation and Archive Service, the main document types in companies, and criteria for classification and document retention periods. We discuss the importance of the strategic, tactical and operational management of archives. Finally, we present a description of the steps in the implementation of an Archive and Documentation Service, and the code of ethics of the document manager.

  20. The Council of Europe and Sport, 1966-1998. Volume III: Texts of the Anti-Doping Convention.

    Science.gov (United States)

    Council of Europe, Strasbourg (France).

    This document presents texts in the field of sports and doping that were adopted by various committees of the Council of Europe. The seven sections present: (1) "Texts Adopted by the Committee of Ministers, 1966-1988"; (2) "Texts Adopted at the Conferences of European Ministers Responsible for Sport Since 1978" and…

  1. Automatic text summarization

    CERN Document Server

    Torres Moreno, Juan Manuel

    2014-01-01

    This new textbook examines the motivations and the different algorithms for automatic document summarization (ADS). We present a recent state of the art. The book shows the main problems of ADS, the difficulties involved, and the solutions provided by the community. It presents recent advances in ADS, as well as current applications and trends. The approaches covered are statistical, linguistic and symbolic. Several examples are included in order to clarify the theoretical concepts. The books currently available in the area of Automatic Document Summarization are not recent. Powerful algorithms have been developed

  2. Experiments on Supervised Learning Algorithms for Text Categorization

    Science.gov (United States)

    Namburu, Setu Madhavi; Tu, Haiying; Luo, Jianhui; Pattipati, Krishna R.

    2005-01-01

    Modern information society is facing the challenge of handling massive volumes of online documents, news, intelligence reports, and so on. How to use the information accurately and in a timely manner becomes a major concern in many areas. While the general information may also include images and voice, we focus on the categorization of text data in this paper. We provide a brief overview of the information processing flow for text categorization, and discuss two supervised learning algorithms, viz., support vector machines (SVM) and partial least squares (PLS), which have been successfully applied in other domains, e.g., fault diagnosis [9]. While SVM has been well explored for binary classification and was reported as an efficient algorithm for text categorization, PLS has not yet been applied to text categorization. Our experiments are conducted on three data sets: the Reuters-21578 dataset about corporate mergers and acquisitions (ACQ), WebKB and the 20-Newsgroups. Results show that the performance of PLS is comparable to SVM in text categorization. A major drawback of SVM for multi-class categorization is that it requires a voting scheme based on the results of pair-wise classification. PLS does not have this drawback and could be a better candidate for multi-class text categorization.
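
    An SVM text categorizer of the kind evaluated here can be assembled in a few lines with scikit-learn. The toy corpus and labels below are invented; this sketches the standard bag-of-words plus linear SVM setup rather than the paper's exact experimental configuration.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.pipeline import make_pipeline
        from sklearn.svm import LinearSVC

        docs = [
            "Company A agreed to acquire Company B in a cash deal.",
            "The merger creates the largest supplier in the region.",
            "The team won the championship after a dramatic final.",
            "The striker scored twice in the opening match.",
        ]
        labels = ["acq", "acq", "sport", "sport"]

        # TF-IDF features feeding a linear SVM, a common text-categorization baseline.
        model = make_pipeline(TfidfVectorizer(), LinearSVC())
        model.fit(docs, labels)
        print(model.predict(["Shareholders approved the acquisition."]))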

  3. Texts of the Agency's Agreements with the Republic of Austria

    International Nuclear Information System (INIS)

    1999-01-01

    The document reproduces the text of the Exchange of Letters, dated 8 January 1999 and 27 January 1999 respectively, between the Ministry of Foreign Affairs of Austria and the IAEA, constituting a supplementary agreement to the Agreement between the Republic of Austria and the IAEA regarding the Headquarters of the IAEA. The aforementioned Agreement entered into force on 8 February 1999

  4. The Texts of the Agency's Agreements with the United Nations

    International Nuclear Information System (INIS)

    1963-01-01

    The text of the Special Agreement extending the jurisdiction of the United Nations Administrative Tribunal to the International Atomic Energy Agency with regard to applications by officials of that organization alleging non-observance of the Regulations of the United Nations Joint Staff Pension Fund, which came into force on 18 October 1963, is reproduced in this document for the information of all Members of the Agency

  5. The Texts of the Agency's Agreements with the United Nations

    International Nuclear Information System (INIS)

    1963-01-01

    The text of the Special Agreement extending the jurisdiction of the United Nations Administrative Tribunal to the International Atomic Energy Agency with regard to applications by officials of that organization alleging non-observance of the Regulations of the United Nations Joint Staff Pension Fund, which came into force on 18 October 1963, is reproduced in this document for the information of all Members of the Agency

  6. The Texts of the Agency's Agreements with the United Nations

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1963-12-02

    The text of the Special Agreement extending the jurisdiction of the United Nations Administrative Tribunal to the International Atomic Energy Agency with regard to applications by officials of that organization alleging non-observance of the Regulations of the United Nations Joint Staff Pension Fund, which came into force on 18 October 1963, is reproduced in this document for the information of all Members of the Agency.

  7. Synthesis document on the long life behavior of packages: reference operational document ''CSD-C'' 2004

    International Nuclear Information System (INIS)

    Helie, M.

    2004-12-01

    This document was produced in the framework of the 1991 law on radioactive waste management. The 2004 synthesis document on the long-term behavior of standard packages of compacted wastes consists of two documents, the reference document and the operational document. This paper presents the operational model describing the alteration of the packages by water and the associated radionuclide release. (A.L.B.)

  8. CMS DOCUMENTATION

    CERN Multimedia

    CMS TALKS AT MAJOR MEETINGS The agenda and talks from major CMS meetings can now be electronically accessed from the iCMS Web site. The following items can be found on: http://cms.cern.ch/iCMS/ General - CMS Weeks (Collaboration Meetings), CMS Weeks Agendas The talks presented at the Plenary Sessions. LHC Symposiums Management - CB - MB - FB - FMC Agendas and minutes are accessible to CMS members through their AFS account (ZH). However some linked documents are restricted to the Board Members. FB documents are only accessible to FB members. LHCC The talks presented at the ‘CMS Meetings with LHCC Referees’ are available on request from the PM or MB Country Representative. Annual Reviews The talks presented at the 2006 Annual reviews are posted.   CMS DOCUMENTS It is considered useful to establish information on the first employment of CMS doctoral students upon completion of their theses. Therefore it is requested that Ph.D students inform the CMS Secretariat a...

  9. CMS DOCUMENTATION

    CERN Multimedia

    CMS TALKS AT MAJOR MEETINGS The agenda and talks from major CMS meetings can now be electronically accessed from the iCMS Web site. The following items can be found on: http://cms.cern.ch/iCMS/ General - CMS Weeks (Collaboration Meetings), CMS Weeks Agendas The talks presented at the Plenary Sessions. LHC Symposiums Management - CB - MB - FB - FMC Agendas and minutes are accessible to CMS members through their AFS account (ZH). However some linked documents are restricted to the Board Members. FB documents are only accessible to FB members. LHCC The talks presented at the ‘CMS Meetings with LHCC Referees’ are available on request from the PM or MB Country Representative. Annual Reviews The talks presented at the 2006 Annual reviews are posted. CMS DOCUMENTS It is considered useful to establish information on the first employment of CMS doctoral students upon completion of their theses. Therefore it is requested that Ph.D students inform the CMS Secretariat about the natu...

  10. CMS DOCUMENTATION

    CERN Multimedia

    CMS TALKS AT MAJOR MEETINGS The agenda and talks from major CMS meetings can now be electronically accessed from the iCMS Web site. The following items can be found on: http://cms.cern.ch/iCMS/ General - CMS Weeks (Collaboration Meetings), CMS Weeks Agendas The talks presented at the Plenary Sessions. LHC Symposiums Management - CB - MB - FB - FMC Agendas and minutes are accessible to CMS members through their AFS account (ZH). However some linked documents are restricted to the Board Members. FB documents are only accessible to FB members. LHCC The talks presented at the ‘CMS Meetings with LHCC Referees’ are available on request from the PM or MB Country Representative. Annual Reviews The talks presented at the 2006 Annual reviews are posted. CMS DOCUMENTS It is considered useful to establish information on the first employment of CMS doctoral students upon completion of their theses. Therefore it is requested that Ph.D students inform the CMS Secretariat about the natur...

  11. CMS DOCUMENTATION

    CERN Multimedia

CMS TALKS AT MAJOR MEETINGS The agenda and talks from major CMS meetings can now be electronically accessed from the iCMS Web site. The following items can be found on: http://cms.cern.ch/iCMS/ Management - CMS Weeks (Collaboration Meetings), CMS Weeks Agendas The talks presented at the Plenary Sessions. Management - CB - MB - FB Agendas and minutes are accessible to CMS members through their AFS account (ZH). However some linked documents are restricted to the Board Members. FB documents are only accessible to FB members. LHCC The talks presented at the ‘CMS Meetings with LHCC Referees’ are available on request from the PM or MB Country Representative. Annual Reviews The talks presented at the 2007 Annual reviews are posted. CMS DOCUMENTS It is considered useful to establish information on the first employment of CMS doctoral students upon completion of their theses. Therefore it is requested that Ph.D students inform the CMS Secretariat about the nature of empl...

  12. CMS DOCUMENTATION

    CERN Multimedia

CMS TALKS AT MAJOR MEETINGS The agenda and talks from major CMS meetings can now be electronically accessed from the iCMS Web site. The following items can be found on: http://cms.cern.ch/iCMS/ Management - CMS Weeks (Collaboration Meetings), CMS Weeks Agendas The talks presented at the Plenary Sessions. Management - CB - MB - FB Agendas and minutes are accessible to CMS members through their AFS account (ZH). However some linked documents are restricted to the Board Members. FB documents are only accessible to FB members. LHCC The talks presented at the ‘CMS Meetings with LHCC Referees’ are available on request from the PM or MB Country Representative. Annual Reviews The talks presented at the 2007 Annual reviews are posted. CMS DOCUMENTS It is considered useful to establish information on the first employment of CMS doctoral students upon completion of their theses. Therefore it is requested that Ph.D students inform the CMS Secretariat about the nature of employment and ...

  13. CMS DOCUMENTATION

    CERN Multimedia

    CMS TALKS AT MAJOR MEETINGS The agenda and talks from major CMS meetings can now be electronically accessed from the iCMS Web site. The following items can be found on: http://cms.cern.ch/iCMS/ General - CMS Weeks (Collaboration Meetings), CMS Weeks Agendas The talks presented at the Plenary Sessions. LHC Symposiums Management - CB - MB - FB - FMC Agendas and minutes are accessible to CMS members through their AFS account (ZH). However some linked documents are restricted to the Board Members. FB documents are only accessible to FB members. LHCC The talks presented at the ‘CMS Meetings with LHCC Referees’ are available on request from the PM or MB Country Representative. Annual Reviews The talks presented at the 2006 Annual reviews are posted. CMS DOCUMENTS It is considered useful to establish information on the first employment of CMS doctoral students upon completion of their theses. Therefore it is requested that Ph.D students inform the CMS Secretariat about the na...

  14. 76 FR 66311 - Draft Documents To Support Submission of an Electronic Common Technical Document; Availability

    Science.gov (United States)

    2011-10-26

    ...] Draft Documents To Support Submission of an Electronic Common Technical Document; Availability AGENCY... making regulatory submissions in electronic format using the electronic Common Technical Document (eCTD....S. regional document type definition, version 3.0) and ``Comprehensive Table of Contents Headings...

  15. NPP Grassland: Beacon Hill, U.K., 1972-1993, R1

    Data.gov (United States)

    National Aeronautics and Space Administration — This data set contains two ASCII text files, one providing productivity measurements for a chalk grassland on Beacon Hill, West Sussex, U.K. (50.92 N, -0.85 W) and...

  16. ON CURRICULAR PROPOSALS OF THE PORTUGUESE LANGUAGE: A DOCUMENT ANALYSIS IN JUIZ DE FORA (MG)

    Directory of Open Access Journals (Sweden)

    Tânia Guedes MAGALHÃES

    2014-12-01

    Full Text Available This paper, whose objective is to analyze two curricular proposals for Portuguese from the Juiz de Fora City Hall (2001 and 2012), is an extract from a research project entitled “On text genres and teaching: a collaborative research with teachers of Portuguese” (2011/2013). Text genres have been suggested by curricular proposals as a central object for teachers who work with Portuguese language teaching; for this reason, it is relevant to analyze the documents in the realm of the ongoing research. As theoretical references, we used authors who propose a didactic model based on the development of language skills and linguistic reasoning (MENDONÇA, 2006), which in turn is based on an interactional conception of language (BRONCKART, 1999; SCHNEUWLY; DOLZ, 2004). Document analysis was used as the methodology, which envisions the assessment of pieces of information in documents as well as their outcomes. The data show that the 2012 curricular proposal is more adequate for Portuguese language teaching than the first one, mainly for its theoretical and methodological grounding, which emphasizes the development of students’ linguistic and discursive skills. Guided by an interactionist notion – unlike the norm-centered 2001 proposal – the 2012 document fosters the development of linguistic reasoning and usage skills.

  17. Correcting geometric and photometric distortion of document images on a smartphone

    Science.gov (United States)

    Simon, Christian; Williem; Park, In Kyu

    2015-01-01

    A set of document image processing algorithms for improving the optical character recognition (OCR) capability of smartphone applications is presented. The scope of the problem covers the geometric and photometric distortion correction of document images. The proposed framework was developed to satisfy industrial requirements. It is implemented on an off-the-shelf smartphone with limited resources in terms of speed and memory. Geometric distortions, i.e., skew and perspective distortion, are corrected by sending horizontal and vertical vanishing points toward infinity in a downsampled image. Photometric distortion includes image degradation from moiré pattern noise and specular highlights. Moiré pattern noise is removed using low-pass filters with different sizes independently applied to the background and text region. The contrast of the text in a specular highlighted area is enhanced by locally enlarging the intensity difference between the background and text while the noise is suppressed. Intensive experiments indicate that the proposed methods show a consistent and robust performance on a smartphone with a runtime of less than 1 s.
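
    Geometric correction of a skewed document page, once the four page corners are known, reduces to a perspective (homography) warp. The sketch below shows that standard step with OpenCV; the hard part the paper addresses, estimating the vanishing points and corners automatically, is assumed here, and the corner coordinates are invented.

        import cv2
        import numpy as np

        # Synthetic blank image standing in for a photographed page.
        img = np.full((400, 300, 3), 255, dtype=np.uint8)

        # Assumed: the four detected page corners (top-left, top-right,
        # bottom-right, bottom-left) in the distorted photograph.
        src = np.float32([[40, 30], [270, 50], [250, 380], [20, 360]])
        dst = np.float32([[0, 0], [300, 0], [300, 400], [0, 400]])

        # Homography sending the detected corners to an upright rectangle.
        H = cv2.getPerspectiveTransform(src, dst)
        rectified = cv2.warpPerspective(img, H, (300, 400))
        print(rectified.shape)  # (400, 300, 3)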

  18. Text of the joint U.S.-Soviet summit statement

    International Nuclear Information System (INIS)

    1987-12-01

    The document reproduces the text of the joint U.S.-Soviet summit statement issued on 10 December 1987 at the conclusion of the meeting between the President of the United States and the General Secretary of the Central Committee of the Communist Party of the Soviet Union (Washington, December 7-10, 1987). It refers to the arms control (including nuclear weapons), human rights and humanitarian concerns, regional issues, bilateral affairs and further meetings

  19. The Texts of the Agency's Headquarters Agreement with Austria and Related Agreements

    International Nuclear Information System (INIS)

    1975-01-01

    The texts of six agreements concluded between the Agency and the Republic of Austria as a result of the location of the Agency's headquarters in Austria, which were in force on 31 October 1975, are reproduced in this document for the information of all Members

  20. Formalization of Technological Knowledge in the Field of Metallurgy using Document Classification Tools Supported with Semantic Techniques

    Directory of Open Access Journals (Sweden)

    Regulski K.

    2017-06-01

    Full Text Available The process of knowledge formalization is an essential part of decision support systems development. Creating a technological knowledge base in the field of metallurgy encountered problems in acquiring and codifying reusable computer artifacts based on text documents. The aim of the work was to adapt algorithms for the classification of documents and to develop a method for the semantic integration of the created repository. The author used artificial intelligence tools: latent semantic indexing, rough sets, association rule learning, and ontologies as a tool for integration. The developed methodology allowed for the creation of a semantic knowledge base on the basis of natural language documents in the field of metallurgy.
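    As a rough illustration of one of the listed tools, latent semantic indexing, the sketch below (Python, scikit-learn) projects TF-IDF document vectors into a low-rank semantic space and compares documents there. The tiny corpus and the number of components are made up; the article's actual pipeline and parameters are not specified here.

        # Minimal latent semantic indexing (LSI) sketch with scikit-learn.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.decomposition import TruncatedSVD
        from sklearn.metrics.pairwise import cosine_similarity

        docs = [
            "heat treatment of steel alloys",          # hypothetical documents
            "annealing and quenching of steel",
            "invoice and delivery schedule",
        ]
        tfidf = TfidfVectorizer().fit_transform(docs)  # term-document matrix
        lsi = TruncatedSVD(n_components=2).fit_transform(tfidf)
        print(cosine_similarity(lsi))                  # similarity in LSI space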

  1. Embedding the shapes of regions of interest into a Clinical Document Architecture document.

    Science.gov (United States)

    Minh, Nguyen Hai; Yi, Byoung-Kee; Kim, Il Kon; Song, Joon Hyun; Binh, Pham Viet

    2015-03-01

    Sharing a medical image visually annotated by a region of interest with a remotely located specialist for consultation is good practice. It may, however, require a special-purpose (and most likely expensive) system to send and view the images, which is an unfeasible solution in developing countries such as Vietnam. In this study, we design and implement interoperable methods based on the HL7 Clinical Document Architecture and the Extensible Stylesheet Language Transformations (XSLT) standards to seamlessly exchange and visually present the shapes of regions of interest using web browsers. We also propose a new integration architecture for a Clinical Document Architecture generator that enables embedding of regions of interest and simultaneous auto-generation of corresponding style sheets. Using the Clinical Document Architecture document and style sheet, a sender can transmit clinical documents and medical images together with the coordinate values of regions of interest to recipients. Recipients can easily view the documents and display the embedded regions of interest by rendering them in their web browser of choice. © The Author(s) 2014.
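    To convey the flavor of embedding region-of-interest coordinates in an XML clinical document, here is a minimal Python sketch using the standard library. The element and attribute names below are invented for illustration and do not follow the real HL7 CDA schema, which defines its own regionOfInterest structures.

        # Sketch of embedding ROI coordinates in an XML clinical document.
        # Element/attribute names are hypothetical, NOT the HL7 CDA schema.
        import xml.etree.ElementTree as ET

        doc = ET.Element("ClinicalDocument")
        obs = ET.SubElement(doc, "observationMedia", id="img1")
        ET.SubElement(obs, "reference", value="chest_xray.jpg")
        roi = ET.SubElement(doc, "regionOfInterest", shape="ellipse")
        ET.SubElement(roi, "value", value="120 80 240 190")  # coordinate list
        print(ET.tostring(doc, encoding="unicode"))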

  2. Large Scale Document Inversion using a Multi-threaded Computing System.

    Science.gov (United States)

    Jung, Sungbo; Chang, Dar-Jen; Park, Juw Won

    2017-06-01

    Current microprocessor architecture is moving towards multi-core/multi-threaded systems. This trend has led to a surge of interest in using multi-threaded computing devices, such as the Graphics Processing Unit (GPU), for general-purpose computing. We can utilize the GPU in computation as a massive parallel coprocessor because the GPU consists of multiple cores. The GPU is also an affordable, attractive, and user-programmable commodity. Nowadays a flood of information is entering the digital domain around the world. Huge volumes of data, such as digital libraries, social networking services, e-commerce product data, and reviews, are produced or collected every moment, with dramatic growth in size. Although the inverted index is a useful data structure that can be used for full-text search or document retrieval, a large number of documents will require a tremendous amount of time to index. The performance of document inversion can be improved by a multi-threaded or multi-core GPU. Our approach is to implement a linear-time, hash-based, single program multiple data (SPMD) document inversion algorithm on the NVIDIA GPU/CUDA programming platform, utilizing the huge computational power of the GPU to develop high-performance solutions for document indexing. Our proposed parallel document inversion system shows 2-3 times faster performance than a sequential system on two different test datasets from PubMed abstracts and e-commerce product reviews. CCS Concepts: Information systems → Information retrieval; Computing methodologies → Massively parallel and high-performance simulations.
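    The core data structure is easy to show in miniature. Below is a sequential, hash-based inverted index in Python; the paper's contribution is parallelizing this kind of construction on the GPU with CUDA, which this baseline sketch does not attempt, and the toy documents are invented.

        # Sequential hash-based document inversion: token -> list of document IDs.
        from collections import defaultdict

        def build_inverted_index(docs):
            index = defaultdict(set)
            for doc_id, text in enumerate(docs):
                for token in text.lower().split():
                    index[token].add(doc_id)
            return {t: sorted(ids) for t, ids in index.items()}

        docs = ["GPU computing with CUDA", "document inversion on GPU"]
        print(build_inverted_index(docs)["gpu"])   # -> [0, 1]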

  3. Clinical audit on documentation of anticipatory "Not for Resuscitation" orders in a tertiary australian teaching hospital

    Directory of Open Access Journals (Sweden)

    Naveen Sulakshan Salins

    2011-01-01

    Full Text Available Aim: The purpose of this clinical audit was to determine how accurately anticipatory Not for Resuscitation (NFR) orders are documented in a major metropolitan teaching hospital in Australia. Materials and Methods: Retrospective hospital-based study. Independent case reviewers, using a questionnaire designed to study NFR documentation, reviewed the documentation of NFR in 88 case records. Results: Prognosis was documented in only 40% of cases, and palliative care was offered to two-thirds of patients with documented NFR. There was no documentation of the cardiopulmonary resuscitation (CPR) process or of CPR outcomes in most of the cases. In fewer than 50% of the cases studied was there documented evidence to suggest that the reason for NFR documentation was consistent with the patient's choices. Conclusion: Good discussion, unambiguous documentation and clinical supervision of NFR orders ensure dignified and quality care to the dying.

  4. Human Document Project

    NARCIS (Netherlands)

    de Vries, Jeroen; Abelmann, Leon; Manz, A; Elwenspoek, Michael Curt

    2012-01-01

    “The Human Document Project” is a project which tries to answer all of the questions related to preserving information about the human race for tens of generations of humans to come, or maybe even for a future intelligence which can emerge in the coming thousands of years. This document mainly

  5. THE MANAGEMENT OF DOCUMENTS: AN OPTIMISING COMPONENT FOR A COMPANY'S IT SYSTEM

    Directory of Open Access Journals (Sweden)

    Vaduva Florin

    2008-05-01

    Full Text Available In order to ensure success in the competitive world of business, companies must accommodate the needs of their clients, partners, employees and capital owners. Companies that pay attention to the way their documents and information are administered are better prepared to reduce costs and can respond much faster to change. Basically, it is all about information and controlling it, so that the response time is minimal for any inquiries or demands that come from inside the informational system of the company. For that, you need efficient document management. The software solutions that can help optimize this process are electronic document management systems.

  6. [Online text-based psychosocial intervention for Youth in Quebec].

    Science.gov (United States)

    Thoër, Christine; Noiseux, Kathia; Siche, Fabienne; Palardy, Caroline; Vanier, Claire; Vrignaud, Caroline

    In 2013, Tel-jeunes created a text messaging intervention program to reach youth aged 12 to 17 years on their cell phones. Tel-jeunes was the first in the country to offer brief text-based psychosocial interventions performed by professional counselors. Researchers were contacted to document and evaluate the program. The research aimed to: 1) determine the motives, contexts and issues that lead young people to use the SMS service; 2) document the characteristics of brief text-based intervention; and 3) assess the advantages and difficulties encountered by counselors who responded to youth text messages. We conducted a multi-method study from November 2013 to May 2014. We held four focus groups with 23 adolescents aged 15 to 17 who had or had not used the SMS service, conducted a content analysis of a corpus of 13,236 text messages (or 601 conversations), and held two focus groups with 11 Tel-jeunes counselors, just over a year after the implementation of the service. Our findings show that the SMS service meets youth needs. Youth identify text messaging as their preferred mode of communication with Tel-jeunes when they need support or information. Moreover, the service reaches young people who would not have felt comfortable contacting Tel-jeunes by phone. We identified three dominant issues in youths' demands: romantic relationships, psychological health and sexuality. Perceived benefits of the service include anonymity and privacy (the cell phone providing the ability to text anywhere). Youth participants also appreciated writing to counselors, as they felt they had more time to think about their questions and answers to the counselor. Counselors were more ambivalent. They considered text-based intervention to be very effective and satisfactory for addressing youth information requests, but reported difficulties when dealing with more complex problems or with mental health issues. They reported that text-based communication makes it more difficult to assess youth emotional states.

  7. LCS Content Document Application

    Science.gov (United States)

    Hochstadt, Jake

    2011-01-01

    My project at KSC during my spring 2011 internship was to develop a Ruby on Rails application to manage Content Documents. A Content Document is a collection of documents and information that describes what software is installed on a Launch Control System computer. It's important for us to make sure the tools we use every day are secure, up-to-date, and properly licensed. Previously, keeping track of this information was done with Excel and Word files passed between different personnel. The goal of the new application is to be able to manage and access the Content Documents through a single database-backed web application. Our LCS team will benefit greatly from this app. Admins will be able to log in securely to keep track of and update the software installed on each computer in a timely manner. We also included exportability, such as attaching additional documents that can be downloaded from the web application. The finished application will ease the process of managing Content Documents while streamlining the procedure. Ruby on Rails is a very powerful framework and I am grateful to have the opportunity to build this application.

  8. On Intertext in Chemotherapy: An Ethnography of Text in Medical Practice

    DEFF Research Database (Denmark)

    Christensen, Lars Rune

    2016-01-01

    Building on literary theory and data from a field study of text in chemotherapy, this article introduces the concept of intertext and the associated concepts of corpus and intertextuality to CSCW. It shows that the ensemble of documents used and produced in practice can be said to form a corpus, and it distinguishes several types of intertextuality, including the complementary type, the intratextual type and the mediated type. In this manner the article aims to systematically conceptualise cooperative actors' engagement with text in text-laden practices. The approach is arguably novel and beneficial to CSCW. The article also contributes a discussion of computers enabling the activity of creating intertext. This is a key concern for cooperative work, as intertext is central to text-centric work practices such as healthcare.

  9. GPM Mission Gridded Text Products Providing Surface Precipitation Retrievals

    Science.gov (United States)

    Stocker, Erich Franz; Kelley, Owen; Huffman, George; Kummerow, Christian

    2015-04-01

    constellation satellites. Both of these gridded products are generated on a .25 degree x .25 degree hourly grid and are packaged into daily ASCII files that can be downloaded from the PPS FTP site. To reduce the download size, the files are compressed using the gzip utility. This paper will focus on presenting high-level details about the gridded text product being generated from the instruments on the GPM core satellite, but summary information will also be presented about the partner radiometer gridded product. All retrievals for the partner radiometers are done using the GPROF2014 algorithm, using as input the PPS-generated inter-calibrated 1C product for each radiometer.
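    As an illustration of consuming such products, the Python sketch below reads a gzip-compressed daily ASCII file and snaps values onto a .25 degree grid. The column layout (lat, lon, precipitation rate per line) and file name are assumptions for illustration only, not the documented PPS format, which should be consulted for real files.

        # Hypothetical reader for a gzip-compressed ASCII grid file.
        # Assumed layout: one "lat lon rate" triple per line (NOT the official
        # PPS format specification).
        import gzip

        grid = {}
        with gzip.open("gpm_daily.txt.gz", "rt") as f:
            for line in f:
                lat, lon, rate = map(float, line.split())
                # Snap to the 0.25 x 0.25 degree cell containing the point.
                cell = (round(lat * 4) / 4, round(lon * 4) / 4)
                grid[cell] = rate
        print(len(grid), "grid cells read")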

  10. Feasibility Study of Document Delivery Services in Special Libraries in Tehran

    Directory of Open Access Journals (Sweden)

    Assiyeh Pour- Emam- Ali

    2006-10-01

    Full Text Available The present study investigates the feasibility of establishing document delivery services in special libraries in Tehran. Document delivery service is the copyright-cleared supply of hard copies or electronic copies to individuals or corporations on a non-profit or for-profit basis. A descriptive survey was conducted over 105 special libraries located within Tehran. Capabilities studied included manual and automated equipment, skilled and motivated manpower, adequate budget, etc. The investigation shows that 8.42% of these libraries use web sites for resource location, 5.43% employ bibliographies, 5.36% of users lodge their requests by phone, and 2.32% of the libraries receive requests in person. 3.14% of librarians are familiar with English, while 6.28% are familiar with IT. 5.27% of the libraries studied use the British Library Document Supply Centre at Boston Spa as their primary source of foreign document acquisition. 5.32% of the libraries consider membership in interlibrary cooperative schemes an appropriate means of meeting patrons' information needs. Maximum request response time is 3-4 weeks. 3.28% of the requests are for books. 6.88% of the special libraries lack staff training courses for skill acquisition in the area of document delivery. 8.29% of libraries cite lack of adequate equipment as the main document delivery obstacle. The findings demonstrate that document delivery service among special libraries in Tehran is not appropriate given the existing capabilities.

  11. Foreign patent documentation and information research

    International Nuclear Information System (INIS)

    Wang Tongsheng; Wu Xianfeng; Liu Jia; Cao Jifen; Song Tianbao; Feng Beiyuan; Zhang Baozhu

    2014-01-01

    Patent documentation is an important form of scientific and technical documentation, bringing together legal, technical and economic information. According to WIPO forecasts, making full use of patent documentation can save 40% of research funding and 60% of the study period. Foreign patent documentation is among the world's most valuable patent documentation, and many original technologies that have had significant influence were first disclosed in foreign patent documents. Studying and making use of foreign patent documentation can raise the starting point of our scientific and technological innovation and reduce research investment. This paper analyzes foreign patent documentation and, combining this with the actual development of nuclear technology in our country, makes specific recommendations for patent documentation research. (authors)

  12. [Digitalization, archival storage and use of image documentation in the GastroBase-II system].

    Science.gov (United States)

    Kocna, P

    1997-05-14

    "GastroBase-II" is a module of the clinical information system "KIS-ComSyD"; The main part is represented by structured data-text with an expert system including on-line image digitalization in gastroenterology (incl. endoscopic, X-ray and endosonography pictures). The hardware and software of the GastroBase are described as well as six-years experiences with application of digitalized image data. An integration of a picture into text, reports, slides for a lecture or an electronic atlas is documented with examples. Briefly are reported out experiences with graphic editors (PhotoStyler), text editor (WordPerfect) and slide preparation for lecturing with the presentation software PowerPoint. The multimedia applications on the CD-ROM illustrate a modern trend using digitalized image documentation for pregradual and postgradual education.

  13. The Effect of Preprocessing on Arabic Document Categorization

    Directory of Open Access Journals (Sweden)

    Abdullah Ayedh

    2016-04-01

    Full Text Available Preprocessing is one of the main components in a conventional document categorization (DC) framework. This paper aims to highlight the effect of preprocessing tasks on the efficiency of an Arabic DC system. In this study, three classification techniques are used, namely, naive Bayes (NB), k-nearest neighbor (KNN), and support vector machine (SVM). Experimental analysis on Arabic datasets reveals that preprocessing techniques have a significant impact on classification accuracy, especially with the complicated morphological structure of the Arabic language. Choosing appropriate combinations of preprocessing tasks provides significant improvement in the accuracy of document categorization, depending on the feature size and classification techniques. Findings of this study show that the SVM technique outperformed the KNN and NB techniques. The SVM technique achieved a 96.74% micro-F1 value by using the combination of normalization and stemming as preprocessing tasks.
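    A minimal version of such a pipeline, with lowercasing standing in for the normalization step and an SVM classifier, might look as follows in Python with scikit-learn. The toy training data is invented, and the Arabic-specific preprocessing (normalization, stemming) used in the paper is not reproduced here.

        # Toy preprocessing + SVM document categorization pipeline.
        from sklearn.pipeline import make_pipeline
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.svm import LinearSVC

        train_texts = ["sports match result", "stock market news"]   # toy data
        train_labels = ["sport", "economy"]

        model = make_pipeline(
            TfidfVectorizer(lowercase=True),   # lowercasing stands in for the
            LinearSVC(),                       # paper's preprocessing tasks
        )
        model.fit(train_texts, train_labels)
        print(model.predict(["football match tonight"]))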

  14. Multiple sclerosis documentation system (MSDS): moving from documentation to management of MS patients.

    Science.gov (United States)

    Ziemssen, Tjalf; Kempcke, Raimar; Eulitz, Marco; Großmann, Lars; Suhrbier, Alexander; Thomas, Katja; Schultheiss, Thorsten

    2013-09-01

    The long disease duration of multiple sclerosis and the increasing therapeutic options require an individualized therapeutic approach which should be carefully documented over years of observation. To switch from MS documentation to innovative MS management, new computer- and internet-based tools can be implemented, as we demonstrate with the novel computer-based patient management system "multiple sclerosis management system 3D" (MSDS 3D). MSDS 3D allows documentation and management of visit schedules and mandatory examinations via defined study modules by integrating data input from various sources (patients, attending physicians and MS nurses). It provides forms for the documentation of patient visits as well as clinical and diagnostic findings. Information can be collected via interactive touch screens. Specific modules allow the management of highly efficacious treatments such as natalizumab or fingolimod. MSDS can be used to transfer the documented data to databases such as the registry of the German MS society or REGIMS. MSDS has already been implemented successfully in clinical practice and is currently being evaluated in a multicenter setting. High-quality management and documentation are crucial for improvements in clinical practice and research work.

  15. Internationally Educated Female Teachers who have Immigrated to Nova Scotia: A Research/Performance Text

    Directory of Open Access Journals (Sweden)

    Susan C. Walsh

    2007-09-01

    Full Text Available This research/performance text emerged from a study involving internationally educated female teachers who have immigrated to Atlantic Canada. The text features the words and artwork of the research participants as well as excerpts from newspapers, academic writing, and documents about immigration in Nova Scotia juxtaposed so as to foreground the complexity of the women's immigration and integration experiences. Introductory comments provide contextual information about the research project, the participants, and the evolution of, as well as rationale for, the text as performance piece.

  16. 75 FR 67777 - Copyright Office; Federal Copyright Protection of Sound Recordings Fixed Before February 15, 1972

    Science.gov (United States)

    2010-11-03

    ... (not an image); Microsoft Word; WordPerfect; Rich Text Format (RTF); or ASCII text file format (not a..., spoken, or other sounds, but not including the sounds accompanying a motion picture or other audiovisual... general, Federal law is better defined, both as to the rights and the exceptions, and more consistent than...

  17. Appropriation of the documentary video: An experience in Zacatelco, Tlaxcala, Mexico

    Directory of Open Access Journals (Sweden)

    Mónica del Sagrario Medina Cuevas

    2015-12-01

    Full Text Available This article addresses the appropriation of an audiovisual medium, the documentary video, by the board (Patronato) responsible for the project to conserve and restore the main altarpiece of the Parish of Santa Inés Zacatelco, Tlaxcala, Mexico. The paper explains how the stories of the inhabitants of Santa Inés emerged from the process of making a documentary. The experience represents an opportunity to study the use of the documentary genre in light of the possibilities offered by the new audiovisual production media.

  18. LEVERAGING EXISTING HERITAGE DOCUMENTATION FOR ANIMATIONS: SENATE VIRTUAL TOUR

    Directory of Open Access Journals (Sweden)

    A. Dhanda

    2017-08-01

    Full Text Available The use of digital documentation techniques has led to an increase in opportunities for using documentation data for valorization purposes, in addition to technical purposes. Likewise, building information models (BIMs made from these data sets hold valuable information that can be as effective for public education as it is for rehabilitation. A BIM can reveal the elements of a building, as well as the different stages of a building over time. Valorizing this information increases the possibility for public engagement and interest in a heritage place. Digital data sets were leveraged by the Carleton Immersive Media Studio (CIMS for parts of a virtual tour of the Senate of Canada. For the tour, workflows involving four different programs were explored to determine an efficient and effective way to leverage the existing documentation data to create informative and visually enticing animations for public dissemination: Autodesk Revit, Enscape, Autodesk 3ds Max, and Bentley Pointools. The explored workflows involve animations of point clouds, BIMs, and a combination of the two.

  19. Domain-independent information extraction in unstructured text

    Energy Technology Data Exchange (ETDEWEB)

    Irwin, N.H. [Sandia National Labs., Albuquerque, NM (United States). Software Surety Dept.]

    1996-09-01

    Extracting information from unstructured text has become an important research area in recent years due to the large amount of text now electronically available. This status report describes the findings and work done during the second year of a two-year Laboratory Directed Research and Development project. Building on the first year's work of identifying important entities, this report details techniques used to group words into semantic categories and to output templates containing selective document content. Using word profiles and category clustering derived during a training run, the time-consuming knowledge-building task can be avoided. Though the output still lacks completeness when compared to systems with domain-specific knowledge bases, the results look promising. The two approaches are compatible and could complement each other within the same system. Domain-independent approaches retain appeal, as a system that adapts and learns will soon outpace a system with any amount of a priori knowledge.
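    The report does not give implementation details, but grouping words into semantic categories can be sketched as clustering word co-occurrence profiles. In the Python sketch below, the corpus, the vectorization and the cluster count are all invented for illustration; they are not the report's actual word-profile method.

        # Illustrative word clustering: words as co-occurrence vectors,
        # grouped into categories by KMeans.
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.cluster import KMeans

        sentences = [
            "the reactor core temperature rose",
            "the core temperature sensor failed",
            "the contract was signed by the vendor",
            "the vendor shipped the sensor",
        ]
        cv = CountVectorizer()
        X = cv.fit_transform(sentences)     # sentence-term matrix
        word_vectors = X.T.toarray()        # each word: its sentence profile
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(word_vectors)
        for word, lab in zip(cv.get_feature_names_out(), labels):
            print(lab, word)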

  20. Registration document 2005; Document de reference 2005

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2005-07-01

    This reference document of Gaz de France provides information and data on the Group's activities in 2005: financial information, business activities, equipment, factories and real estate, trade, capital, organization charts, employment, contracts and research programs. (A.L.B.)

  1. Documentation of Contraception and Pregnancy Intention In Medicaid Managed Care

    Directory of Open Access Journals (Sweden)

    Heike Thiel de Bocanegra

    2018-01-01

    Full Text Available Context: Clinical guidelines recommend the documentation of pregnancy intention and family planning needs during primary care visits. Prior to the 2014 Medicaid expansion and the release of these guidelines, the documentation practices of Medicaid managed care providers were unknown. Methods: We performed a chart review of 1054 Medicaid managed care visits of women aged 13 to 49 to explore client, provider, and visit characteristics associated with documentation of immediate or future plans for having children and contraceptive method use. Five managed care plans used Current Procedural Terminology and International Classification of Diseases, Ninth Revision codes to identify providers with at least 15 women who had received family planning or well-woman care in 2013. We conducted multilevel logistic regression analyses with documentation of contraceptive method and pregnancy intention as outcome variables and clinic site as the level 2 random effect. Results: Only 12% of charts had documentation of pregnancy intention and 59% documented contraceptive use. Compared to women with a family planning visit reason, women with an annual, reproductive health, or primary care reason for their visit were significantly less likely to have contraception documented (odds ratio [OR] = 11.0; 95% confidence interval [CI] = 6.8-17.7). Age was also a significant predictor, with women aged 30 to 49 (OR = 0.6; 95% CI = 0.4-0.9) and women aged 13 to 19 (OR = 0.2; 95% CI = 0.1-0.6) being less likely to have a note about pregnancy intention in their chart. Pregnancy intention was more likely to be documented in multispecialty clinics (OR = 15.5; 95% CI = 2.7-89.2). Conclusions: Interventions to improve routine medical record documentation of contraception and pregnancy intention regardless of patient age and visit characteristics are needed to facilitate the provision of family planning in managed care visits and, ultimately, to achieve better maternal and infant health outcomes.

  2. Documentation design for probabilistic risk assessment

    International Nuclear Information System (INIS)

    Parkinson, W.J.; von Herrmann, J.L.

    1985-01-01

    This paper describes a framework for the documentation design of probabilistic risk assessment (PRA) and is based on the EPRI document NP-3470 "Documentation Design for Probabilistic Risk Assessment". The goals for PRA documentation are stated. Four audiences which PRA documentation must satisfy are identified, and the documentation consistent with the needs of the various audiences is discussed, i.e., the Summary Report, the Executive Summary, the Main Report, and Appendices. The authors recommend the documentation specifications discussed herein as guides rather than rigid definitions.

  3. A Text Mining Approach for Extracting Lessons Learned from Project Documentation: An Illustrative Case Study

    Directory of Open Access Journals (Sweden)

    Benjamin Matthies

    2017-12-01

    Full Text Available Lessons learned are important building blocks for continuous learning in project-based organisations. Nonetheless, in practice lessons learned are often not consistently reused for organisational learning. Two problems are commonly described in this context: information overload and the lack of procedures and methods for the assessment and implementation of lessons learned. This paper addresses these problems, and appropriate solutions are combined in a systematic lessons-learned process. Latent Dirichlet Allocation is presented to solve the first problem. Regarding the second problem, established risk management methods are adapted. The entire lessons-learned process is demonstrated in a practical case study.
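    To make the first step concrete, here is a small Latent Dirichlet Allocation sketch in Python with scikit-learn. The corpus, topic count and parameters are illustrative, not those of the case study.

        # Minimal LDA topic-modelling sketch for lessons-learned documents.
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.decomposition import LatentDirichletAllocation

        lessons = [
            "vendor delivery delayed the integration milestone",  # toy texts
            "integration testing started late due to vendor delay",
            "budget overrun caused by scope change requests",
        ]
        counts = CountVectorizer(stop_words="english").fit(lessons)
        X = counts.transform(lessons)
        lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
        terms = counts.get_feature_names_out()
        for k, topic in enumerate(lda.components_):
            top = [terms[i] for i in topic.argsort()[-3:]]
            print(f"topic {k}:", top)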

  4. The Texts of the Agency's Co-operation Agreements with Regional Intergovernmental Organizations

    International Nuclear Information System (INIS)

    1961-01-01

    The texts of the Agency's agreements for co-operation with the regional inter-governmental organizations listed below, together with the respective protocols authenticating them, are reproduced in this document in the order in which the agreements entered into force, for the information of all Members of the Agency

  5. The Texts of the Agency's Co-operation Agreements with Regional Intergovernmental Organizations

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1961-02-07

    The texts of the Agency's agreements for co-operation with the regional inter-governmental organizations listed below, together with the respective protocols authenticating them, are reproduced in this document in the order in which the agreements entered into force, for the information of all Members of the Agency.

  6. The Texts of the Agency's Co-operation Agreements with Regional Intergovernmental Organizations

    International Nuclear Information System (INIS)

    1961-01-01

    The texts of the Agency's agreements for co-operation with the regional inter-governmental organizations listed below, together with the respective protocols authenticating them, are reproduced in this document in the order in which the agreements entered into force, for the information of all Members of the Agency [es]

  7. Multimodal document management in radiotherapy

    International Nuclear Information System (INIS)

    Fahrner, H.; Kirrmann, S.; Roehner, F.; Schmucker, M.; Hall, M.; Heinemann, F.

    2013-01-01

    Background and purpose: After incorporating treatment planning and the organisational model of treatment planning into the operating schedule system (BAS, 'Betriebsablaufsystem'), complete document qualities were embedded in the digital environment. The aim of this project was to integrate all documents independent of their source (paper-bound or digital) and to make content from the BAS available in a structured manner. As many workflow steps as possible should be automated, e.g. assigning a document to a patient in the BAS. Additionally, it must be guaranteed that it can be traced at all times who imported documents into the departmental system, and when, how and from which source. Furthermore, work procedures should be changed so that documentation conducted either directly in the departmental system or in external systems can be incorporated digitally, and paper documents can be completely avoided (e.g. documents such as treatment certificates, treatment plans or documentation). It was a further aim, if possible, to automate the removal of paper documents from the departmental workflow, or even to make such paper documents superfluous. In this way, patient letters for follow-up appointments should be automatically generated from the BAS. Similarly, patient record extracts in the form of PDF files should be enabled, e.g. for controlling purposes. Method: The available document qualities were analysed in detail by a multidisciplinary working group (BAS-AG) and, after this examination and an assessment of the possibility of modelling them in our departmental workflow (BAS), were transcribed into a flow diagram. The gathered specifications were implemented in a test environment by the clinical and administrative IT group of the department of radiation oncology and, subsequent to a detailed analysis, introduced into clinical routine. Results: The department has succeeded, under the conditions of the aforementioned criteria, in embedding all relevant documents in the departmental

  8. Scheme Program Documentation Tools

    DEFF Research Database (Denmark)

    Nørmark, Kurt

    2004-01-01

    are separate and intended for different documentation purposes, they are related to each other in several ways. Both tools are based on XML languages for tool setup and for documentation authoring. In addition, both tools rely on the LAML framework which, in a systematic way, makes an XML language available as named functions in Scheme. Finally, the Scheme Elucidator is able to integrate SchemeDoc resources as part of an internal documentation resource.

  9. The Texts of the Instruments relating to a Project for a Joint Agency-Norwegian Programme of Research with the Zero Power Reactor 'NORA'. Text of a Supplementary Agreement

    International Nuclear Information System (INIS)

    1962-01-01

    The text of a Supplementary Agreement to amend the Supply Agreement concerning the NORA project, which Supplementary Agreement was approved by the Board of Governors on 18 June 1962 and which entered into force on 3 September 1962, is reproduced in this document for the information of all Members of the Agency

  10. Agenda 21 for sustainable construction in developing countries: a discussion document

    CSIR Research Space (South Africa)

    International Council for Research and Innovation in Building and Construction, CIB

    2002-01-01

    Full Text Available This discussion document takes as its point of departure Agenda 21, formulated at the Earth Summit in Rio, and is published as a contribution to the Johannesburg World Summit on Sustainable Development. The aim of this document is to provide a research and development agenda and strategy for action for construction...

  11. Large Scale Document Inversion using a Multi-threaded Computing System

    Science.gov (United States)

    Jung, Sungbo; Chang, Dar-Jen; Park, Juw Won

    2018-01-01

    Current microprocessor architecture is moving towards multi-core/multi-threaded systems. This trend has led to a surge of interest in using multi-threaded computing devices, such as the Graphics Processing Unit (GPU), for general-purpose computing. We can utilize the GPU in computation as a massive parallel coprocessor because the GPU consists of multiple cores. The GPU is also an affordable, attractive, and user-programmable commodity. Nowadays a flood of information is entering the digital domain around the world. Huge volumes of data, such as digital libraries, social networking services, e-commerce product data, and reviews, are produced or collected every moment, with dramatic growth in size. Although the inverted index is a useful data structure that can be used for full-text search or document retrieval, a large number of documents will require a tremendous amount of time to index. The performance of document inversion can be improved by a multi-threaded or multi-core GPU. Our approach is to implement a linear-time, hash-based, single program multiple data (SPMD) document inversion algorithm on the NVIDIA GPU/CUDA programming platform, utilizing the huge computational power of the GPU to develop high-performance solutions for document indexing. Our proposed parallel document inversion system shows 2-3 times faster performance than a sequential system on two different test datasets from PubMed abstracts and e-commerce product reviews. CCS Concepts: Information systems → Information retrieval; Computing methodologies → Massively parallel and high-performance simulations.

  12. Transfer Learning beyond Text Classification

    Science.gov (United States)

    Yang, Qiang

    Transfer learning is a new machine learning and data mining framework that allows the training and test data to come from different distributions or feature spaces. We can find many novel applications of machine learning and data mining where transfer learning is necessary. While much has been done in transfer learning in text classification and reinforcement learning, there has been a lack of documented success stories of novel applications of transfer learning in other areas. In this invited article, I will argue that transfer learning is in fact quite ubiquitous in many real world applications. In this article, I will illustrate this point through an overview of a broad spectrum of applications of transfer learning that range from collaborative filtering to sensor based location estimation and logical action model learning for AI planning. I will also discuss some potential future directions of transfer learning.

  13. Models and standards for production systems integration: Technological process and documents

    Directory of Open Access Journals (Sweden)

    Lečić Danica

    2005-01-01

    Full Text Available Electronic business demands that production companies collaborate with customers, suppliers and end users and take up electronic manufacturing. To achieve this goal, companies have to integrate their subsystems (Application to Application, A2A) and collaborate with their business partners (Business to Business, B2B). For this purpose, models and unique standards for integration are necessary. In this paper, ebXML and OAGI specifications have been used to present a process metamodel as a UML class diagram and a standardized model of the Working Order document for a technological process in the form of an OAGI BOD XML document. Based on this, for an example, the model of a technological process is presented as an activity diagram (DA) in XML form, together with the appearance of the Working Order document. Rules for the transformation of the DA to XML are presented as well.

  14. Tangible interactive system for document browsing and visualisation of multimedia data

    Science.gov (United States)

    Rytsar, Yuriy; Voloshynovskiy, Sviatoslav; Koval, Oleksiy; Deguillaume, Frederic; Topak, Emre; Startchik, Sergei; Pun, Thierry

    2006-01-01

    In this paper we introduce and develop a framework for document interactive navigation in multimodal databases. First, we analyze the main open issues of existing multimodal interfaces and then discuss two applications that include interaction with documents in several human environments, i.e., the so-called smart rooms. Second, we propose a system set-up dedicated to the efficient navigation in the printed documents. This set-up is based on the fusion of data from several modalities that include images and text. Both modalities can be used as cover data for hidden indexes using data-hiding technologies as well as source data for robust visual hashing. The particularities of the proposed robust visual hashing are described in the paper. Finally, we address two practical applications of smart rooms for tourism and education and demonstrate the advantages of the proposed solution.
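    Robust visual hashing, mentioned above, can be conveyed in miniature with a classic average-hash sketch in Python. This toy "aHash" is only a stand-in to show the flavor of perceptual hashing (similar images yield similar bit strings); it is not the paper's actual hashing scheme, and the file path and hash size are illustrative.

        # Toy average-hash ("aHash") sketch; a stand-in for robust visual hashing.
        from PIL import Image
        import numpy as np

        def average_hash(path, size=8):
            # Downscale, grayscale, then threshold each pixel against the mean.
            img = Image.open(path).convert("L").resize((size, size))
            px = np.asarray(img, dtype=np.float32)
            bits = (px > px.mean()).flatten()
            return "".join("1" if b else "0" for b in bits)

        # Hashes of near-duplicate document images differ in few bit positions.
        print(average_hash("scanned_page.png"))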

  15. From university research to innovation Detecting knowledge transfer via text mining

    DEFF Research Database (Denmark)

    Woltmann, Sabrina; Clemmensen, Line Katrine Harder; Alkærsig, Lars

    2016-01-01

    and indicators such as patents, collaborative publications and license agreements, to assess the contribution to the socioeconomic surroundings of universities. In this study, we present an extension of the current empirical framework by applying new computational methods, namely text mining and pattern recognition. Text samples for this purpose can include files containing social media content, company websites and annual reports. The empirical focus in the present study is on the technical sciences and in particular on the case of the Technical University of Denmark (DTU). We generated two independent corpora and associated the former with the latter to obtain insights into possible textual and semantic relatedness. The text mining methods extrapolate the correlations, semantic patterns and content comparisons of the two corpora to define document relatedness. We expect the development of a novel tool using...

  16. Texts of the Agency's agreements with the Republic of Austria

    International Nuclear Information System (INIS)

    1996-01-01

    The document reproduces the text of the exchange of Notes, dated 6 July 1995 and 29 September 1995 respectively, between the IAEA and the Ministry of Foreign Affairs of Austria regarding Section 4(b) of the Headquarters Agreement which allows the IAEA 'to establish and operate such additional radio and other telecommunications facilities as may be specified by supplemental agreement...'. This further supplemental agreement entered into force on 29 September 1995

  17. Engineering Documentation and Data Control

    Science.gov (United States)

    Matteson, Michael J.; Bramley, Craig; Ciaruffoli, Veronica

    2001-01-01

    Mississippi Space Services (MSS), the facility services contractor for NASA's John C. Stennis Space Center (SSC), is utilizing technology to improve engineering documentation and data control. Two identified improvement areas, labor-intensive documentation research and outdated drafting standards, were targeted as top priority. MSS selected AutoManager(R) WorkFlow from Cyco software to manage engineering documentation. The software is currently installed on over 150 desktops. The outdated SSC drafting standard was written for pre-CADD drafting methods, in other words, board drafting. Implementation of COTS software solutions to manage engineering documentation and update the drafting standard resulted in significant increases in productivity by reducing the time spent searching for documents.

  18. Biocuration workflows and text mining: overview of the BioCreative 2012 Workshop Track II.

    Science.gov (United States)

    Lu, Zhiyong; Hirschman, Lynette

    2012-01-01

    Manual curation of data from the biomedical literature is a rate-limiting factor for many expert curated databases. Despite the continuing advances in biomedical text mining and the pressing needs of biocurators for better tools, few existing text-mining tools have been successfully integrated into production literature curation systems such as those used by the expert curated databases. To close this gap and better understand all aspects of literature curation, we invited submissions of written descriptions of curation workflows from expert curated databases for the BioCreative 2012 Workshop Track II. We received seven qualified contributions, primarily from model organism databases. Based on these descriptions, we identified commonalities and differences across the workflows, the common ontologies and controlled vocabularies used and the current and desired uses of text mining for biocuration. Compared to a survey done in 2009, our 2012 results show that many more databases are now using text mining in parts of their curation workflows. In addition, the workshop participants identified text-mining aids for finding gene names and symbols (gene indexing), prioritization of documents for curation (document triage) and ontology concept assignment as those most desired by the biocurators. DATABASE URL: http://www.biocreative.org/tasks/bc-workshop-2012/workflow/.

  19. LOCAL BINARIZATION FOR DOCUMENT IMAGES CAPTURED BY CAMERAS WITH DECISION TREE

    Directory of Open Access Journals (Sweden)

    Naser Jawas

    2012-07-01

    Full Text Available Character recognition in a document image captured by a digital camera requires a good binary image as input for separating the text from the background. Global binarization methods do not provide such a good separation because of uneven lighting in images captured by cameras. Local binarization methods overcome this problem, but they require a way to partition the large image into local windows properly. In this paper, we propose a local binarization method with dynamic image partitioning using an integral image, and a decision tree for the binarization decision. The integral image is used to estimate the number of text lines in the document image. The number of lines is then used to divide the document into local windows, and the decision tree decides the threshold for every local window. The results show that the proposed method separates the text from the background better than global thresholding, with a best OCR accuracy on the binarized images of 99.4%.
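    The integral-image trick the method relies on can be shown compactly. The Python sketch below computes each window sum in O(1) via a summed-area table and applies a simple local mean threshold; the window size and bias are assumed parameters, and the paper's decision-tree threshold selection is not reproduced.

        # Local mean thresholding using an integral image (summed-area table).
        import numpy as np

        def local_mean_binarize(gray, w=15, bias=0.9):
            # gray: 2-D uint8 array; w: window half-size (assumed parameters).
            ii = np.cumsum(np.cumsum(gray.astype(np.int64), axis=0), axis=1)
            ii = np.pad(ii, ((1, 0), (1, 0)))   # zero row/col so sums index cleanly
            H, W = gray.shape
            out = np.zeros_like(gray)
            for y in range(H):
                y0, y1 = max(0, y - w), min(H, y + w + 1)
                for x in range(W):
                    x0, x1 = max(0, x - w), min(W, x + w + 1)
                    # Window sum in O(1) from the summed-area table.
                    s = ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
                    mean = s / ((y1 - y0) * (x1 - x0))
                    out[y, x] = 255 if gray[y, x] > bias * mean else 0
            return out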

  20. Comparison of historical documents for writership

    Science.gov (United States)

    Ball, Gregory R.; Pu, Danjun; Stritmatter, Roger; Srihari, Sargur N.

    2010-01-01

    Over the last century forensic document science has developed progressively more sophisticated pattern recognition methodologies for ascertaining the authorship of disputed documents. These include advances not only in computer-assisted stylometrics but also in forensic handwriting analysis. We present a writer verification method and an evaluation of an actual historical document written by an unknown writer. The questioned document is compared against two known handwriting samples of Herman Melville, a 19th-century American author who has been hypothesized to be the writer of this document. The comparison led to a high-confidence result that the questioned document was written by the same writer as the known documents. Such methodology can be applied to many questioned documents in historical writing, in both literary and legal fields.