WorldWideScience

Sample records for web page classification

  1. Innovating Web Page Classification Through Reducing Noise

    Institute of Scientific and Technical Information of China (English)

    LI Xiaoli (李晓黎); SHI Zhongzhi(史忠植)

    2002-01-01

    This paper presents a new method that eliminates noise in Web page classification. It first describes the representation of a Web page based on HTML tags. Then, through a novel distance formula, it eliminates the noise in the similarity measure. After carefully analyzing Web pages, we design an algorithm that can distinguish related hyperlinks from noisy ones. We can utilize the non-noisy hyperlinks to improve the performance of Web page classification (the CAWN algorithm). For any page, we can classify it through the text and the categories of neighbor pages related to it. The experimental results show that our approach improves classification accuracy.
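
    The neighbor-text idea above can be illustrated with standard tooling. The sketch below is a minimal illustration, not the CAWN algorithm itself: the toy pages, link structure, labels, and the assumption that every listed link is non-noisy are made up for demonstration.

    ```python
    # Minimal sketch: classify a page using its own text plus text from
    # "related" neighbour pages (a stand-in for CAWN-style link filtering).
    # The pages, links, and labels below are toy data for illustration only.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB

    pages = {
        "p1": "machine learning course lecture notes",
        "p2": "deep learning tutorial and exercises",
        "p3": "campus parking rules and visitor passes",
        "p4": "student society football schedule",
    }
    links = {"p1": ["p2"], "p2": ["p1"], "p3": ["p4"], "p4": ["p3"]}
    labels = {"p1": "course", "p2": "course", "p3": "other", "p4": "other"}

    def augmented_text(page_id):
        """Concatenate a page's text with its (assumed non-noisy) neighbours."""
        neighbour_text = " ".join(pages[n] for n in links.get(page_id, []))
        return pages[page_id] + " " + neighbour_text

    ids = list(pages)
    X_text = [augmented_text(i) for i in ids]
    y = [labels[i] for i in ids]

    vec = TfidfVectorizer()
    X = vec.fit_transform(X_text)
    clf = MultinomialNB().fit(X, y)

    print(clf.predict(vec.transform([augmented_text("p2")])))
    ```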

  2. Generating Best Features for Web Page Classification

    Directory of Open Access Journals (Sweden)

    K. Selvakuberan

    2008-03-01

    Full Text Available As the Internet provides millions of web pages for each and every search term, getting interesting and relevant results quickly from the Web becomes very difficult. Automatic classification of web pages into relevant categories is a current research topic that helps search engines return relevant results. Because web pages contain many irrelevant, infrequent and stop words that reduce the performance of the classifier, extracting or selecting representative features from the web page is an essential pre-processing step. The goal of this paper is to find a minimal number of highly qualitative features by integrating feature selection techniques. We conducted experiments with various numbers of features selected by different feature selection algorithms on a well-defined initial set of features, and show that the CfsSubset evaluator combined with the term frequency method yields a minimal set of qualitative features sufficient to attain considerable classification accuracy.
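
    As a rough illustration of the pre-processing step described above, the sketch below uses scikit-learn's chi-squared SelectKBest as a stand-in for the CfsSubset evaluator; the toy documents, labels, and the value of k are assumptions.

    ```python
    # Minimal sketch of feature selection before web page classification.
    # SelectKBest(chi2) stands in for the CfsSubset evaluator; data is toy.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.feature_selection import SelectKBest, chi2
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import Pipeline

    docs = [
        "cheap flights hotel booking deals",
        "flight tickets holiday package offers",
        "python tutorial machine learning code",
        "source code repository programming guide",
    ]
    labels = ["travel", "travel", "tech", "tech"]

    pipe = Pipeline([
        ("tf", CountVectorizer(stop_words="english")),   # drop stop words
        ("select", SelectKBest(chi2, k=5)),              # keep 5 strongest terms
        ("clf", MultinomialNB()),
    ])
    pipe.fit(docs, labels)
    print(pipe.predict(["discount hotel and flight offers"]))
    ```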

  3. Machine Learning Algorithms in Web Page Classification

    Directory of Open Access Journals (Sweden)

    W.A.AWAD

    2012-11-01

    Full Text Available In this paper we use machine learning algorithms such as SVM, kNN and GIS to perform a behavior comparison on the web page classification problem. From the experiments we see that SVM with a small number of negative documents to build the centroids has the smallest storage requirement and the lowest online test computation cost, whereas almost all GIS configurations with different numbers of nearest neighbors have an even higher storage requirement and online test computation cost than kNN. This suggests that future work should try to reduce the storage requirement and online test cost of GIS.
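
    A behaviour comparison of this kind can be reproduced in outline with common libraries. The sketch below compares a linear SVM and kNN on toy data; the data, the split, and all parameters are assumptions, and GIS is omitted because no standard implementation is assumed here.

    ```python
    # Minimal sketch comparing SVM and kNN on a toy web page classification
    # task; the data and the train/test split are illustrative assumptions.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.model_selection import train_test_split
    from sklearn.svm import LinearSVC
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import accuracy_score

    docs = [
        "latest football scores and match report", "basketball league results",
        "tennis open final highlights", "stock market closes higher today",
        "central bank raises interest rates", "quarterly earnings beat forecast",
    ]
    labels = ["sport", "sport", "sport", "finance", "finance", "finance"]

    X_train, X_test, y_train, y_test = train_test_split(
        docs, labels, test_size=0.33, random_state=0, stratify=labels)

    vec = TfidfVectorizer()
    Xtr, Xte = vec.fit_transform(X_train), vec.transform(X_test)

    for name, clf in [("SVM", LinearSVC()), ("kNN", KNeighborsClassifier(n_neighbors=1))]:
        clf.fit(Xtr, y_train)
        print(name, accuracy_score(y_test, clf.predict(Xte)))
    ```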

  4. Web page classification on child suitability

    NARCIS (Netherlands)

    Eickhoff, C.; Serdyukov, P.; Vries, A.P. de

    2010-01-01

    Children spend significant amounts of time on the Internet. Recent studies have shown that during these periods they are often not under adult supervision. This work presents an automatic approach to identifying web pages suitable for children based on topical and non-topical web page aspects. We discu…

  5. A Syntactic Classification based Web Page Ranking Algorithm

    CERN Document Server

    Mukhopadhyay, Debajyoti; Kim, Young-Chon

    2011-01-01

    Existing search engines sometimes give unsatisfactory search results for lack of any categorization of those results. If there were some means to know the user's preference about the search results and to rank pages according to that preference, the results would be more useful and accurate for the user. In the present paper a web page ranking algorithm is proposed based on syntactic classification of web pages. Syntactic classification does not concern itself with the meaning of the content of a web page. The proposed approach mainly consists of three steps: select some properties of web pages based on the user's demand, measure them, and give a different weight to each property during ranking for different types of pages. The existence of syntactic classes is supported by running the fuzzy c-means algorithm and a neural network classifier on a set of web pages. The change in ranking for different types of pages but the same query string is also demonstrated.

  6. Key-phrase based classification of public health web pages.

    Science.gov (United States)

    Dolamic, Ljiljana; Boyer, Célia

    2013-01-01

    This paper describes and evaluates a public health web page classification model based on key-phrase extraction and matching. Easily extensible both in terms of new classes and new languages, this method proves to be a good solution for text classification in the face of a total lack of training data. To evaluate the proposed solution we used a small collection of public-health-related web pages created by a double-blind manual classification. Our experiments have shown that by choosing an adequate threshold value, the desired value for either precision or recall can be achieved.
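
    The key-phrase matching idea with a tunable threshold can be sketched as follows; the phrase dictionary, the example page, and the threshold values are invented for illustration and are not the paper's vocabulary.

    ```python
    # Minimal sketch of key-phrase based classification with a score threshold.
    # The phrase lists and thresholds are illustrative assumptions, not the
    # dictionary used by the paper's public-health classifier.
    KEY_PHRASES = {
        "nutrition": ["balanced diet", "vitamin", "calorie intake"],
        "infectious_disease": ["influenza outbreak", "vaccination", "infection rate"],
    }

    def classify(text, threshold=1):
        """Return classes whose key-phrase match count reaches the threshold."""
        text = text.lower()
        scores = {cls: sum(p in text for p in phrases)
                  for cls, phrases in KEY_PHRASES.items()}
        return [cls for cls, s in scores.items() if s >= threshold]

    page = "Seasonal influenza outbreak: vaccination advice for clinics."
    print(classify(page))          # ['infectious_disease']
    print(classify(page, 3))       # []  (stricter threshold trades recall for precision)
    ```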

  7. Document representations for classification of short web-page descriptions

    Directory of Open Access Journals (Sweden)

    Radovanović Miloš

    2008-01-01

    Full Text Available Motivated by applying text categorization to the classification of Web search results, this paper describes an extensive experimental study of the impact of bag-of-words document representations on the performance of five major classifiers - Naïve Bayes, SVM, Voted Perceptron, kNN and C4.5. The texts, representing short Web-page descriptions sorted into a large hierarchy of topics, are taken from the dmoz Open Directory Web-page ontology, and classifiers are trained to automatically determine the topics which may be relevant to a previously unseen Web page. Different transformations of input data - stemming, normalization, logtf and idf, together with dimensionality reduction - are found to have a statistically significant improving or degrading effect on classification performance measured by classical metrics - accuracy, precision, recall, F1 and F2. The emphasis of the study is not on determining the best document representation for each classifier, but rather on describing the effects of every individual transformation on classification, together with their mutual relationships.
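
    The effect of individual bag-of-words transformations (raw tf, logtf, idf, length normalization) can be probed with a small experiment like the one below; the toy corpus, the single Naïve Bayes classifier, and the cross-validation setup are assumptions rather than the paper's dmoz experiments, and stemming is omitted because it needs an external stemmer.

    ```python
    # Minimal sketch of comparing bag-of-words transformations (tf vs tf-idf,
    # with/without length normalization, sublinear "logtf") for short texts.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import MultinomialNB

    docs = [
        "open source text editor for programmers", "python ide with debugger",
        "hiking trails national park guide", "best camping gear for mountains",
        "javascript framework tutorial", "rock climbing routes for beginners",
    ]
    labels = ["software", "software", "outdoors", "outdoors", "software", "outdoors"]

    representations = {
        "tf":            TfidfVectorizer(use_idf=False, norm=None),
        "tf-idf":        TfidfVectorizer(use_idf=True, norm=None),
        "tf-idf + norm": TfidfVectorizer(use_idf=True, norm="l2"),
        "logtf + idf":   TfidfVectorizer(use_idf=True, sublinear_tf=True),
    }
    for name, vec in representations.items():
        X = vec.fit_transform(docs)
        score = cross_val_score(MultinomialNB(), X, labels, cv=3).mean()
        print(f"{name:14s} accuracy={score:.2f}")
    ```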

  8. Research of Web Pages Categorization

    Institute of Scientific and Technical Information of China (English)

    Zhongda Lin; Kun Deng; Yanfen Hong

    2006-01-01

    In this paper, we discuss several issues related to the automated classification of web pages, especially text classification of web pages. We analyze feature selection and categorization algorithms for web pages and give some suggestions for web page categorization.

  9. Hierarchical Web Page Classification Based on a Topic Model and Neighboring Pages Integration

    OpenAIRE

    Sriurai, Wongkot; Meesad, Phayung; Haruechaiyasak, Choochart

    2010-01-01

    Most Web page classification models typically apply the bag of words (BOW) model to represent the feature space. The original BOW representation, however, is unable to recognize semantic relationships between terms. One possible solution is to apply the topic model approach based on the Latent Dirichlet Allocation algorithm to cluster the term features into a set of latent topics. Terms assigned into the same topic are semantically related. In this paper, we propose a novel hierarchical class...
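
    A minimal sketch of the topic-model idea is given below, assuming scikit-learn's LDA implementation and toy data; the number of topics, the classifier, and the corpus are illustrative choices, and the neighboring-pages integration from the paper is not reproduced.

    ```python
    # Minimal sketch: cluster term features into latent topics with LDA and use
    # the per-document topic distribution as features for a classifier.
    # Topic count, data, and classifier are illustrative assumptions.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline

    docs = [
        "election results and parliament vote", "new tax policy announced by government",
        "football world cup final tonight", "star striker transfers to rival club",
        "senate debates budget bill", "league standings after weekend matches",
    ]
    labels = ["politics", "politics", "sport", "sport", "politics", "sport"]

    pipe = Pipeline([
        ("bow", CountVectorizer(stop_words="english")),
        ("lda", LatentDirichletAllocation(n_components=2, random_state=0)),
        ("clf", LogisticRegression()),
    ])
    pipe.fit(docs, labels)
    print(pipe.predict(["government announces new budget vote"]))
    ```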

  10. Fuzzy Clustering Method for Web User Based on Pages Classification

    Institute of Scientific and Technical Information of China (English)

    ZHAN Li-qiang; LIU Da-xin

    2004-01-01

    A new method for fuzzy clustering of Web users based on an analysis of user interest characteristics is proposed in this article. The method first defines fuzzy page categories according to the links on the index page of the site, then computes the fuzzy degree of cross pages by aggregating over Web log data. After that, using a fuzzy comprehensive evaluation method, it constructs user interest vectors according to page viewing times and frequency of hits, and derives the fuzzy similarity matrix from the interest vectors of the Web users. Finally, it obtains the clustering result through the fuzzy clustering method. The experimental results show the effectiveness of the method.

  11. An ant colony optimization based feature selection for web page classification.

    Science.gov (United States)

    Saraç, Esra; Özel, Selma Ayşe

    2014-01-01

    The increased popularity of the web has caused a huge amount of information to be added to it, and as a result of this explosive information growth, automated web page classification systems are needed to improve search engines' performance. Web pages have a large number of features such as HTML/XML tags, URLs, hyperlinks, and text contents that should be considered during an automated classification process. The aim of this study is to reduce the number of features used, to improve the runtime and accuracy of web page classification. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then we applied the well-known C4.5, naive Bayes, and k nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using the ACO for feature selection improves both the accuracy and the runtime performance of classification. We also showed that the proposed ACO based algorithm selects better features than the well-known information gain and chi-square feature selection methods.
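
    A heavily simplified sketch of ant-colony-style feature selection is shown below: ants sample feature subsets in proportion to pheromone, subsets are scored by a classifier, and pheromone is reinforced on the best subset found. All parameters, the synthetic data, and the wrapper classifier are assumptions rather than the paper's exact ACO formulation.

    ```python
    # Highly simplified sketch of ant-colony-style feature selection: ants
    # sample feature subsets with probability proportional to pheromone,
    # subsets are scored with a classifier, and pheromone is reinforced on
    # the best subset. Parameters and data are illustrative assumptions.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB

    rng = np.random.default_rng(0)
    X, y = make_classification(n_samples=120, n_features=20, n_informative=4,
                               random_state=0)

    n_features, n_ants, n_iter, subset_size = X.shape[1], 8, 10, 6
    pheromone = np.ones(n_features)

    best_subset, best_score = None, -np.inf
    for _ in range(n_iter):
        for _ in range(n_ants):
            prob = pheromone / pheromone.sum()
            subset = rng.choice(n_features, size=subset_size, replace=False, p=prob)
            score = cross_val_score(GaussianNB(), X[:, subset], y, cv=3).mean()
            if score > best_score:
                best_subset, best_score = subset, score
        pheromone *= 0.9                      # evaporation
        pheromone[best_subset] += best_score  # reinforce the best subset so far

    print("selected features:", sorted(best_subset), "cv accuracy:", round(best_score, 3))
    ```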

  12. An Ant Colony Optimization Based Feature Selection for Web Page Classification

    Directory of Open Access Journals (Sweden)

    Esra Saraç

    2014-01-01

    Full Text Available The increased popularity of the web has caused a huge amount of information to be added to it, and as a result of this explosive information growth, automated web page classification systems are needed to improve search engines' performance. Web pages have a large number of features such as HTML/XML tags, URLs, hyperlinks, and text contents that should be considered during an automated classification process. The aim of this study is to reduce the number of features used, to improve the runtime and accuracy of web page classification. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then we applied the well-known C4.5, naive Bayes, and k nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using the ACO for feature selection improves both the accuracy and the runtime performance of classification. We also showed that the proposed ACO based algorithm selects better features than the well-known information gain and chi-square feature selection methods.

  13. Web Page Design.

    Science.gov (United States)

    Lindsay, Lorin

    Designing a web home page involves many decisions that affect how the page will look, the kind of technology required to use the page, the links the page will provide, and kinds of patrons who can use the page. The theme of information literacy needs to be built into every web page; users need to be taught the skills of sorting and applying…

  14. Web Page Classification using an ensemble of support vector machine classifiers

    Directory of Open Access Journals (Sweden)

    Shaobo Zhong

    2011-11-01

    Full Text Available Web Page Classification (WPC) is both an important and challenging topic in data mining. The knowledge of WPC can help users obtain usable information from the huge Internet dataset automatically and efficiently. Many efforts have been devoted to WPC; however, there is still room for improvement of current approaches. One particular challenge in training classifiers comes from the fact that the available dataset is usually unbalanced. Standard machine learning algorithms tend to be overwhelmed by the major class and ignore the minor one, and thus lead to a high false negative rate. In this paper, a novel approach to Web page classification is proposed to address this problem, using an ensemble of support vector machine classifiers. Principal Component Analysis (PCA) is used for feature reduction and Independent Component Analysis (ICA) for feature selection. The experimental results indicate that the proposed approach outperforms other existing classifiers widely used in WPC.
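
    The overall shape of the approach (feature reduction followed by an ensemble of SVMs on imbalanced data) can be sketched as below; the ICA-based feature selection step is omitted, and the ensemble size, component count, and class weighting are assumptions rather than the paper's configuration.

    ```python
    # Minimal sketch: PCA for feature reduction followed by an ensemble of SVM
    # classifiers (via bagging) on an imbalanced toy dataset.
    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.ensemble import BaggingClassifier
    from sklearn.svm import SVC
    from sklearn.pipeline import Pipeline
    from sklearn.model_selection import cross_val_score

    # 9:1 class imbalance, mimicking the "major vs minor class" problem.
    X, y = make_classification(n_samples=300, n_features=30, weights=[0.9, 0.1],
                               random_state=0)

    pipe = Pipeline([
        ("pca", PCA(n_components=10)),
        ("ensemble", BaggingClassifier(
            SVC(kernel="rbf", class_weight="balanced"),
            n_estimators=10, random_state=0)),
    ])
    print("F1:", cross_val_score(pipe, X, y, cv=5, scoring="f1").mean().round(3))
    ```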

  15. Research on Web Page Automatic Classification Based on Internet News Corpus

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    Web pages contain richer content than pure text, such as hyperlinks, HTML tags and metadata, so Web page categorization differs from pure text categorization. Targeting Internet Chinese news pages, a practical algorithm for extracting subject concepts from a web page without a thesaurus is proposed; after incorporating these category-subject concepts into a knowledge base, Web pages are classified by a hybrid algorithm, with the experimental corpus extracted from Xinhua Net. Experimental results show that categorization performance is improved by using Web page features.

  16. A Quaternary-Stage User Interest Model Based on User Browsing Behavior and Web Page Classification

    Institute of Scientific and Technical Information of China (English)

    Zongli Jiang; Hang Su

    2012-01-01

    The key to a personalized search engine lies in the user model. Traditional personalized models make the results of a secondary search biased towards long-term interests; moreover, forgetting applied to long-term interests prevents effective recollection of user interests. This paper presents a quaternary-stage user interest model based on user browsing behavior and web page classification, which draws on the principles of the cache and the recycle bin in operating systems: an illuminating text stage and a recycle-bin interest stage are set up in front of and behind the traditional interest model, respectively, to constitute the quaternary-stage user interest model. The model can better reflect user interests by using an adaptive natural weight and its calculation method, and by efficiently integrating user browsing behavior and web document content.

  17. An Improved Focused Crawler: Using Web Page Classification and Link Priority Evaluation

    OpenAIRE

    Houqing Lu; Donghui Zhan; Lei Zhou; Dengchao He

    2016-01-01

    A focused crawler is topic-specific and aims selectively to collect web pages that are relevant to a given topic from the Internet. However, the performance of the current focused crawling can easily suffer the impact of the environments of web pages and multiple topic web pages. In the crawling process, a highly relevant region may be ignored owing to the low overall relevance of that page, and anchor text or link-context may misguide crawlers. In order to solve these problems, this paper pr...

  19. Creating Web Pages Simplified

    CERN Document Server

    Wooldridge, Mike

    2011-01-01

    The easiest way to learn how to create a Web page for your family or organization Do you want to share photos and family lore with relatives far away? Have you been put in charge of communication for your neighborhood group or nonprofit organization? A Web page is the way to get the word out, and Creating Web Pages Simplified offers an easy, visual way to learn how to build one. Full-color illustrations and concise instructions take you through all phases of Web publishing, from laying out and formatting text to enlivening pages with graphics and animation. This easy-to-follow visual guide sho

  20. An Improved Focused Crawler: Using Web Page Classification and Link Priority Evaluation

    Directory of Open Access Journals (Sweden)

    Houqing Lu

    2016-01-01

    Full Text Available A focused crawler is topic-specific and aims to selectively collect web pages that are relevant to a given topic from the Internet. However, the performance of current focused crawling can easily suffer from the environment of web pages and from multi-topic web pages. In the crawling process, a highly relevant region may be ignored owing to the low overall relevance of the page, and anchor text or link context may misguide crawlers. In order to solve these problems, this paper proposes a new focused crawler. First, we build a web page classifier based on an improved term weighting approach (ITFIDF), in order to obtain highly relevant web pages. In addition, this paper introduces a link evaluation approach, link priority evaluation (LPE), which combines a web page content block partition algorithm with a joint feature evaluation (JFE) strategy, to better judge the relevance between the URLs on a web page and the given topic. The experimental results demonstrate that the classifier using ITFIDF outperforms TFIDF, and our focused crawler is superior to other focused crawlers based on breadth-first, best-first, anchor text only, link-context only, and content block partition in terms of harvest rate and target recall. In conclusion, our methods are significant and effective for focused crawling.
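
    The best-first flavour of focused crawling can be sketched with a priority queue over link scores, as below; the relevance function and fetch_links() are stubs, and the paper's ITFIDF weighting and LPE link evaluation are not reproduced.

    ```python
    # Skeleton of a best-first focused crawler: links are kept in a priority
    # queue ordered by an estimated relevance score. The scoring function and
    # fetch_links() are stubs, not the paper's ITFIDF/LPE components.
    import heapq

    TOPIC_TERMS = {"machine", "learning", "classification"}

    def relevance(text):
        """Toy relevance: fraction of topic terms present in the text."""
        words = set(text.lower().split())
        return len(words & TOPIC_TERMS) / len(TOPIC_TERMS)

    def fetch_links(url):
        """Stub: return (anchor_text, url) pairs found on the page."""
        fake_web = {
            "seed": [("machine learning tutorial", "a"), ("cat pictures", "b")],
            "a": [("text classification guide", "c")],
            "b": [], "c": [],
        }
        return fake_web.get(url, [])

    def crawl(seed, max_pages=10):
        frontier = [(-1.0, seed)]          # max-heap via negated scores
        visited, order = set(), []
        while frontier and len(order) < max_pages:
            _, url = heapq.heappop(frontier)
            if url in visited:
                continue
            visited.add(url)
            order.append(url)
            for anchor, link in fetch_links(url):
                heapq.heappush(frontier, (-relevance(anchor), link))
        return order

    print(crawl("seed"))   # pages visited in relevance-first order
    ```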

  1. Review of Research on Chinese Web Page Classification

    Institute of Scientific and Technical Information of China (English)

    李勇

    2012-01-01

    Researchers have carried out a great deal of fruitful work on Web page classification. To date, research related to Web page classification has focused on two aspects: how to choose appropriate category features, and how to design efficient classification algorithms. This paper summarizes and reviews the current state of Web page classification technology from these two perspectives, so that later researchers can follow research trends in Web page classification better and more accurately.

  2. Web Information Extraction Research Based on Page Classification

    Institute of Scientific and Technical Information of China (English)

    成卫青; 于静; 杨晶; 杨龙

    2013-01-01

    By analyzing existing Web information extraction methods and the characteristics of current Web pages, we find that existing extraction techniques suffer from two problems: the types of pages they can extract from are fixed, and the extraction results are not accurate. To make up for these deficiencies, this paper proposes a Web information extraction method based on page classification, which is able to extract the mainstream information on Internet pages. By classifying Web pages and extracting the main body of each page, it overcomes the two problems of traditional methods. A complete Web information extraction model is designed and the implementation of each functional module is described. The distinctive features of the model are its page main-body extraction and page classification modules, as well as the use of regular expressions to generate extraction rules automatically, which improve the generality and precision of the extraction method. Experimental results verify the validity and accuracy of the method.

  3. Web Page Classification Research Based on Ontology and EM

    Institute of Scientific and Technical Information of China (English)

    丁艳; 曹倩; 王超; 潘金贵

    2003-01-01

    Work on extracting semantic information from the vast number of Web pages and using it in search engines can lead to intelligent retrieval and other personalized services. This paper focuses on the analysis of Web page classification information. With an ontology as the basis, TFIDF word weights and the Rocchio algorithm are combined with EM to improve the accuracy of the classifier. It is shown that this EM procedure effectively enhances accuracy by making use of unlabeled pages when labeled samples are limited.

  4. Web Page Recommendation Using Web Mining

    Directory of Open Access Journals (Sweden)

    Modraj Bhavsar

    2014-07-01

    Full Text Available On the World Wide Web various kinds of content are generated in huge amounts, so web recommendation has become an important part of web applications for giving relevant results to users. Different kinds of web recommendations are made available to users every day, including images, video, audio, query suggestions and web pages. In this paper we aim at providing a framework for web page recommendation. (1) First we describe the basics of web mining and the types of web mining. (2) We then give details of each web mining technique. (3) We propose an architecture for personalized web page recommendation.

  5. Code AI Personal Web Pages

    Science.gov (United States)

    Garcia, Joseph A.; Smith, Charles A. (Technical Monitor)

    1998-01-01

    The document consists of a publicly available web site (george.arc.nasa.gov) for Joseph A. Garcia's personal web pages in the AI division. Only general information will be posted and no technical material. All the information is unclassified.

  6. Web Page Design (Part Three).

    Science.gov (United States)

    Descy, Don E.

    1997-01-01

    Discusses fonts as well as design considerations that should be reviewed when designing World Wide Web pages and sites to make them easier for clients to use and easier to maintain. Also discusses the simplicity of names; organization of pages, folders, and files; and sites to help build Web sites. (LRW)

  7. Sign Language Web Pages

    Science.gov (United States)

    Fels, Deborah I.; Richards, Jan; Hardman, Jim; Lee, Daniel G.

    2006-01-01

    The World Wide Web has changed the way people interact. It has also become an important equalizer of information access for many social sectors. However, for many people, including some sign language users, Web accessing can be difficult. For some, it not only presents another barrier to overcome but has left them without cultural equality. The…

  8. JERHRE's New Web Pages.

    Science.gov (United States)

    2006-06-01

    JERHRE'S WEBSITE, www.csueastbay.edu/JERHRE/ has two new pages. One of those pages is devoted to curriculum that may be used to educate students, investigators and ethics committee members about issues in the ethics of human subjects research, and to evaluate their learning. It appears at www.csueastbay.edu/JERHRE/cur.html. The other is devoted to emailed letters from readers. Appropriate letters will be posted as soon as they are received by the editor. Letters from readers appear at www.csueastbay.edu/JERHRE/let.html.

  9. Myanmar Web Pages Crawler

    National Research Council Canada - National Science Library

    Su Mon Khine; Yadana Thein

    2015-01-01

    .... There has been very little research on crawling Myanmar-language web sites. Most language-specific crawlers are based on n-gram character sequences, which require training documents; the proposed crawler differs from those crawlers...

  10. Database-Based Web Page

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    A database-based web page built with IIS 4.0 + ASP + ADO + SQL 7.0 is briefly introduced. It has been successfully used in e-commerce, a bulletin board system, a chat room, and so on in the web site of the Computer Center, Hudong Campus, Tongji University.

  11. Hidden Page WebCrawler Model for Secure Web Pages

    Directory of Open Access Journals (Sweden)

    K. F. Bharati

    2013-03-01

    Full Text Available The traditional search engines available over the internet are dynamic in searching relevant content over the web. A search engine has constraints such as gathering the requested data from varied sources, where the data relevancy is exceptional. Web crawlers are designed only to move along a specific path of the web and are restricted from moving towards a different path when those paths are secured or restricted due to the apprehension of threats. It is possible to design a web crawler that has the capability of penetrating through the paths of the web not reachable by traditional web crawlers, in order to get a better solution in terms of data, time and relevancy for a given search query. This paper makes use of a newer parser and indexer to come up with a novel web crawler and a framework to support it. The proposed web crawler is designed to attend to Hyper Text Transfer Protocol Secure (HTTPS) based websites and web pages that need authentication to be viewed and indexed. The user fills a search form, and his/her credentials are used by the web crawler to authenticate to the secure web server. Once indexed, the secure web server is inside the web crawler's accessible zone.
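
    A minimal sketch of crawling pages behind form-based authentication with a persistent session is shown below, assuming the widely used requests library; the URLs, form field names, and credentials are placeholders, and the paper's parser and indexer are not reproduced.

    ```python
    # Minimal sketch of crawling pages behind form-based authentication with a
    # persistent session. URLs, form fields, and credentials are placeholders.
    import requests
    from urllib.parse import urljoin
    from html.parser import HTMLParser

    class LinkParser(HTMLParser):
        def __init__(self):
            super().__init__()
            self.links = []
        def handle_starttag(self, tag, attrs):
            if tag == "a":
                self.links.extend(v for k, v in attrs if k == "href" and v)

    def crawl_secure(base, login_path, creds, start_path, max_pages=5):
        session = requests.Session()
        session.post(urljoin(base, login_path), data=creds)  # authenticate once
        seen, queue, pages = set(), [start_path], {}
        while queue and len(pages) < max_pages:
            path = queue.pop(0)
            if path in seen:
                continue
            seen.add(path)
            resp = session.get(urljoin(base, path))
            pages[path] = resp.text
            parser = LinkParser()
            parser.feed(resp.text)
            queue.extend(parser.links)
        return pages

    # Example call (placeholder site and credentials):
    # crawl_secure("https://example.org", "/login",
    #              {"username": "user", "password": "secret"}, "/index.html")
    ```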

  12. The Faculty Web Page: Contrivance or Continuation?

    Science.gov (United States)

    Lennex, Lesia

    2007-01-01

    In an age of Internet education, what does it mean for a tenure/tenure-track faculty to have a web page? How many professors have web pages? If they have a page, what does it look like? Do they really need a web page at all? Many universities have faculty web pages. What do those collective pages look like? In what way do they represent the…

  13. DISTRIBUTED APPROACH to WEB PAGE CATEGORIZATION USING MAPREDUCE PROGRAMMING MODEL

    Directory of Open Access Journals (Sweden)

    P.Malarvizhi

    2011-12-01

    Full Text Available The web is a large repository of information, and to facilitate the search and retrieval of pages from it, categorization of web documents is essential. An effective means to handle the complexity of information retrieval from the internet is through automatic classification of web pages. Although many automatic classification algorithms and systems have been presented, most of the existing approaches are computationally demanding. In order to overcome this challenge, we have proposed a parallel algorithm, based on the MapReduce programming model, to automatically categorize web pages. This approach incorporates three components: a web crawler, the MapReduce programming model, and the proposed web page categorization approach. Initially, we utilize the web crawler to mine the World Wide Web, and the crawled web pages are then given directly as input to the MapReduce programming model. Here the MapReduce programming model, adapted to our proposed web page categorization approach, finds the appropriate category of each web page according to its content. The experimental results show that our proposed parallel web page categorization approach achieves satisfactory results in finding the right category for any given web page.
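
    The map/reduce shape of such a categorizer can be simulated in a few lines; the keyword-based categorizer below is a placeholder for the paper's classifier, and the pages and categories are toy data.

    ```python
    # Toy simulation of the map/reduce shape of the approach: map_phase()
    # assigns a category to each crawled page, reduce_phase() groups pages per
    # category. The keyword rule is a placeholder for the real classifier.
    from collections import defaultdict

    CATEGORY_KEYWORDS = {"sport": {"match", "league"}, "tech": {"software", "cpu"}}

    def map_phase(pages):
        """Emit (category, url) pairs, one per page."""
        for url, text in pages:
            words = set(text.lower().split())
            for category, keywords in CATEGORY_KEYWORDS.items():
                if words & keywords:
                    yield category, url
                    break
            else:
                yield "other", url

    def reduce_phase(pairs):
        """Group urls by category key."""
        grouped = defaultdict(list)
        for category, url in pairs:
            grouped[category].append(url)
        return dict(grouped)

    pages = [("u1", "league match report"), ("u2", "new cpu benchmark"),
             ("u3", "cooking recipes")]
    print(reduce_phase(map_phase(pages)))
    ```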

  14. Design of Educational Web Pages

    Science.gov (United States)

    Galan, Jose Gomez; Blanco, Soledad Mateos

    2004-01-01

    The methodological characteristics of teaching in primary and secondary education make it necessary to revise the pedagogical and instructive lines with which to introduce the new Information and Communication Technologies into the school context. The construction of Web pages that can be used to improve student learning is, therefore, fundamental…

  15. Web Page Design (Part One).

    Science.gov (United States)

    Descy, Don E.

    1997-01-01

    Discusses rules for Web page design: consider audiences' Internet skills and equipment; know your content; outline the material; map or sketch the site; be consistent; regulate size of graphics to control download time; place eye catching material in the first 300 pixels; moderate use of color to control file size and bandwidth; include a…

  16. Learning through Web Page Design.

    Science.gov (United States)

    Peel, Deborah

    2001-01-01

    Describes and evaluates the use of Web page design in an undergraduate course in the United Kingdom on town planning. Highlights include incorporating information and communication technologies into higher education; and a theoretical framework for the use of educational technology. (LRW)

  17. Interstellar Initiative Web Page Design

    Science.gov (United States)

    Mehta, Alkesh

    1999-01-01

    This summer at NASA/MSFC, I have contributed to two projects: Interstellar Initiative Web Page Design and Lenz's Law Relative Motion Demonstration. In the Web Design Project, I worked on an Outline. The Web Design Outline was developed to provide a foundation for a Hierarchy Tree Structure. The Outline would help design a Website information base for future and near-term missions. The Website would give in-depth information on Propulsion Systems and Interstellar Travel. The Lenz's Law Relative Motion Demonstrator is discussed in this volume by Russell Lee.

  18. Web Page Design and Network Analysis.

    Science.gov (United States)

    Wan, Hakman A.; Chung, Chi-wai

    1998-01-01

    Examines problems in Web-site design from the perspective of network analysis. In view of the similarity between the hypertext structure of Web pages and a generic network, network analysis presents concepts and theories that provide insight for Web-site design. Describes the problem of home-page location and control of number of Web pages and…

  19. Exploiting link structure for web page genre identification

    KAUST Repository

    Zhu, Jia

    2015-07-07

    As the World Wide Web develops at an unprecedented pace, identifying web page genre has recently attracted increasing attention because of its importance in web search. A common approach for identifying genre is to use textual features that can be extracted directly from a web page, that is, On-Page features. The extracted features are subsequently inputted into a machine learning algorithm that will perform classification. However, these approaches may be ineffective when the web page contains limited textual information (e.g., the page is full of images). In this study, we address genre identification of web pages under the aforementioned situation. We propose a framework that uses On-Page features while simultaneously considering information in neighboring pages, that is, the pages that are connected to the original page by backward and forward links. We first introduce a graph-based model called GenreSim, which selects an appropriate set of neighboring pages. We then construct a multiple classifier combination module that utilizes information from the selected neighboring pages and On-Page features to improve performance in genre identification. Experiments are conducted on well-known corpora, and favorable results indicate that our proposed framework is effective, particularly in identifying web pages with limited textual information. © 2015 The Author(s)

  20. Classification of web pages based on extreme learning machine

    Institute of Scientific and Technical Information of China (English)

    陈先福; 李石君; 曾慧

    2015-01-01

    The extreme learning machine (ELM) differs from traditional neural network learning algorithms (such as the BP algorithm); it is a highly efficient learning algorithm for single-hidden-layer feedforward neural networks (SLFNs). In this paper, ELM is introduced to the Chinese web page classification task. After pre-processing a Chinese web page and extracting its characteristic information, a feature tree of the page is formed and a fixed-length encoding is produced, which is taken as the input data of the ELM. Experimental results show that the method can effectively classify web pages.

  1. Deriving Dynamics of Web Pages: A Survey

    OpenAIRE

    Oita, Marilena; Senellart, Pierre

    2011-01-01

    The World Wide Web is dynamic by nature: content is continuously added, deleted, or changed, which makes it challenging for Web crawlers to keep up-to-date with the current version of a Web page, all the more so since not all apparent changes are significant ones. We review major approaches to change detection in Web pages and extraction of temporal properties (especially, timestamps) of Web pages. We focus our attention on techniques and systems that have been propose...

  2. Web Page Watermarking for Tamper-Proof

    Institute of Scientific and Technical Information of China (English)

    2005-01-01

    This paper proposes a watermarking algorithm for tamper-proofing of web pages. For a web page, it generates a watermark consisting of a sequence of Space and Tab characters. The watermark is then embedded into the web page after each word and each line. When a watermarked web page is tampered with, the extracted watermark can detect and locate the modifications to the web page. In addition, the framework of a watermarked Web Server system is given. Compared with traditional digital signature methods, this watermarking method is more transparent in that there is no need to detach the watermark before displaying web pages. The experimental results show that the proposed scheme is an effective tool for tamper-proofing of web pages.
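
    A fragile whitespace watermark of this general kind can be sketched as follows; the fixed bit string stands in for the keyed watermark generation described above, and this toy check only detects edits that disturb the embedded whitespace.

    ```python
    # Minimal sketch of a fragile whitespace watermark: encode bits as a Space
    # (0) or Tab (1) appended to each line of the HTML source, then verify on
    # extraction. A fixed bit string replaces the paper's keyed watermark.
    BITS = "1011"

    def embed(html, bits=BITS):
        marked = []
        for i, line in enumerate(html.splitlines()):
            mark = "\t" if bits[i % len(bits)] == "1" else " "
            marked.append(line + mark)
        return "\n".join(marked)

    def verify(html, bits=BITS):
        for i, line in enumerate(html.splitlines()):
            expected = "\t" if bits[i % len(bits)] == "1" else " "
            if not line.endswith(expected):
                return False, i      # tampered: report first mismatching line
        return True, None

    page = "<html>\n<body>\n<p>hello</p>\n</body>"
    marked = embed(page)
    print(verify(marked))                                         # (True, None)
    # A naive edit that rewrites a line and drops its trailing mark:
    print(verify(marked.replace("<p>hello</p>\t", "<p>bye</p>")))  # (False, 2)
    ```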

  3. Optimization of web pages for search engines

    OpenAIRE

    Harej, Anže

    2011-01-01

    The thesis describes the most important elements of a Web page and outside factors that affect Search Engine Optimization. The basic structure of a Web page and the structure and functionality of a modern Search Engine are described at the beginning. The first section deals with the start of Search Engine Optimization, including planning, analysis of the web space and the selection of the most important keywords for which the site will be optimized. The next section, Web Page Optimization, describes...

  4. Classifying web pages with visual features

    NARCIS (Netherlands)

    de Boer, V.; van Someren, M.; Lupascu, T.; Filipe, J.; Cordeiro, J.

    2010-01-01

    To automatically classify and process web pages, current systems use the textual content of those pages, including both the displayed content and the underlying (HTML) code. However, a very important feature of a web page is its visual appearance. In this paper, we show that using generic visual fea

  5. Web Page Categorization Using Artificial Neural Networks

    CERN Document Server

    Kamruzzaman, S M

    2010-01-01

    Web page categorization is one of the challenging tasks in the world of ever increasing web technologies. There are many ways of categorization of web pages based on different approach and features. This paper proposes a new dimension in the way of categorization of web pages using artificial neural network (ANN) through extracting the features automatically. Here eight major categories of web pages have been selected for categorization; these are business & economy, education, government, entertainment, sports, news & media, job search, and science. The whole process of the proposed system is done in three successive stages. In the first stage, the features are automatically extracted through analyzing the source of the web pages. The second stage includes fixing the input values of the neural network; all the values remain between 0 and 1. The variations in those values affect the output. Finally the third stage determines the class of a certain web page out of eight predefined classes. This stage i...

  6. Forestry Web Yellow Page Category System Based on Text Classification

    Institute of Scientific and Technical Information of China (English)

    王欢; 武刚; 杨抒

    2012-01-01

    This paper applies text classification technology to the forestry Web yellow pages domain, realizing efficient application and management of forestry Web yellow page information. It discusses a multi-level classification scheme for forestry Web yellow pages, presents the design of the classification system and its key technologies, and details the feature selection algorithm for category-discriminating terms. Experimental results show good precision and recall.

  7. Mimicked Web Page Detection over Internet

    Directory of Open Access Journals (Sweden)

    Y. Narasimha Rao

    2014-01-01

    Full Text Available Phishing is the process of stealing valuable information, such as ATM PINs and credit card details, over the internet, where the attacker creates mimicked web pages from legitimate web pages to fool users. In this paper, we propose an effective anti-phishing solution that combines an image-based visual similarity approach to detect plagiarized web pages. We use the Speeded Up Robust Features (SURF) algorithm in our detection mechanism to generate a signature based on stable key points extracted from a screenshot of the web page. When a legitimate web page is registered with our system, this algorithm is applied to that web page to generate signatures, and these signatures are stored in the database of our trained system. When there is a suspected web page, the algorithm is applied to generate the signatures of the suspected page, which are verified against our database of corresponding legitimate web pages. Our results verify that the proposed system is very effective in detecting mimicked web pages with minimal false positives.

  8. Museum: Multidimensional web page segment evaluation model

    CERN Document Server

    Kuppusamy, K S

    2012-01-01

    The evaluation of a web page with respect to a query is a vital task in the web information retrieval domain. This paper proposes evaluating a web page as a bottom-up process from the segment level to the page level. A model for evaluating relevancy is proposed, incorporating six different dimensions, along with an algorithm for evaluating the segments of a web page using these six dimensions. The benefits of fine-graining the evaluation process to the segment level instead of the page level are explored. The proposed model can be incorporated into various tasks such as web page personalization, result re-ranking, and mobile device page rendering.

  9. Identifying Information Senders of Web Pages

    Science.gov (United States)

    Kato, Yoshikiyo; Kawahara, Daisuke; Inui, Kentaro; Kurohashi, Sadao; Shibata, Tomohide

    The source of information is one of the crucial elements when judging the credibility of the information. On the current Web, however, the information about the source is not readily available to the users. In this paper, we formulate the problem of identifying the information source as the problem of identifying the information sender configuration (ISC) of a Web page. An information sender of a Web page is an entity which is involved in the publication of the information on the page. An information sender configuration of a Web page describes the information senders of the page and the relationship among them. Information sender identification is a sub-problem of identifying ISC, and we present a method for extracting information senders from Web pages, along with its evaluation. ISC provides a basis for deeper analysis of information on the Web.

  10. Web Page Recommendation Models Theory and Algorithms

    CERN Document Server

    Gündüz-Ögüdücü, Sule

    2010-01-01

    One of the application areas of data mining is the World Wide Web (WWW or Web), which serves as a huge, widely distributed, global information service for every kind of information such as news, advertisements, consumer information, financial management, education, government, e-commerce, health services, and many other information services. The Web also contains a rich and dynamic collection of hyperlink information, Web page access and usage information, providing sources for data mining. The amount of information on the Web is growing rapidly, as well as the number of Web sites and Web page

  11. Minimal Guidelines for Authors of Web Pages.

    Science.gov (United States)

    ADE Bulletin, 2002

    2002-01-01

    Presents guidelines that recommend the minimal reference information that should be provided on Web pages intended for use by students, teachers, and scholars in the modern languages. Suggests the inclusion of information about responsible parties, copyright declaration, privacy statements, and site information. Makes a note on Web page style. (SG)

  12. Dynamic Web Pages: Performance Impact on Web Servers.

    Science.gov (United States)

    Kothari, Bhupesh; Claypool, Mark

    2001-01-01

    Discussion of Web servers and requests for dynamic pages focuses on experimentally measuring and analyzing the performance of the three dynamic Web page generation technologies: CGI, FastCGI, and Servlets. Develops a multivariate linear regression model and predicts Web server performance under some typical dynamic requests. (Author/LRW)

  13. Classification of the web

    DEFF Research Database (Denmark)

    Mai, Jens Erik

    2004-01-01

    This paper discusses the challenges faced by investigations into the classification of the Web and outlines inquiries that are needed to use principles for bibliographic classification to construct classifications of the Web. This paper suggests that the classification of the Web meets challenges...

  14. Web Classification Using DYN FP Algorithm

    Directory of Open Access Journals (Sweden)

    Bhanu Pratap Singh

    2014-01-01

    Full Text Available Web mining is the application of data mining techniques to extract knowledge from the Web. Web mining has been explored to a vast degree, and different techniques have been proposed for a variety of applications that include Web search, classification and personalization. The primary goal of a web site is to provide relevant information to its users. Web mining techniques are used to categorize users and pages by analyzing user behavior, the content of pages and the order of URLs accessed. This paper proposes an auto-classification algorithm for web pages using data mining techniques. It addresses the problem of discovering association rules between terms in a set of web pages belonging to a category in a search engine database, and presents an auto-classification algorithm for solving this problem that is fundamentally based on the FP-growth algorithm.

  15. DESIGNING AN ENGLISH LEARNING WEB PAGE

    Institute of Scientific and Technical Information of China (English)

    Wu; Xiaozhen

    1999-01-01

    This paper reviews developments in CALL research and the currently acknowledged guidelines for CALL design. Following these guidelines, the author designed an English learning Web page of her own. Target learners, rationale, design aids, as well as the lesson plan using the Web page, are included.

  16. CERN Web Pages Receive a Makeover

    CERN Multimedia

    2001-01-01

    A sudden allergic reaction to the colour turquoise? Never fear, from Monday 2 April you'll be able to click in the pink box at the top of the CERN users' welcome page to go to the all-new welcome page, which is simpler and better organized. CERN's new-look intranet is the first step in a complete Web makeover being applied by the Web Public Education (WPE) group of ETT Division. The transition will be progressive, to allow users to familiarize themselves with the new pages. Until 17 April, CERN users will still get the familiar turquoise welcome page by default, with the new pages operating in parallel. From then on, the default will switch to the new pages, with the old ones being finally switched off on 25 May. Some 400 pages have received the makeover treatment. For more information about the changes to your Web, take a look at: http://www.cern.ch/CERN/NewUserPages/ Happy surfing!

  17. Web Pages Auto Classification Based on Frequently Co-Occurring Entropy

    Institute of Scientific and Technical Information of China (English)

    柯丽; 王明文; 何世柱; 黎佳; 罗远胜

    2011-01-01

    An approach to the cross-language web page automatic classification problem based on frequently co-occurring entropy (FCE) is proposed. The algorithm first translates all Chinese web pages into English with simple translation software. Second, it computes the frequently co-occurring entropy over all Chinese and English web pages. Third, it selects the common features between Chinese and English pages based on the FCE ranks. Finally, it trains a Chinese classification model from the English pages using these common features. Experimental results on the ODP corpus show that the method performs better than the NB, SVM and information bottleneck (IB) models.

  18. An Efficient Web Page Ranking for Semantic Web

    Science.gov (United States)

    Chahal, P.; Singh, M.; Kumar, S.

    2014-01-01

    With the enormous amount of information presented on the web, the retrieval of relevant information has become a serious problem and has been a research topic for the last few years. The most common tools to retrieve information from the web are search engines such as Google. Search engines are usually based on keyword searching and indexing of web pages. This approach is not very efficient, as the result set of web pages obtained includes many irrelevant pages; sometimes even the entire result set may contain a lot of pages irrelevant to the user. The next generation of search engines must address this problem. Recently, many semantic web search engines have been developed, such as Ontolook and Swoogle, which help in searching meaningful documents presented on the semantic web. In this process the ranking of the retrieved web pages is crucial. Some attempts have been made at ranking semantic web pages, but the ranking of these semantic web documents is still neither satisfactory nor up to users' expectations. In this paper we propose a semantic-web-based document ranking scheme that relies not only on the keywords but also on the conceptual instances present between the keywords. As a result only relevant pages will be at the top of the result set of searched web pages. We explore all relevant relations between the keywords, exploring the user's intention, and then calculate the fraction of these relations on each web page to determine its relevance. We have found that this ranking technique gives better results than the prevailing methods.

  19. A Web Page Summarization for Mobile Phones

    Science.gov (United States)

    Hasegawa, Takaaki; Nishikawa, Hitoshi; Imamura, Kenji; Kikui, Gen'ichiro; Okumur, Manabu

    Recently, web pages for mobile devices are widely spread on the Internet and a lot of people can access web pages through search engines by mobile devices as well as personal computers. A summary of a retrieved web page is important because the people judge whether or not the page would be relevant to their information need according to the summary. In particular, the summary must be not only compact but also grammatical and meaningful when the users retrieve information using a mobile phone with a small screen. Most search engines seem to produce a snippet based on the keyword-in-context (KWIC) method. However, this simple method could not generate a refined summary suitable for mobile phones because of low grammaticality and content overlap with the page title. We propose a more suitable method to generate a snippet for mobile devices using sentence extraction and sentence compression methods. First, sentences are biased based on whether they include the query terms from the users or words that are relevant to the queries, as well as whether they do not overlap with the page title based on maximal marginal relevance (MMR). Second, the selected sentences are compressed based on their phrase coverage, which is measured by the scores of words, and their phrase connection probability measured based on the language model, according to the dependency structure converted from the sentence. The experimental results reveal the proposed method outperformed the KWIC method in terms of relevance judgment, grammaticality, non-redundancy and content coverage.
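
    The MMR-based sentence selection step can be sketched as below (the compression step is not reproduced); the lambda value, similarity measure, and toy sentences are assumptions rather than the paper's setup.

    ```python
    # Minimal sketch of MMR-style sentence selection: pick sentences relevant
    # to the query but not redundant with already selected ones.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def mmr_select(sentences, query, k=2, lam=0.7):
        vec = TfidfVectorizer().fit(sentences + [query])
        S, q = vec.transform(sentences), vec.transform([query])
        rel = cosine_similarity(S, q).ravel()          # relevance to the query
        selected = []
        while len(selected) < k:
            best, best_score = None, -1.0
            for i in range(len(sentences)):
                if i in selected:
                    continue
                red = max((cosine_similarity(S[i], S[j])[0, 0] for j in selected),
                          default=0.0)                 # redundancy with chosen set
                score = lam * rel[i] - (1 - lam) * red
                if score > best_score:
                    best, best_score = i, score
            selected.append(best)
        return [sentences[i] for i in selected]

    sents = ["The hotel offers free breakfast and wifi.",
             "Breakfast is free for all hotel guests.",
             "The city centre is ten minutes away by tram."]
    print(mmr_select(sents, "free hotel breakfast"))
    ```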

  20. Efficient Web Change Monitoring with Page Digest

    Energy Technology Data Exchange (ETDEWEB)

    Buttler, D J; Rocco, D; Liu, L

    2004-02-20

    The Internet and the World Wide Web have enabled a publishing explosion of useful online information, which has produced the unfortunate side effect of information overload: it is increasingly difficult for individuals to keep abreast of fresh information. In this paper we describe an approach for building a system for efficiently monitoring changes to Web documents. This paper has three main contributions. First, we present a coherent framework that captures different characteristics of Web documents. The system uses the Page Digest encoding to provide a comprehensive monitoring system for content, structure, and other interesting properties of Web documents. Second, the Page Digest encoding enables improved performance for individual page monitors through mechanisms such as short-circuit evaluation, linear time algorithms for document and structure similarity, and data size reduction. Finally, we develop a collection of sentinel grouping techniques based on the Page Digest encoding to reduce redundant processing in large-scale monitoring systems by grouping similar monitoring requests together. We examine how effective these techniques are over a wide range of parameters and have seen an order of magnitude speed up over existing Web-based information monitoring systems.
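
    Change monitoring by comparing digests between visits can be sketched as follows; a plain SHA-256 over normalized text stands in here for the richer Page Digest encoding, which also captures document structure.

    ```python
    # Minimal sketch of change monitoring by comparing page digests between
    # visits. A SHA-256 of normalised text stands in for Page Digest.
    import hashlib

    def digest(html):
        normalised = " ".join(html.split())           # collapse whitespace
        return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

    stored = {}  # url -> last seen digest

    def check(url, html):
        """Return True if the page changed since the last visit."""
        new = digest(html)
        old = stored.get(url)
        stored[url] = new
        return old is not None and old != new

    print(check("u", "<p>price: 10</p>"))   # False (first visit)
    print(check("u", "<p>price: 10</p>"))   # False (unchanged)
    print(check("u", "<p>price: 12</p>"))   # True  (changed)
    ```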

  1. Model for Predicting End User Web Page Response Time

    OpenAIRE

    Nagarajan, Sathya Narayanan; Ravikumar, Srijith

    2012-01-01

    Perceived responsiveness of a web page is one of the most important and least understood metrics of web page design, and is critical for attracting and maintaining a large audience. Web pages can be designed to meet performance SLAs early in the product lifecycle if there is a way to predict the apparent responsiveness of a particular page layout. Response time of a web page is largely influenced by page layout and various network characteristics. Since the network characteristics vary widely...

  2. Improving Web Page Readability by Plain Language

    CERN Document Server

    Hussain, Walayat; Ali, Arif

    2011-01-01

    In today's world, for anybody who wants to access any information, the first choice is to use the web, because it is the only source that provides easy and instant access to information. However, web readers face many hurdles, including the load of web pages, text size, finding related information, and spelling and grammar. Understanding web pages written in English creates great problems for non-native readers who have only basic knowledge of English. In this paper, we propose a plain-language scheme for a local language (Urdu) using English alphabets for web pages in Pakistan. For this purpose we developed two websites, one with normal English text and the other in a local-language text scheme using English alphabets. We also conducted a questionnaire with 40 different users with different levels of English fluency in Pakistan to gather evidence of the practicality of our approach. The result shows that the proposed plain-language text scheme using English alphabets improved the reading com...

  3. Improving Web Page Readability by Plain Language

    Directory of Open Access Journals (Sweden)

    Walayat Hussain

    2011-05-01

    Full Text Available In today's world, for anybody who wants to access any information, the first choice is to use the web, because it is the only source that provides easy and instant access to information. However, web readers face many hurdles, including the load of web pages, text size, finding related information, and spelling and grammar. Understanding web pages written in English creates great problems for non-native readers who have only basic knowledge of English. In this paper, we propose a plain-language scheme for a local language (Urdu) using English alphabets for web pages in Pakistan. For this purpose we developed two websites, one with normal English text and the other in a local-language text scheme using English alphabets. We also conducted a questionnaire with 40 different users with different levels of English fluency in Pakistan to gather evidence of the practicality of our approach. The result shows that the proposed plain-language text scheme using English alphabets improved reading comprehension for non-native English speakers in Pakistan.

  4. Referencing web pages and e-journals.

    Science.gov (United States)

    Bryson, David

    2013-12-01

    One of the areas that can confuse students and authors alike is how to reference web pages and electronic journals (e-journals). The aim of this professional development article is to go back to first principles for referencing and to show, with examples, how these sources should be referenced.

  5. Google's Web Page Ranking Applied to Different Topological Web Graph Structures.

    Science.gov (United States)

    Meghabghab, George

    2001-01-01

    This research, part of the ongoing study to better understand Web page ranking on the Web, looks at a Web page as a graph structure or Web graph, and classifies different Web graphs in the new coordinate space (out-degree, in-degree). Google's Web ranking algorithm (Brin & Page, 1998) on ranking Web pages is applied in this new coordinate…
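
    The underlying PageRank computation referred to above can be sketched as a standard power iteration over a tiny link graph; the damping factor and the toy graph are illustrative assumptions.

    ```python
    # Minimal sketch of PageRank by power iteration on a tiny link graph.
    def pagerank(links, d=0.85, iters=50):
        nodes = list(links)
        rank = {n: 1.0 / len(nodes) for n in nodes}
        for _ in range(iters):
            new = {n: (1 - d) / len(nodes) for n in nodes}
            for n, outs in links.items():
                if not outs:                       # dangling node: spread evenly
                    for m in nodes:
                        new[m] += d * rank[n] / len(nodes)
                else:
                    for m in outs:
                        new[m] += d * rank[n] / len(outs)
            rank = new
        return rank

    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    print({k: round(v, 3) for k, v in pagerank(graph).items()})
    ```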

  6. Developing a web page: bringing clinics online.

    Science.gov (United States)

    Peterson, Ronnie; Berns, Susan

    2004-01-01

    Introducing clinical staff education, along with new policies and procedures, to over 50 different clinical sites can be a challenge. As any staff educator will confess, getting people to attend an educational inservice session can be difficult. Clinical staff request training, but no one has time to attend training sessions. Putting the training along with the policies and other information into "neat" concise packages via the computer and over the company's intranet was the way to go. However, how do you bring the clinics online when some of the clinical staff may still be reluctant to turn on their computers for anything other than to gather laboratory results? Developing an easy, fun, and accessible Web page was the answer. This article outlines the development of the first training Web page at the University of Wisconsin Medical Foundation, Madison, WI.

  7. Evaluating Multilingual Gisting of Web Pages

    CERN Document Server

    Resnik, P

    1997-01-01

    We describe a prototype system for multilingual gisting of Web pages, and present an evaluation methodology based on the notion of gisting as decision support. This evaluation paradigm is straightforward, rigorous, permits fair comparison of alternative approaches, and should easily generalize to evaluation in other situations where the user is faced with decision-making on the basis of information in restricted or alternative form.

  8. Web Mining Using PageRank Algorithm

    Directory of Open Access Journals (Sweden)

    Vignesh. V

    2013-11-01

    Full Text Available Web mining is the extraction and automatic discovery of web-based information using data mining. It is one of the most universal and dominant applications on the Internet; as the web grows in size, search tools that combine the results of multiple search engines are becoming more valuable. However, almost none of these studies deals with the genetic relation algorithm (GRA), an evolutionary method with a graph structure. GRA was designed both to increase the effectiveness of search engines and to improve their efficiency. GRA considers the correlation coefficient between stock brands as strength, which indicates the relation between nodes in each individual of GRA. The reduced number of hyperlinks provided by GRA in the final generation consists of only the hyperlinks most similar to the query. However, end users are still not fully satisfied. To improve user satisfaction, the PageRank algorithm is used to measure the importance of a page and to prioritize pages returned by GRA, which reduces users' searching time. The PageRank algorithm allocates a rank to the filtered links based on the number of keyword occurrences in the content.

  9. Model for Predicting End User Web Page Response Time

    CERN Document Server

    Nagarajan, Sathya Narayanan

    2012-01-01

    Perceived responsiveness of a web page is one of the most important and least understood metrics of web page design, and is critical for attracting and maintaining a large audience. Web pages can be designed to meet performance SLAs early in the product lifecycle if there is a way to predict the apparent responsiveness of a particular page layout. Response time of a web page is largely influenced by page layout and various network characteristics. Since the network characteristics vary widely from country to country, accurately modeling and predicting the perceived responsiveness of a web page from the end user's perspective has traditionally proven very difficult. We propose a model for predicting end user web page response time based on web page, network, browser download and browser rendering characteristics. We start by understanding the key parameters that affect perceived response time. We then model each of these parameters individually using experimental tests and statistical techniques. Finally, we d...

  10. Categorization of web pages - Performance enhancement to search engine

    Digital Repository Service at National Institute of Oceanography (India)

    Lakshminarayana, S.

    are the major areas of research in IR and strive to improve the effectiveness of interactive IR, and can be used as a performance evaluation tool. The classification studies at early stages relied more on strong human interaction than on machine learning. The term... and the location of the link. In the absence of such works, the spider/worm either moves to the next page available at the least time or by network selection. This classification serves in judging the traversal of the web spider/worm and its minimization. Such processes...

  11. Required Discussion Web Pages in Psychology Courses and Student Outcomes

    Science.gov (United States)

    Pettijohn, Terry F., II; Pettijohn, Terry F.

    2007-01-01

    We conducted 2 studies that investigated student outcomes when using discussion Web pages in psychology classes. In Study 1, we assigned 213 students enrolled in Introduction to Psychology courses to either a mandatory or an optional Web page discussion condition. Students used the discussion Web page significantly more often and performed…

  12. Arabic web pages clustering and annotation using semantic class features

    Directory of Open Access Journals (Sweden)

    Hanan M. Alghamdi

    2014-12-01

    Full Text Available Effectively managing the great amount of data on Arabic web pages and enabling the classification of relevant information are very important research problems. Studies on sentiment text mining have been very limited in the Arabic language because they need to involve deep semantic processing. Therefore, in this paper, we aim to retrieve machine-understandable data with the help of a Web content mining technique to detect covert knowledge within these data. We propose an approach to achieve clustering with semantic similarities. This approach comprises integrating k-means document clustering with semantic feature extraction and document vectorization to group Arabic web pages according to semantic similarities and then show the semantic annotation. The document vectorization helps to transform text documents into a semantic class probability distribution or semantic class density. To reach semantic similarities, the approach extracts the semantic class features and integrates them into the similarity weighting schema. The quality of the clustering result has been evaluated using the purity and the mean intra-cluster distance (MICD) evaluation measures. We have evaluated the proposed approach on a set of common Arabic news web pages. We have acquired favorable clustering results that are effective in minimizing the MICD, increasing the purity and lowering the runtime.
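
    A minimal sketch of the clustering-plus-purity idea, assuming plain TF-IDF features in place of the paper's semantic class probability features; the documents, labels and cluster count are illustrative:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = ["economy news page text markets",
        "sports news page text football",
        "economy markets page finance",
        "sports football page league"]
gold = np.array([0, 1, 0, 1])          # known topic labels, used only for evaluation

vectors = TfidfVectorizer().fit_transform(docs)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

def purity(clusters, truth):
    # For each cluster, count the most frequent gold label, then normalize.
    total = sum(np.bincount(truth[clusters == c]).max() for c in np.unique(clusters))
    return total / len(truth)

print("purity:", purity(labels, gold))
```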

  13. Weighted Page Content Rank for Ordering Web Search Result

    Directory of Open Access Journals (Sweden)

    POOJA SHARMA,

    2010-12-01

    Full Text Available With the explosive growth of information sources available on the World Wide Web, it has become increasingly necessary for users to utilize automated tools in order to find, extract, filter and evaluate the desired information and resources. Web structure mining and content mining play an effective role in this approach. There are two ranking algorithms, PageRank and Weighted PageRank. PageRank is a commonly used algorithm in Web structure mining. Weighted PageRank also takes into account the importance of the inlinks and outlinks of the pages, but the rank score is not distributed equally among all links, i.e., an unequal distribution is performed. In this paper we propose a new algorithm, Weighted Page Content Rank (WPCR), based on web content mining and structure mining, which shows that the relevancy of the pages to a given query is better determined compared to the existing PageRank and Weighted PageRank algorithms.
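
    A hedged sketch of the general idea of combining a link-based rank with a query-dependent content score; the equal weighting and the term-frequency content measure are illustrative assumptions, not the paper's WPCR formula:

```python
# Hedged sketch: combine a precomputed link-based rank with a simple
# query-content relevance score. The 0.5/0.5 weighting and the term-count
# relevance measure are illustrative, not the paper's WPCR formula.
def content_score(page_text, query):
    terms = query.lower().split()
    words = page_text.lower().split()
    return sum(words.count(t) for t in terms) / (len(words) or 1)

def combined_rank(pages, link_rank, query, alpha=0.5):
    scores = {url: alpha * link_rank.get(url, 0.0)
                   + (1 - alpha) * content_score(text, query)
              for url, text in pages.items()}
    return sorted(scores, key=scores.get, reverse=True)

pages = {"a.html": "web mining and page rank tutorial",
         "b.html": "cooking recipes and kitchen tips"}
link_rank = {"a.html": 0.4, "b.html": 0.6}   # e.g. from a prior PageRank pass
print(combined_rank(pages, link_rank, "page rank"))
```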

  14. Extraction of Flat and Nested Data Records from Web Pages

    CERN Document Server

    Hiremath, P S

    2010-01-01

    This paper studies the problem of identification and extraction of flat and nested data records from a given web page. With the explosive growth of information sources available on the World Wide Web, it has become increasingly difficult to identify the relevant pieces of information, since web pages are often cluttered with irrelevant content like advertisements, navigation panels, copyright notices etc., surrounding the main content of the web page. Hence, it is useful to mine such data regions and data records in order to extract information from such web pages to provide value-added services. Currently available automatic techniques to mine data regions and data records from web pages are still unsatisfactory because of their poor performance. In this paper a novel method to identify and extract the flat and nested data records from the web pages automatically is proposed. It comprises two steps: (1) Identification and extraction of the data regions based on visual clue information. (2) Identificatio...

  15. Validation of a Web Application by Using a Limited Number of Web Pages

    OpenAIRE

    Doru Anastasiu Popescu; Maria Catrinel Dănăuţă

    2012-01-01

    In this paper, we are trying to introduce a method of selection of some web pages from a web application, which will be verified by using different validating mechanisms. The number of selected web pages cannot be higher than a previously established constant. The method of selection of these web pages must assure the highest possible quality of the verification of the entire application. The error detection of these web pages will automatically lead to the error detection in other pages. Thi...

  16. Collective Behaviour Learning :A Concept For Filtering Web Pages

    Directory of Open Access Journals (Sweden)

    G. Mercy Bai

    2014-03-01

    Full Text Available The rapid growth of the WWW poses unprecedented challenges for general-purpose crawlers and search engines. The former technique used to crawl web pages was FOCUS (Forum Crawler Under Supervision). This project presents a collective behavior learning algorithm for web crawling. The collective behavior learning algorithm crawls web pages based on a particular keyword. Discriminative learning extracts only the URLs related to the particular keyword based on filtering. The goal of this project is to crawl relevant forum content from the web with minimal overhead. Unwanted URLs are removed from the web pages and web page crawling is reduced by using collective behavior learning. The web pages are extracted based on certain learning techniques, which can also be used to collect the unwanted URLs.
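
    A hedged sketch of the keyword-based URL filtering step only (not the paper's collective behaviour learning algorithm); the example links and keyword are illustrative:

```python
from urllib.parse import urljoin

def filter_links(base_url, links, keyword):
    """links: iterable of (href, anchor_text) pairs extracted from a fetched page."""
    keyword = keyword.lower()
    return [urljoin(base_url, href)
            for href, anchor in links
            if keyword in href.lower() or keyword in anchor.lower()]

# Example links as they might be parsed from one forum page (illustrative data).
links_on_page = [("/forum/web-mining-thread", "Web mining discussion"),
                 ("/forum/cooking", "Cooking tips and recipes")]
print(filter_links("http://example.org/forum/", links_on_page, "web mining"))
```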

  17. Migrating Multi-page Web Applications to Single-page AJAX Interfaces

    NARCIS (Netherlands)

    Mesbah, A.; Van Deursen, A.

    2006-01-01

    Recently, a new web development technique for creating interactive web applications, dubbed AJAX, has emerged. In this new model, the single-page web interface is composed of individual components which can be updated/replaced independently. With the rise of AJAX web applications classical multi-pag

  18. An evaluation on the Web page navigation tools in university library Web sites In Turkey

    OpenAIRE

    Çakmak, Tolga

    2010-01-01

    Web technologies and web pages are primary tools for dissemination of information all over the world today. Libraries are also using and adopting these technologies to reach their audiences. The effective usage of these technologies can be possible with user centered design. Web pages that have user centered design help users to find information without being lost in the web page. As a part of the web pages, navigation systems have a vital role in this context. Effective usage of navigation s...

  19. Evaluation of the Importance of Web Pages

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    Google's PageRank algorithm is analyzed in detail. Some disadvantages of this algorithm are presented, for instance, preferring old pages, ignoring special sites, and judging inaccurately the hyperlinks pointing out from one page. Furthermore, the author's improved algorithm is described. Experiments show that the author's considerations on evaluating the importance of pages yield an improvement over the original algorithm. Based on this improved algorithm, a topic-specific searching system has been developed.

  20. Recognition of pornographic web pages by classifying texts and images.

    Science.gov (United States)

    Hu, Weiming; Wu, Ou; Chen, Zhouyao; Fu, Zhouyu; Maybank, Steve

    2007-06-01

    With the rapid development of the World Wide Web, people benefit more and more from the sharing of information. However, Web pages with obscene, harmful, or illegal content can be easily accessed. It is important to recognize such unsuitable, offensive, or pornographic Web pages. In this paper, a novel framework for recognizing pornographic Web pages is described. A C4.5 decision tree is used to divide Web pages, according to content representations, into continuous text pages, discrete text pages, and image pages. These three categories of Web pages are handled, respectively, by a continuous text classifier, a discrete text classifier, and an algorithm that fuses the results from the image classifier and the discrete text classifier. In the continuous text classifier, statistical and semantic features are used to recognize pornographic texts. In the discrete text classifier, the naive Bayes rule is used to calculate the probability that a discrete text is pornographic. In the image classifier, the object's contour-based features are extracted to recognize pornographic images. In the text and image fusion algorithm, the Bayes theory is used to combine the recognition results from images and texts. Experimental results demonstrate that the continuous text classifier outperforms the traditional keyword-statistics-based classifier, the contour-based image classifier outperforms the traditional skin-region-based image classifier, the results obtained by our fusion algorithm outperform those by either of the individual classifiers, and our framework can be adapted to different categories of Web pages.
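
    A minimal sketch of the discrete-text stage alone: a naive Bayes classifier over bag-of-words features. The training snippets and the use of scikit-learn are illustrative assumptions; the paper's full framework (C4.5 routing, contour-based image classifier, Bayesian fusion) is not reproduced here:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = ["adult explicit sample text", "sports news report",
               "explicit adult content words", "weather forecast for tomorrow"]
train_labels = [1, 0, 1, 0]            # 1 = pornographic, 0 = benign (illustrative)

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(train_texts, train_labels)

page_text = "tomorrow's sports and weather report"
# Probability that the page's discrete text is pornographic.
print(clf.predict_proba([page_text])[0][1])
```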

  1. An Analysis of Academic Library Web Pages for Faculty

    Science.gov (United States)

    Gardner, Susan J.; Juricek, John Eric; Xu, F. Grace

    2008-01-01

    Web sites are increasingly used by academic libraries to promote key services and collections to teaching faculty. This study analyzes the content, location, language, and technological features of fifty-four academic library Web pages designed especially for faculty to expose patterns in the development of these pages.

  2. Digital Ethnography: Library Web Page Redesign among Digital Natives

    Science.gov (United States)

    Klare, Diane; Hobbs, Kendall

    2011-01-01

    Presented with an opportunity to improve Wesleyan University's dated library home page, a team of librarians employed ethnographic techniques to explore how its users interacted with Wesleyan's current library home page and web pages in general. Based on the data that emerged, a group of library staff and members of the campus' information…

  3. A Model for Web Page Usage Mining Based on Segmentation

    CERN Document Server

    Kuppusamy, K S

    2012-01-01

    The web page usage mining plays a vital role in enriching the page's content and structure based on the feedback received from the user's interactions with the page. This paper proposes a model for micro-managing the tracking activities by fine-tuning the mining from the page level to the segment level. The proposed model enables the web-master to identify the segments which receive more focus from users compared with others. The segment-level analytics of user actions provides an important metric to analyse the factors which facilitate the increase in traffic for the page. The empirical validation of the model is performed through a prototype implementation.

  4. The Web Application Test Based on Page Coverage Criteria

    Institute of Scientific and Technical Information of China (English)

    CAI Li-zhi; TONG Wei-qin; YANG Gen-xing

    2008-01-01

    Software testing coverage criteria play an important role in the whole testing process. The current coverage criteria for web applications are based on program or URL. They are not suitable for black-box testing or intuitive to use. This paper defines a kind of test criteria based on page coverage sequences navigated only by the web application, including Page_Single, Page_Post, Page_Pre, Page_Seq2, and Page_SeqK. Test criteria based on page coverage sequences made by interactions between the web application and the browser are then considered. In order to avoid the ambiguity of natural language, these coverage criteria are depicted using the Z formal language. The empirical result shows that the criteria complement traditional coverage and fault detection capability criteria.

  5. Metadata Schema Used in OCLC Sampled Web Pages

    Directory of Open Access Journals (Sweden)

    Fei Yu

    2005-12-01

    Full Text Available The tremendous growth of Web resources has made information organization and retrieval more and more difficult. As one approach to this problem, metadata schemas have been developed to characterize Web resources. However, many questions have been raised about the use of metadata schemas, such as: which metadata schemas have been used on the Web? How did they describe Web-accessible information? What is the distribution of these metadata schemas among Web pages? Do certain schemas dominate the others? To address these issues, this study analyzed 16,383 Web pages with meta tags extracted from 200,000 OCLC sampled Web pages in 2000. It found that only 8.19% of Web pages used meta tags; description tags, keyword tags, and Dublin Core tags were the only three schemas used in the Web pages. This article reveals the use of meta tags in terms of their function distribution, syntax characteristics, granularity of the Web pages, and the length distribution and word number distribution of both description and keywords tags.
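
    A minimal sketch of extracting name/content pairs from meta tags, the raw material for the kind of schema analysis the study performs; the sample page is illustrative:

```python
from html.parser import HTMLParser

class MetaTagExtractor(HTMLParser):
    """Collects (name, content) pairs from <meta> tags in an HTML document."""
    def __init__(self):
        super().__init__()
        self.meta = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            name = attrs.get("name") or attrs.get("property")
            if name:
                self.meta.append((name.lower(), attrs.get("content", "")))

html = """<html><head>
<meta name="description" content="Sample page about metadata schemas">
<meta name="keywords" content="metadata, Dublin Core, web pages">
<meta name="DC.Title" content="Metadata Schema Used in OCLC Sampled Web Pages">
</head><body>...</body></html>"""

parser = MetaTagExtractor()
parser.feed(html)
print(parser.meta)
```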

  6. An efficient scheme for automatic web pages categorization using the support vector machine

    Science.gov (United States)

    Bhalla, Vinod Kumar; Kumar, Neeraj

    2016-07-01

    In the past few years, with the evolution of the Internet and related technologies, the number of Internet users has grown exponentially. These users demand access to relevant web pages from the Internet within a fraction of a second. To achieve this goal, there is a requirement for an efficient categorization of web page contents. Manual categorization of these billions of web pages to achieve high accuracy is a challenging task. Most of the existing techniques reported in the literature are semi-automatic. Using these techniques, a higher level of accuracy cannot be achieved. To achieve these goals, this paper proposes an automatic categorization of web pages into domain categories. The proposed scheme is based on the identification of specific and relevant features of the web pages. In the proposed scheme, extraction and evaluation of features are done first, followed by filtering the feature set for categorization of domain web pages. A feature extraction tool based on the HTML document object model of the web page is developed in the proposed scheme. Feature extraction and weight assignment are based on a collection of domain-specific keyword lists developed by considering various domain pages. Moreover, the keyword list is reduced on the basis of the ids of keywords in the keyword list. Also, stemming of keywords and tag text is done to achieve higher accuracy. An extensive feature set is generated to develop a robust classification technique. The proposed scheme was evaluated using a machine learning method in combination with feature extraction and statistical analysis, using a support vector machine kernel as the classification tool. The results obtained confirm the effectiveness of the proposed scheme in terms of its accuracy on different categories of web pages.
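
    A minimal sketch of SVM-based web page categorization over TF-IDF features; the categories and training snippets are illustrative assumptions, not the paper's DOM-based feature extractor or domain keyword lists:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Illustrative page texts and domain categories.
pages = ["laptop review battery screen keyboard",
         "football match score goals league",
         "smartphone camera processor review",
         "tennis tournament final set match"]
categories = ["technology", "sports", "technology", "sports"]

model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(pages, categories)

print(model.predict(["new tablet review with great screen"]))
```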

  7. Metadata Schema Used in OCLC Sampled Web Pages

    OpenAIRE

    Fei Yu

    2005-01-01

    The tremendous growth of Web resources has made information organization and retrieval more and more difficult. As one approach to this problem, metadata schemas have been developed to characterize Web resources. However, many questions have been raised about the use of metadata schemas such as which metadata schemas have been used on the Web? How did they describe Web accessible information? What is the distribution of these metadata schemas among Web pages? Do certain schemas dominate the o...

  8. Veracity in in vitro fertilization Web pages.

    Science.gov (United States)

    Cowan, Bryan D

    2005-03-01

    Huang et al. described compliance of IVF websites against the American Medical Association online health information guidelines and reported that IVF websites scored poorly. We describe a protocol for IVF websites that would inform readers about truthfulness of the page, develop standards for page construction, and establish a review process.

  9. A teen's guide to creating web pages and blogs

    CERN Document Server

    Selfridge, Peter; Osburn, Jennifer

    2008-01-01

    Whether using a social networking site like MySpace or Facebook or building a Web page from scratch, millions of teens are actively creating a vibrant part of the Internet. This is the definitive teen's guide to publishing exciting web pages and blogs on the Web. This easy-to-follow guide shows teenagers how to: create great MySpace and Facebook pages; build their own unique, personalized Web site; share the latest news with exciting blogging ideas; and protect themselves online with cyber-safety tips. Written by a teenager for other teens, this book leads readers step-by-step through the basics of web and blog design. In this book, teens learn to go beyond clicking through web sites to learning winning strategies for web design and great ideas for writing blogs that attract attention and readership.

  10. A reverse engineering approach for automatic annotation of Web pages

    NARCIS (Netherlands)

    R. de Virgilio (Roberto); F. Frasincar (Flavius); W. Hop (Wim); S. Lachner (Stephan)

    2013-01-01

    textabstractThe Semantic Web is gaining increasing interest to fulfill the need of sharing, retrieving, and reusing information. Since Web pages are designed to be read by people, not machines, searching and reusing information on the Web is a difficult task without human participation. To this aim

  11. The 'Don'ts' of Web Page Design.

    Science.gov (United States)

    Balas, Janet L.

    1999-01-01

    Discusses online resources that focus on what not to do in Web page design. "Don'ts" include: making any of the top 10 mistakes identified by Nielsen, qualifying for a "muddie" award for bad Web sites, forgetting to listen to users, and forgetting accessibility. A sidebar lists the Web site addresses for the nine resources…

  12. Finding pages on the unarchived Web

    NARCIS (Netherlands)

    Kamps, J.; Ben-David, A.; Huurdeman, H.C.; Vries, A.P. de; Samar, T.

    2014-01-01

    Web archives preserve the fast changing Web, yet are highly incomplete due to crawling restrictions, crawling depth and frequency, or restrictive selection policies-most of the Web is unarchived and therefore lost to posterity. In this paper, we propose an approach to recover significant parts of th

  13. A Model for Web Page Usage Mining Based on Segmentation

    OpenAIRE

    Kuppusamy, K. S.; Aghila, G.

    2012-01-01

    The web page usage mining plays a vital role in enriching the page's content and structure based on the feedback received from the user's interactions with the page. This paper proposes a model for micro-managing the tracking activities by fine-tuning the mining from the page level to the segment level. The proposed model enables the web-master to identify the segments which receive more focus from users compared with others. The segment-level analytics of user actions provides an importan...

  14. Web Pages for Your Classroom The EASY Way!

    CERN Document Server

    Mccorkle, Sandra

    2003-01-01

    A practical how-to guide, this book provides the classroom teacher or librarian with all of the tools necessary for creating Web pages for student use. Useful templates (a CD-ROM is included for easy use) and clear, logical instructions guide you in the creation of pages that students can later use for research or other types of projects that familiarize students with the power and usefulness of the Web. Gaining this skill allows you the flexibility of tailoring Web pages to students' specific needs and being sure of the quality of resources students are accessing. This book is indispensable for

  15. Page sample size in web accessibility testing: how many pages is enough?

    NARCIS (Netherlands)

    Velleman, Eric; Geest, van der Thea

    2013-01-01

    Various countries and organizations use a different sampling approach and sample size of web pages in accessibility conformance tests. We are conducting a systematic analysis to determine how many pages is enough for testing whether a website is compliant with standard accessibility guidelines. This

  16. Enriching the trustworthiness of health-related web pages.

    Science.gov (United States)

    Gaudinat, Arnaud; Cruchet, Sarah; Boyer, Celia; Chrawdhry, Pravir

    2011-06-01

    We present an experimental mechanism for enriching web content with quality metadata. This mechanism is based on a simple and well-known initiative in the field of the health-related web, the HONcode. The Resource Description Framework (RDF) format and the Dublin Core Metadata Element Set were used to formalize these metadata. The model of trust proposed is based on a quality model for health-related web pages that has been tested in practice over a period of thirteen years. Our model has been explored in the context of a project to develop a research tool that automatically detects the occurrence of quality criteria in health-related web pages.
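
    A hedged sketch of attaching quality metadata to a page URL as RDF using Dublin Core terms; the HON namespace URI and the specific properties below are hypothetical placeholders, not the actual HONcode vocabulary:

```python
from rdflib import Graph, URIRef, Literal, Namespace
from rdflib.namespace import DC

HON = Namespace("http://example.org/honcode#")   # hypothetical namespace, not the real vocabulary
page = URIRef("http://example.org/health-article")

g = Graph()
g.bind("dc", DC)
g.bind("hon", HON)
g.add((page, DC.title, Literal("Managing seasonal allergies")))
g.add((page, DC.creator, Literal("Example Clinic")))
g.add((page, HON.authorship, Literal("qualified health professional")))   # illustrative criterion
g.add((page, HON.dateOfLastReview, Literal("2011-06-01")))                # illustrative criterion

print(g.serialize(format="turtle"))
```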

  17. Does Aesthetics of Web Page Interface Matters to Mandarin Learning?

    CERN Document Server

    Zain, Jasni Mohamad; Goh, Yingsoon

    2011-01-01

    The aesthetics of a web page refers to how attractive the page is and how well it catches the user's attention to read through the information. In addition, the visual appearance is important in getting the attention of users. Moreover, it was found that screens which were perceived as aesthetically pleasing had better usability. Usability might be a strong basis for applicability to learning, and in this study it pertains to Mandarin learning. It was also found that aesthetically pleasing web page layouts would motivate students in Mandarin learning. The Mandarin learning web pages were manipulated according to the desired aesthetic measurements. A GUI aesthetic measuring method was used for this purpose. The Aesthetics-Measurement Application (AMA), equipped with six aesthetic measures, was developed and used. On top of that, questionnaires were distributed to the users to gather information on the students' perceptions of the aesthetic and learning aspects. Respondent...

  18. An Improved Approach to perform Crawling and avoid Duplicate Web Pages

    Directory of Open Access Journals (Sweden)

    Dhiraj Khurana

    2012-06-01

    Full Text Available When a web search is performed, it returns many duplicate web pages or websites; that is, a number of similar pages may be found on different web servers. We propose a web crawling approach to detect and avoid duplicate or near-duplicate web pages. In this work we present a keyword-prioritization-based approach to identify such pages on the web. As such pages are identified, the web search is optimized.
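
    A hedged sketch of near-duplicate detection by Jaccard similarity of word sets; this is a stand-in for illustration, not the paper's keyword-prioritization scheme, and the 0.8 threshold and example pages are assumptions:

```python
def jaccard(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

pages = {
    "mirror1.example.org/p1": "web crawling approach to detect duplicate pages",
    "mirror2.example.org/p1": "web crawling approach to detect duplicate web pages",
    "other.example.org/p2":   "cooking recipes for the weekend",
}

seen = []
for url, text in pages.items():
    if any(jaccard(text, kept_text) > 0.8 for _, kept_text in seen):
        print("near-duplicate, skipped:", url)
    else:
        seen.append((url, text))
print("kept:", [url for url, _ in seen])
```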

  19. Digital libraries and World Wide Web sites and page persistence.

    Directory of Open Access Journals (Sweden)

    Wallace Koehler

    1999-01-01

    Full Text Available Web pages and Web sites, some argue, can either be collected as elements of digital or hybrid libraries, or, as others would have it, the WWW is itself a library. We begin with the assumption that Web pages and Web sites can be collected and categorized. The paper explores the proposition that the WWW constitutes a library. We conclude that the Web is not a digital library. However, its component parts can be aggregated and included as parts of digital library collections. These, in turn, can be incorporated into "hybrid libraries." These are libraries with both traditional and digital collections. Material on the Web can be organized and managed. Native documents can be collected in situ, disseminated, distributed, catalogued, indexed, controlled, in traditional library fashion. The Web therefore is not a library, but material for library collections is selected from the Web. That said, the Web and its component parts are dynamic. Web documents undergo two kinds of change. The first type, the type addressed in this paper, is "persistence" or the existence or disappearance of Web pages and sites, or, in a word, the lifecycle of Web documents. "Intermittence" is a variant of persistence, and is defined as the disappearance and subsequent reappearance of Web documents. At any given time, about five percent of Web pages are intermittent, which is to say they are gone but will return. Over time a Web collection erodes. Based on a 120-week longitudinal study of a sample of Web documents, it appears that the half-life of a Web page is somewhat less than two years and the half-life of a Web site is somewhat more than two years. That is to say, an unweeded Web document collection created two years ago would contain the same number of URLs, but only half of those URLs point to content. The second type of change Web documents experience is change in Web page or Web site content. Again based on the Web document samples, very nearly all Web pages and sites undergo some

  20. Evaluating Information Quality: Hidden Biases on the Children's Web Pages

    Science.gov (United States)

    Kurubacak, Gulsun

    2006-01-01

    As global digital communication continues to flourish, the Children's Web pages become more critical for children to grasp not only the surface meanings but also the broader and deeper meanings presented in these milieus. These pages not only are very diverse and complex but also enable intense communication across social, cultural and political…

  1. A personalized web page content filtering model based on segmentation

    CERN Document Server

    Kuppusamy, K S; 10.5121/ijist.2012.2104

    2012-01-01

    In view of the massive content explosion in the World Wide Web through diverse sources, it has become mandatory to have content filtering tools. The filtering of web page contents holds greater significance in cases of access by minor-age people. Traditional web page blocking systems go by the Boolean methodology of either displaying the full page or blocking it completely. With the increased dynamism in web pages, it has become a common phenomenon that different portions of a web page hold different types of content at different time instances. This paper proposes a model to block the contents at a fine-grained level, i.e., instead of completely blocking the page it would be efficient to block only those segments which hold the contents to be blocked. The advantages of this method over the traditional methods are the fine-grained level of blocking and the automatic identification of the portions of the page to be blocked. The experiments conducted on the proposed model indicate 88% accuracy in filter...

  2. Web pages of Slovenian public libraries

    Directory of Open Access Journals (Sweden)

    Silva Novljan

    2002-01-01

    Full Text Available Libraries should offer their patrons web sites which establish the unmistakable concept of a (public) library, a concept that cannot be mistaken for other information brokers and services available on the Internet, but which, inside this framework of the library concept, show a diversity that directs patrons to other (public) libraries. This can be achieved through reliability, quality of information and services, and safety of usage. When this is achieved, patrons regard library web sites as important reference sources deserving continuous usage for obtaining relevant information. Libraries justify investment in the development and maintenance of their web sites by the number of visits and by patron satisfaction. The presented research, made on a sample of Slovene public libraries' web sites, determines how the libraries establish their purpose and role, as well as how they follow professional recommendations in web site design. The results uncover the striving of libraries for the modernisation of their functions; major attention is directed to the presentation of classic libraries and their activities, and lesser attention to the expansion of available contents and electronic sources. Pointing to their diversity is significant since it is not a result of patrons' needs, but more the consequence of improvisation, too little attention to selection, availability, organisation and formation of different kinds of information and services on the web sites. Based on the analysis of a common concept of the public library web site, certain activities for improving the existing state of affairs are presented in the paper.

  3. A thorough spring-clean for CERN's Web pages

    CERN Multimedia

    2001-01-01

    This coming Tuesday will see the unveiling of CERN's new user pages on the Web. Their simplified layout and design will make everybody's lives a whole lot easier. Stand by for Tuesday 17 April when, as announced in the Weekly Bulletin of 2 April (n°14/2001), the newly-designed users' welcome page will be hitting our screens as the default CERN home page. But don't worry, if you've got the blues for the good old blue-green home page it's still in service and, to ensure a smooth transition, will be maintained in parallel until 25 May. But in all likelihood you'll be quickly won over by the new-look pages, which are so much simpler to use. Welcome to the new Web! The aim of this revamp, led by the WPE (Web Public Education) group, is to simplify and introduce a more logical hierarchy into the menus and welcome pages on CERN's Intranet. In a second stage, the 'General Public' pages will get a similar makeover. The fact is that the number of links on the user pages, and in particular the welcome page...

  4. What Should Be On A School Library Web Page?

    Science.gov (United States)

    Baumbach, Donna; Brewer, Sally; Renfroe, Matt

    2004-01-01

    As varied as the schools and the communities they serve, so too are the Web pages for the library media programs that serve them. This article provides guidelines for effective web design and the information that might be included, including reference resources, reference assistance, curriculum support, literacy advocacy, and dynamic material. An…

  5. A Quantitative Comparison of Semantic Web Page Segmentation Approaches

    NARCIS (Netherlands)

    Kreuzer, Robert; Hage, J.; Feelders, A.J.

    2015-01-01

    We compare three known semantic web page segmentation algorithms, each serving as an example of a particular approach to the problem, and one self-developed algorithm, WebTerrain, that combines two of the approaches. We compare the performance of the four algorithms for a large benchmark of modern w

  6. Web Page Design in Distance Education

    Science.gov (United States)

    Isman, Aytekin; Dabaj, Fahme; Gumus, Agah; Altinay, Fahriye; Altinay, Zehra

    2004-01-01

    Distance education is a contemporary process of education. It facilitates fast, easy delivery of information with its concrete hardware and software tools. The development of high technology, the Internet, and web design has made them effective delivery systems for students. Within the global perspective, even all the work…

  7. Beginning ASPNET Web Pages with WebMatrix

    CERN Document Server

    Brind, Mike

    2011-01-01

    Learn to build dynamic web sites with Microsoft WebMatrix. Microsoft WebMatrix is designed to make developing dynamic ASP.NET web sites much easier. This complete Wrox guide shows you what it is, how it works, and how to get the best from it right away. It covers all the basic foundations and also introduces HTML, CSS, and Ajax using jQuery, giving beginning programmers a firm foundation for building dynamic web sites. Examines how WebMatrix is expected to become the new recommended entry-level tool for developing web sites using ASP.NET; arms beginning programmers, students, and educators with al

  8. Web Pages Clustering: A New Approach

    CERN Document Server

    E, Jeevan H; N, Punith Kumar S; Hegde, Vinay

    2011-01-01

    The rapid growth of the web has resulted in a vast volume of information. Making information available to the user at a rapid speed is vital. The English language (or any language, for that matter) has a lot of ambiguity in the usage of words, so there is no guarantee that a keyword-based search engine will provide the required results. This paper introduces the use of a (standardised) dictionary to obtain the context in which a keyword is used and, in turn, cluster the results based on this context. These ideas can be merged with a metasearch engine to enhance search efficiency.

  9. Treelicious: a System for Semantically Navigating Tagged Web Pages

    CERN Document Server

    Mullins, Matt; 10.1109/WI-IAT.2010.289

    2011-01-01

    Collaborative tagging has emerged as a popular and effective method for organizing and describing pages on the Web. We present Treelicious, a system that allows hierarchical navigation of tagged web pages. Our system enriches the navigational capabilities of standard tagging systems, which typically exploit only popularity and co-occurrence data. We describe a prototype that leverages the Wikipedia category structure to allow a user to semantically navigate pages from the Delicious social bookmarking service. In our system a user can perform an ordinary keyword search and browse relevant pages but is also given the ability to broaden the search to more general topics and narrow it to more specific topics. We show that Treelicious indeed provides an intuitive framework that allows for improved and effective discovery of knowledge.

  10. A Novel Approach for Web Page Set Mining

    CERN Document Server

    Geeta, R B; Totad, Shasikumar G; D, Prasad Reddy P V G

    2011-01-01

    One of the most time-consuming steps in association rule mining is the computation of the frequency of occurrence of itemsets in the database. The hash table index approach converts a transaction database to a hash index tree by scanning the transaction database only once. Whenever a user requests any Uniform Resource Locator (URL), the request entry is stored in the log file of the server. This paper presents the hash index table structure, a general and dense structure which provides web page set extraction from the log file of the server. This hash table provides information about the original database. Web page set mining (WPs-Mine) provides a complete representation of the original database. This approach works well for both sparse and dense data distributions. Web page set mining supported by a hash table index shows performance that is always comparable with, and often better than, algorithms accessing data on flat files. Incremental update is feasible without reaccessing the original transactional databa...
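
    A minimal sketch of the underlying idea of counting how often sets of pages co-occur in sessions from a server log, keyed on a hashable page-set representation; the session data and support threshold are illustrative, and the paper's hash index tree structure is not reproduced:

```python
from collections import Counter
from itertools import combinations

# Illustrative sessions reconstructed from a server log.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/cart", "/checkout"],
]

pair_counts = Counter()
for pages in sessions:
    for pair in combinations(sorted(set(pages)), 2):
        pair_counts[pair] += 1          # sorted tuple acts as a hashable set key

# Frequent page pairs (support >= 2)
print([pair for pair, c in pair_counts.items() if c >= 2])
```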

  11. Relevant Pages in semantic Web Search Engines using Ontology

    Directory of Open Access Journals (Sweden)

    Jemimah Simon

    2012-03-01

    Full Text Available In general, search engines are the most popular means of searching for any kind of information on the Internet. Generally, keywords are given to the search engine and the Web database returns the documents containing the specified keywords. In many situations, irrelevant results are returned for the user query since different keywords are used in different forms in various documents. The development of the next-generation Web, the Semantic Web, will change this situation. This paper proposes a prototype of a relation-based search engine which ranks pages according to the user query and the annotated results. A page subgraph is computed for each annotated page in the result set by generating all possible combinations of the relations in the subgraph. A relevance score is computed for each annotated page using a probability measure. A relation-based ranking model is used which displays the pages in the final result set according to their relevance score. This ranking is provided by considering keyword-concept associations. Thus, the final result set contains pages in the order of their constrained relevance scores.

  12. Geographic Information Systems and Web Page Development

    Science.gov (United States)

    Reynolds, Justin

    2004-01-01

    The Facilities Engineering and Architectural Branch is responsible for the design and maintenance of buildings, laboratories, and civil structures. In order to improve efficiency and quality, the FEAB has dedicated itself to establishing a data infrastructure based on Geographic Information Systems, GIS. The value of GIS was explained in an article dating back to 1980 entitled "Need for a Multipurpose Cadastre," which stated, "There is a critical need for a better land-information system in the United States to improve land-conveyance procedures, furnish a basis for equitable taxation, and provide much-needed information for resource management and environmental planning." Scientists and engineers both point to GIS as the solution. What is GIS? According to most textbooks, Geographic Information Systems is a class of software that stores, manages, and analyzes mapable features on, above, or below the surface of the earth. GIS software is basically database management software applied to the management of spatial data and information. Simply put, Geographic Information Systems manage, analyze, chart, graph, and map spatial information. At the outset, I was given goals and expectations from my branch and from my mentor with regards to the further implementation of GIS. Those goals are as follows: (1) Continue the development of GIS for the underground structures. (2) Extract and export annotated data from AutoCAD drawing files and construct a database (to serve as a prototype for future work). (3) Examine existing underground record drawings to determine existing and non-existing underground tanks. Once this data was collected and analyzed, I set out on the task of creating a user-friendly database that could be accessed by all members of the branch. It was important that the database be built using programs that most employees already possess, ruling out most AutoCAD-based viewers. Therefore, I set out to create an Access database that translated onto the web using Internet

  13. Building interactive simulations in a Web page design program.

    Science.gov (United States)

    Kootsey, J Mailen; Siriphongs, Daniel; McAuley, Grant

    2004-01-01

    A new Web software architecture, NumberLinX (NLX), has been integrated into a commercial Web design program to produce a drag-and-drop environment for building interactive simulations. NLX is a library of reusable objects written in Java, including input, output, calculation, and control objects. The NLX objects were added to the palette of available objects in the Web design program to be selected and dropped on a page. Inserting an object in a Web page is accomplished by adding a template block of HTML code to the page file. HTML parameters in the block must be set to user-supplied values, so the HTML code is generated dynamically, based on user entries in a popup form. Implementing the object inspector for each object permits the user to edit object attributes in a form window. Except for model definition, the combination of the NLX architecture and the Web design program permits construction of interactive simulation pages without writing or inspecting code.

  14. Network and User-Perceived Performance of Web Page Retrievals

    Science.gov (United States)

    Kruse, Hans; Allman, Mark; Mallasch, Paul

    1998-01-01

    The development of the HTTP protocol has been driven by the need to improve the network performance of the protocol by allowing the efficient retrieval of multiple parts of a web page without the need for multiple simultaneous TCP connections between a client and a server. We suggest that the retrieval of multiple page elements sequentially over a single TCP connection may result in a degradation of the perceived performance experienced by the user. We attempt to quantify this perceived degradation through the use of a model which combines a web retrieval simulation and an analytical model of TCP operation. Starting with the current HTTP/1.1 specification, we first suggest a client-side heuristic to improve the perceived transfer performance. We show that the perceived speed of the page retrieval can be increased without sacrificing data transfer efficiency. We then propose a new client/server extension to the HTTP/1.1 protocol to allow for the interleaving of page element retrievals. We finally address the issue of the display of advertisements on web pages, and in particular suggest a number of mechanisms which can make efficient use of IP multicast to send advertisements to a number of clients within the same network.

  15. Relating Web pages to enable information-gathering tasks

    CERN Document Server

    Bagchi, Amitabha

    2008-01-01

    We argue that relationships between Web pages are functions of the user's intent. We identify a class of Web tasks - information-gathering - that can be facilitated by a search engine that provides links to pages which are related to the page the user is currently viewing. We define three kinds of intentional relationships that correspond to whether the user is a) seeking sources of information, b) reading pages which provide information, or c) surfing through pages as part of an extended information-gathering process. We show that these three relationships can be productively mined using a combination of textual and link information and provide three scoring mechanisms that correspond to them: SeekRel, FactRel and SurfRel. These scoring mechanisms incorporate both textual and link information. We build a set of capacitated subnetworks - each corresponding to a particular keyword - that mirror the interconnection structure of the World Wide Web. The scores are computed by computing flows on ...

  16. What Snippets Say About Pages in Federated Web Search

    NARCIS (Netherlands)

    Demeester, Thomas; Nguyen, Dong-Phuong; Trieschnigg, Dolf; Develder, Chris; Hiemstra, Djoerd; Hou, Yuexian; Nie, Jian-Yun; Sun, Le; Wang, Bo; Zhang, Peng

    2012-01-01

    What is the likelihood that a Web page is considered relevant to a query, given the relevance assessment of the corresponding snippet? Using a new federated IR test collection that contains search results from over a hundred search engines on the internet, we are able to investigate such research qu

  17. Automatic Caption Localization for Photographs on World Wide Web Pages.

    Science.gov (United States)

    Rowe, Neil C.; Frew, Brian

    1998-01-01

    Explores an indirect method of locating, for indexing, the likely explicit and implicit captions of photographs, using multimodal clues including the specific words used, syntax, the surrounding layout of the Web page, and the general appearance of the associated image. The MARIE-3 system thus avoids full image processing and full natural-language…

  18. RDFa Primer, Embedding Structured Data in Web Pages

    NARCIS (Netherlands)

    W3C, institution; Birbeck, M.; et al, not CWI

    2007-01-01

    Current Web pages, written in XHTML, contain inherent structured data: calendar events, contact information, photo captions, song titles, copyright licensing information, etc. When authors and publishers can express this data precisely, and when tools can read it robustly, a new world of user functi

  19. Evaluating the usability of web pages: a case study

    NARCIS (Netherlands)

    Lautenbach, M.A.E.; Schegget, I.E. ter; Schoute, A.E.; Witteman, C.L.M.

    2008-01-01

    An evaluation of the Utrecht University website was carried out with 240 students. New criteria were drawn from the literature and operationalized for the study. These criteria are surveyability and findability. Web pages can be said to satisfy a usability criterion if their efficiency and effective

  20. Business Systems Branch Abilities, Capabilities, and Services Web Page

    Science.gov (United States)

    Cortes-Pena, Aida Yoguely

    2009-01-01

    During the INSPIRE summer internship I acted as the Business Systems Branch Capability Owner for the Kennedy Web-based Initiative for Communicating Capabilities System (KWICC), with the responsibility of creating a portal that describes the services provided by this Branch. This project will help others achieve a clear view of the services that the Business Systems Branch provides to NASA and the Kennedy Space Center. After collecting the data through interviews with subject matter experts and the literature in Business World and other web sites, I identified discrepancies, made the necessary corrections to the sites and placed the information from the report into the KWICC web page.

  1. Standards opportunities around data-bearing Web pages.

    Science.gov (United States)

    Karger, David

    2013-03-28

    The evolving Web has seen ever-growing use of structured data, thanks to the way it enhances information authoring, querying, visualization and sharing. To date, however, most structured data authoring and management tools have been oriented towards programmers and Web developers. End users have been left behind, unable to leverage structured data for information management and communication as well as professionals. In this paper, I will argue that many of the benefits of structured data management can be provided to end users as well. I will describe an approach and tools that allow end users to define their own schemas (without knowing what a schema is), manage data and author (not program) interactive Web visualizations of that data using the Web tools with which they are already familiar, such as plain Web pages, blogs, wikis and WYSIWYG document editors. I will describe our experience deploying these tools and some lessons relevant to their future evolution.

  2. Text Categorization Based on K-Nearest Neighbor Approach for Web Site Classification.

    Science.gov (United States)

    Kwon, Oh-Woog; Lee, Jong-Hyeok

    2003-01-01

    Discusses text categorization and Web site classification and proposes a three-step classification system that includes the use of Web pages linked with the home page. Highlights include the k-nearest neighbor (k-NN) approach; improving performance with a feature selection method and a term weighting scheme using HTML tags; and similarity…
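
    A minimal sketch of k-NN text categorization over TF-IDF vectors with cosine similarity; the pages, labels and k are illustrative assumptions, and the paper's HTML-tag term weighting and feature selection are not reproduced:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier

train_pages = ["university courses admission faculty",
               "online shop cart checkout payment",
               "research lab publications faculty",
               "buy shoes discount free shipping"]
train_labels = ["academic", "commercial", "academic", "commercial"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(train_pages)

knn = KNeighborsClassifier(n_neighbors=3, metric="cosine")
knn.fit(X, train_labels)

test = vectorizer.transform(["department faculty and course catalog"])
print(knn.predict(test))
```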

  3. Problems of long-term preservation of web pages

    Directory of Open Access Journals (Sweden)

    Mitja Dečman

    2011-01-01

    Full Text Available The World Wide Web is a distributed collection of web sites available on the Internet anywhere in the world. Its content is constantly changing: old data are being replaced, which causes constant loss of a huge amount of information and consequently the loss of scientific, cultural and other heritage. Often, even legal certainty is unnoticeably called into question. How data on the web can be stored and preserved for the long term is a great challenge. Even though some good practices have been developed, the question of a final solution at the national level still remains. The paper presents the problems of long-term preservation of web pages from a technical and organizational point of view. It includes phases such as capturing and preserving web pages, and focuses on good solutions, world practices and strategies for finding solutions in this area developed by different countries. The paper suggests some conceptual steps that have to be defined in Slovenia which would serve as a framework for all document creators in the web environment, and therefore contributes to awareness in this field, mitigating the problems of all who deal with these issues today and in the future.

  4. Lifting Events in RDF from Interactions with Annotated Web Pages

    Science.gov (United States)

    Stühmer, Roland; Anicic, Darko; Sen, Sinan; Ma, Jun; Schmidt, Kay-Uwe; Stojanovic, Nenad

    In this paper we present a method and an implementation for creating and processing semantic events from interaction with Web pages which opens possibilities to build event-driven applications for the (Semantic) Web. Events, simple or complex, are models for things that happen e.g., when a user interacts with a Web page. Events are consumed in some meaningful way e.g., for monitoring reasons or to trigger actions such as responses. In order for receiving parties to understand events e.g., comprehend what has led to an event, we propose a general event schema using RDFS. In this schema we cover the composition of complex events and event-to-event relationships. These events can then be used to route semantic information about an occurrence to different recipients helping in making the Semantic Web active. Additionally, we present an architecture for detecting and composing events in Web clients. For the contents of events we show a way of how they are enriched with semantic information about the context in which they occurred. The paper is presented in conjunction with the use case of Semantic Advertising, which extends traditional clickstream analysis by introducing semantic short-term profiling, enabling discovery of the current interest of a Web user and therefore supporting advertisement providers in responding with more relevant advertisements.

  5. Web Page Change and Persistence-A Four-Year Longitudinal Study.

    Science.gov (United States)

    Koehler, Wallace

    2002-01-01

    Discussion of changes in the topography of the Web focuses on changes to an existing set of Web documents over a four-year period. Highlights include the life cycle of Web objects; changes to Web objects; measures of change; Web page demise; and Web page changes, including hypertext links, content change, and structural change. (LRW)

  6. Identify Web-page Content meaning using Knowledge based System for Dual Meaning Words

    OpenAIRE

    Sinha, Sukanta; Dattagupta, Rana; Mukhopadhyay, Debajyoti

    2012-01-01

    The meaning of Web-page content plays a big role when a search engine produces a search result. In most cases the Web-page meaning is stored in the title or meta-tag area, but those meanings do not always match the Web-page content. To overcome this situation we need to go through the Web-page content to identify the Web-page meaning. In cases where the Web-page content holds dual-meaning words, it is really difficult to identify the meaning of the Web-page. In this paper, we are introdu...

  7. Children's recognition of advertisements on television and on Web pages.

    Science.gov (United States)

    Blades, Mark; Oates, Caroline; Li, Shiying

    2013-03-01

    In this paper we consider the issue of advertising to children. Advertising to children raises a number of concerns, in particular the effects of food advertising on children's eating habits. We point out that virtually all the research into children's understanding of advertising has focused on traditional television advertisements, but much marketing aimed at children is now via the Internet and little is known about children's awareness of advertising on the Web. One important component of understanding advertisements is the ability to distinguish advertisements from other messages, and we suggest that young children's ability to recognise advertisements on a Web page is far behind their ability to recognise advertisements on television.

  8. Building Interactive Simulations in Web Pages without Programming.

    Science.gov (United States)

    Mailen Kootsey, J; McAuley, Grant; Bernal, Julie

    2005-01-01

    A software system is described for building interactive simulations and other numerical calculations in Web pages. The system is based on a new Java-based software architecture named NumberLinX (NLX) that isolates each function required to build the simulation so that a library of reusable objects could be assembled. The NLX objects are integrated into a commercial Web design program for coding-free page construction. The model description is entered through a wizard-like utility program that also functions as a model editor. The complete system permits very rapid construction of interactive simulations without coding. A wide range of applications are possible with the system beyond interactive calculations, including remote data collection and processing and collaboration over a network.

  9. Program for Culture and Conflict Studies, web page capture

    OpenAIRE

    Naval Postgraduate School (U.S.)

    2014-01-01

    Web page capture from the NPS website. The Program for Culture and Conflict Studies (CCS) is premised on the belief that the United States must understand the cultures and societies of the world to effectively interact with local people. It is dedicated to the study of anthropological, ethnographic, social, political, and economic data to inform U.S. policies at both the strategic and operational levels.

  10. Web entity extraction based on entity attribute classification

    Science.gov (United States)

    Li, Chuan-Xi; Chen, Peng; Wang, Ru-Jing; Su, Ya-Ru

    2011-12-01

    Large amounts of entity data are continuously published on web pages. Extracting these entities automatically for further application is very significant. Rule-based entity extraction methods yield promising results; however, they are labor-intensive and hard to scale. This paper proposes a web entity extraction method based on entity attribute classification, which can avoid manual annotation of samples. First, web pages are segmented into different blocks by the Vision-based Page Segmentation (VIPS) algorithm, and a binary LibSVM classifier is trained to retrieve the candidate blocks which contain the entity contents. Second, the candidate blocks are partitioned into candidate items, LibSVM classifiers are applied for attribute annotation of the items, and the annotation results are then aggregated into an entity. Results show that the proposed method performs well in extracting agricultural supply and demand entities from web pages.

  11. A Novel Approach for Web Page Set Mining

    Directory of Open Access Journals (Sweden)

    R.B.Geeta

    2011-11-01

    Full Text Available One of the most time-consuming steps in association rule mining is the computation of the frequency of occurrence of itemsets in the database. The hash table index approach converts a transaction database to a hash index tree by scanning the transaction database only once. Whenever a user requests any Uniform Resource Locator (URL), the request entry is stored in the log file of the server. This paper presents the hash index table structure, a general and dense structure which provides web page set extraction from the log file of the server. This hash table provides information about the original database. Web page set mining (WPs-Mine) provides a complete representation of the original database. This approach works well for both sparse and dense data distributions. Web page set mining supported by a hash table index shows performance that is always comparable with, and often better than, algorithms accessing data on flat files. Incremental update is feasible without reaccessing the original transactional database.

  12. Clustering of Deep WebPages: A Comparative Study

    Directory of Open Access Journals (Sweden)

    Muhunthaadithya C

    2015-10-01

    Full Text Available The internet has a massive amount of information. This information is stored in the form of zillions of webpages. The information that can be retrieved by search engines is huge, and this information constitutes the ‘surface web’. But the remaining information, which is not indexed by search engines – the ‘deep web’ – is much bigger in size than the ‘surface web’ and remains unexploited. Several machine learning techniques have been commonly employed to access deep web content. Under machine learning, topic models provide a simple way to analyze large volumes of unlabeled text. A ‘topic’ is a cluster of words that frequently occur together, and topic models can connect words with similar meanings and distinguish between words with multiple meanings. In this paper, we cluster deep web databases employing several methods and then perform a comparative study. In the first method, we apply Latent Semantic Analysis (LSA) over the dataset. In the second method, we use a generative probabilistic model called Latent Dirichlet Allocation (LDA) for modeling content representative of deep web databases. Both these techniques are implemented after preprocessing the set of web pages to extract page contents and form contents. Further, we apply a modified version of Latent Dirichlet Allocation (LDA) to the dataset. Experimental results show that the proposed method outperforms the existing clustering methods.
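
    A small sketch of the LDA step, assuming scikit-learn's LatentDirichletAllocation and pre-extracted page/form text (the snippets below are invented examples); each database is then assigned to the cluster of its dominant topic.

        # Hedged sketch: topic-based clustering of deep web form/page text with LDA.
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.decomposition import LatentDirichletAllocation

        pages = ["flight hotel booking travel dates", "used cars price mileage dealer",
                 "airline tickets departure arrival", "car engine model year sale"]

        X = CountVectorizer(stop_words="english").fit_transform(pages)
        lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

        doc_topics = lda.transform(X)
        clusters = doc_topics.argmax(axis=1)   # dominant topic = cluster label
        print(clusters)                        # e.g. [0 1 0 1]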

  13. Distributed Collections of Web Pages in the Wild

    CERN Document Server

    Bogen, Paul Logasa; Furuta, Richard

    2011-01-01

    As the Distributed Collection Manager's work on building tools to support users maintaining collections of changing web-based resources has progressed, questions about the characteristics of people's collections of web pages have arisen. Simultaneously, work in the areas of social bookmarking, social news, and subscription-based technologies has been taking the existence, usage, and utility of this data for granted, with neither investigation into what people are doing with their collections nor how they are trying to maintain them. In order to address these concerns, we performed an online user study of 125 individuals from a variety of online and offline communities, such as the reddit social news user community and the graduate student body in our department. From this study we were able to examine a user's needs for a system to manage their web-based distributed collections, how their current tools affect their ability to maintain their collections, and what the characteristics of their current practices ...

  14. Credibility judgments in web page design - a brief review.

    Science.gov (United States)

    Selejan, O; Muresanu, D F; Popa, L; Muresanu-Oloeriu, I; Iudean, D; Buzoianu, A; Suciu, S

    2016-01-01

    Today, more than ever, it is accepted that the analysis of interface appearance is a crucial point in the field of human-computer interaction. As nowadays virtually anyone can publish information on the web, the role of credibility has grown increasingly important in relation to web-based content. Areas like trust, credibility, and behavior, together with overall impression and user expectation, are now in the spotlight of research, whereas in the past more pragmatic areas such as usability and utility received most of the attention. Credibility has been discussed as a theoretical construct in the field of communication over the past decades, and research has revealed that people tend to evaluate the credibility of communication primarily by the communicator's expertise. Other factors involved in the content communication process are trustworthiness and dynamism, as well as various other criteria, but to a lower extent. In this brief review, factors like web page aesthetics, browsing experiences and user experience are considered.

  15. A Sorting Method of Meta-search Based on User Web Page Interactive Model

    Institute of Scientific and Technical Information of China (English)

    Zongli Jiang; Tengyu Zhang

    2012-01-01

    Most meta-search engines suffer from the problem that many of the web pages they return have little to do with users' expectations. We introduce a new user web page interaction model under the meta-search framework, which analyzes users' actions to infer their interests, stores them, and updates this information with user feedback. The model also analyzes user records stored on the web and attaches labels to web pages based on statistics of user interest. We calculate the similarity between a user and a web page using information from the model and add this similarity to the pages' scores. The experimental results reveal that this method can improve the relevance of the retrieved information.
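
    The re-ranking idea can be conveyed with a rough sketch (not the paper's model): a cosine similarity between a user-interest vector and a page's label vector is added, with a hypothetical mixing weight, to the page's original meta-search score.

        # Hedged sketch: adding user/page similarity to a meta-search score.
        from math import sqrt

        def cosine(u, v):
            keys = set(u) | set(v)
            dot = sum(u.get(k, 0) * v.get(k, 0) for k in keys)
            nu = sqrt(sum(x * x for x in u.values())) or 1.0
            nv = sqrt(sum(x * x for x in v.values())) or 1.0
            return dot / (nu * nv)

        user_interest = {"python": 0.8, "tutorial": 0.5}   # learned from clicks/feedback
        pages = [{"url": "p1", "score": 0.6, "labels": {"python": 1, "tutorial": 1}},
                 {"url": "p2", "score": 0.7, "labels": {"holiday": 1}}]

        alpha = 0.5                                        # hypothetical mixing weight
        for p in pages:
            p["final"] = p["score"] + alpha * cosine(user_interest, p["labels"])

        print(sorted(pages, key=lambda p: p["final"], reverse=True)[0]["url"])  # -> p1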

  16. Going, going, still there: using the WebCite service to permanently archive cited web pages.

    Science.gov (United States)

    Eysenbach, Gunther; Trudel, Mathieu

    2005-12-30

    Scholars are increasingly citing electronic "web references" which are not preserved in libraries or full text archives. WebCite is a new standard for citing web references. To "webcite" a document involves archiving the cited Web page through www.webcitation.org and citing the WebCite permalink instead of (or in addition to) the unstable live Web page. This journal has amended its "instructions for authors" accordingly, asking authors to archive cited Web pages before submitting a manuscript. Almost 200 other journals are already using the system. We discuss the rationale for WebCite, its technology, and how scholars, editors, and publishers can benefit from the service. Citing scholars initiate an archiving process of all cited Web references, ideally before they submit a manuscript. Authors of online documents and websites which are expected to be cited by others can ensure that their work is permanently available by creating an archived copy using WebCite and providing the citation information including the WebCite link on their Web document(s). Editors should ask their authors to cache all cited Web addresses (Uniform Resource Locators, or URLs) "prospectively" before submitting their manuscripts to their journal. Editors and publishers should also instruct their copyeditors to cache cited Web material if the author has not done so already. In addition, WebCite can process publisher-submitted "citing articles" (submitted for example as eXtensible Markup Language [XML] documents) to automatically archive all cited Web pages shortly before or on publication. Finally, WebCite can act as a focussed crawler, retrospectively caching references of already published articles. Copyright issues are addressed by honouring respective Internet standards (robot exclusion files, no-cache and no-archive tags). Long-term preservation is ensured by agreements with libraries and digital preservation organizations. The resulting WebCite Index may also have applications for research

  17. HTML Tags as Extraction Cues for Web Page Description Construction

    Directory of Open Access Journals (Sweden)

    Timothy C. Craven

    2003-01-01

    Full Text Available Using four previously identified samples of Web pages containing meta-tagged descriptions, the value of meta-tagged keywords, the first 200 characters of the body, and text marked with common HTML tags as extracts helpful for writing summaries was estimated by applying two measures: density of description words and density of two-word description phrases. Generally, titles and keywords showed the highest densities. Parts of the body showed densities not much different from the body as a whole: somewhat higher for the first 200 characters and for text tagged with "center" and "font"; somewhat lower for text tagged with "a"; not significantly different for "table" and "div". Evidence of non-random clumping of description words in the body of some pages nevertheless suggests that further pursuit of automatic passage extraction methods from the body may be worthwhile. Implications of the findings for aids to summarization, and specifically the TexNet32 package, are discussed.
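
    One plausible reading of the density measure can be reproduced as follows: this sketch assumes "density" is the fraction of words in an extract that also occur in the page's meta description (the exact formulation in the study may differ, and the description and extracts below are invented examples).

        # Hedged sketch: density of description words in different page extracts.
        def density(extract_text, description):
            desc_words = set(description.lower().split())
            words = extract_text.lower().split()
            if not words:
                return 0.0
            return sum(w in desc_words for w in words) / len(words)

        description = "free online course on radio astronomy basics"
        extracts = {"title": "Radio Astronomy Basics",
                    "first200": "Welcome to our free online course covering the basics...",
                    "anchor": "home contact sitemap"}

        for tag, text in extracts.items():
            print(tag, round(density(text, description), 2))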

  18. Overhaul of CERN's top-level web pages

    CERN Multimedia

    2004-01-01

    The pages for CERN users and for the general public have been given a face-lift before they become operational on the central web servers later this month. You may already now inspect the new versions in their "waiting places" at: http://intranet.cern.ch/User/ and http://intranet.cern.ch/Public/ We hope you will like these improved versions and you can report errors and omissions in the usual way ("comments and change requests" link at the bottom of the pages). The new versions will replace the existing ones at the end of the month, so you do not need to change your bookmarks or start-up URL. ETT/EC/EX

  19. The Emerging Infections Network electronic mail conference and web page.

    Science.gov (United States)

    Strausbaugh, L J; Liedtke, L A

    2001-01-15

    In February 1997, the Emerging Infections Network (EIN) established an electronic mail conference to facilitate discussions about emerging infectious diseases and related topics among its members and public health officials. Later that year, the EIN opened its section of the Infectious Diseases Society of America's home page. The EIN Web page was developed to give its members an alternative route for responding to EIN surveys and to facilitate rapid dispersal of EIN reports. The unrestricted portion of the site allows visitors access to information about the EIN and to published EIN reports on specific topics. For the most part, these are brief summaries or abstracts. In the restricted, password-protected portion of the EIN site, members can access the detailed, original reports from EIN queries and the comprehensive listings of member observations. Search functions in both portions of the EIN site enhance the retrieval of reports and observations on specific topics.

  20. WEB LOG EXPLORER – CONTROL OF MULTIDIMENSIONAL DYNAMICS OF WEB PAGES

    Directory of Open Access Journals (Sweden)

    Mislav Šimunić

    2012-07-01

    Full Text Available Demand markets dictate and pose increasingly more requirements to the supply market that are not easily satisfied. The supply market presenting its web pages to the demand market should find the best and quickest ways to respond promptly to the changes dictated by the demand market. The question is how to do that in the most efficient and quickest way. The data on the usage of web pages on a specific web site are recorded in a log file. The data in a log file are stochastic and unordered and require systematic monitoring, categorization, analyses, and weighing. From the data processed in this way, it is necessary to single out and sort the data by their importance that would be a basis for a continuous generation of dynamics/changes to the web site pages in line with the criterion chosen. To perform those tasks successfully, a new software solution is required. For that purpose, the authors have developed the first version of the WLE (WebLogExplorer) software solution, which is actually a realization of web page multidimensionality and the web site as a whole. The WebLogExplorer enables statistical and semantic analysis of a log file and, on the basis thereof, multidimensional control of the web page dynamics. The experimental part of the work was done within the web site of HTZ (Croatian National Tourist Board), being the main portal of the global tourist supply in the Republic of Croatia (on average, the daily "log" consists of c. 600,000 sets, the average size of the log file is 127 Mb, and there are c. 7000-8000 daily visitors on the web site).
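
    In spirit, the statistical part of such a tool starts with parsing the raw log and aggregating per-page request counts, as in the sketch below (a toy example assuming the common/NCSA access-log format; the real WLE performs far richer semantic analysis and weighting on top of this).

        # Hedged sketch: per-page request counts from a common-format access log.
        import re
        from collections import Counter

        log_lines = [
            '10.0.0.1 - - [01/Jul/2012:10:00:00 +0200] "GET /index.html HTTP/1.1" 200 1043',
            '10.0.0.2 - - [01/Jul/2012:10:00:05 +0200] "GET /offers.html HTTP/1.1" 200 2312',
            '10.0.0.1 - - [01/Jul/2012:10:00:09 +0200] "GET /index.html HTTP/1.1" 200 1043',
        ]

        pattern = re.compile(r'"(?:GET|POST) (\S+) HTTP')
        hits = Counter(m.group(1) for line in log_lines if (m := pattern.search(line)))
        print(hits.most_common())   # pages ranked by request frequency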

  1. THE NEW PURCHASING SERVICE PAGE NOW ON THE WEB!

    CERN Multimedia

    SPL Division

    2000-01-01

    Users of CERN's Purchasing Service are encouraged to visit the new Purchasing Service web page, accessible from the CERN homepage or directly at: http://spl-purchasing.web.cern.ch/spl-purchasing/ There, you will find answers to questions such as: Who are the buyers? What do I need to know before creating a DAI? How many offers do I need? Where shall I send the offer I received? I know the amount of my future requirement, how do I proceed? How are contracts adjudicated at CERN? Which exhibitions and visits of Member State companies are foreseen in the future? A company I know is interested in making a presentation at CERN, who should they contact? Additionally, you will find information concerning: The Purchasing procedures Market Surveys and Invitations to Tender The Industrial Liaison Officers appointed in each Member State The Purchasing Broker at CERN

  2. Collecting responses through Web page drag and drop.

    Science.gov (United States)

    Britt, M Anne; Gabrys, Gareth

    2004-02-01

    This article describes how to collect responses from experimental participants using drag and drop on a Web page. In particular, we describe how drag and drop can be used in a text search task in which participants read a text and then locate and categorize certain elements of the text (e.g., to identify the main claim of a persuasive paragraph). Using this technique, participants respond by clicking on a text segment and dragging it to a screen field or icon. We have successfully used this technique in both the argument element identification experiment that we describe here and a tutoring system that we created to teach students to identify source characteristics while reading historical texts (Britt, Perfetti, Van Dyke, & Gabrys, 2000). The implementation described here exploits the capability of recent versions of Microsoft's Internet Explorer Web browser to handle embedded XML documents and drag and drop events.

  3. Learning Hierarchical User Interest Models from Web Pages

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    We propose an algorithm for learning hierarchical user interest models according to the Web pages users have browsed. In this algorithm, the interests of a user are represented as a tree called a user interest tree, the content and the structure of which can change simultaneously to adapt to changes in the user's interests. This representation captures a user's specific and general interests as a continuum. In some sense, specific interests correspond to short-term interests, while general interests correspond to long-term interests, so this representation more faithfully reflects users' interests. The algorithm can automatically model a user's multiple interest domains, dynamically generate the interest models, and prune a user interest tree when the number of nodes in it exceeds a given value. Finally, we show experimental results from a Chinese Web site.
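
    A toy sketch of such a data structure (the class and field names are illustrative, not the paper's): each node carries a topic and a weight, general long-term interests sit near the root, specific short-term interests at the leaves, and low-weight branches can be pruned.

        # Hedged sketch: a hierarchical user interest tree with pruning.
        class InterestNode:
            def __init__(self, topic, weight=0.0):
                self.topic, self.weight, self.children = topic, weight, []

            def add(self, topic, weight):
                child = InterestNode(topic, weight)
                self.children.append(child)
                return child

            def prune(self, threshold):
                self.children = [c for c in self.children if c.weight >= threshold]
                for c in self.children:
                    c.prune(threshold)

        root = InterestNode("interests")
        sports = root.add("sports", 0.9)         # general / long-term interest
        sports.add("table tennis", 0.7)          # specific / short-term interest
        root.add("cooking", 0.1)
        root.prune(0.3)                          # drops the weak 'cooking' branch
        print([c.topic for c in root.children])  # -> ['sports']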

  4. The Technology of Extracting Content Information from Web Page Based on DOM Tree

    Science.gov (United States)

    Yuan, Dingrong; Mo, Zhuoying; Xie, Bing; Xie, Yangcai

    There are huge amounts of information on Web pages, which include content information and other, useless information, such as navigation, advertisements and flash animations. To reduce the toil of Web users, we established a technique to extract the content information from a web page. First, we analyzed the semantics of web documents with Google's V8 engine and parsed each web document into a DOM tree. We then traversed the DOM tree and pruned it in light of the characteristics of the Web page's markup language. Finally, we extracted the content information from the Web page. Analysis and experiments showed that the technique can simplify the web page, present the content information to web users, and supply clean data for application areas such as retrieval, knowledge discovery and data mining from the web.
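
    A rough equivalent of the prune-and-extract step is sketched below, assuming BeautifulSoup instead of the V8-based parsing used by the authors, and simple illustrative heuristics (noise tags and link-dense, text-poor blocks are dropped).

        # Hedged sketch: prune non-content nodes from the DOM and keep the text.
        from bs4 import BeautifulSoup

        html = """<html><body>
          <div id="nav"><a href="/">Home</a><a href="/ads">Ads</a></div>
          <div id="main"><h1>Title</h1><p>The actual article text lives here.</p></div>
          <script>trackVisitor();</script>
        </body></html>"""

        soup = BeautifulSoup(html, "html.parser")
        for tag in soup(["script", "style", "iframe"]):   # obvious noise tags
            tag.decompose()
        for div in soup.find_all("div"):
            if len(div.find_all("a")) > 1 and len(div.get_text(strip=True)) < 40:
                div.decompose()          # link-dense, text-poor block = likely navigation

        print(soup.get_text(" ", strip=True))   # "Title The actual article text lives here."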

  5. Young children's ability to recognize advertisements in web page designs.

    Science.gov (United States)

    Ali, Moondore; Blades, Mark; Oates, Caroline; Blumberg, Fran

    2009-03-01

    Identifying what is, and what is not an advertisement is the first step in realizing that an advertisement is a marketing message. Children can distinguish television advertisements from programmes by about 5 years of age. Although previous researchers have investigated television advertising, little attention has been given to advertisements in other media, even though other media, especially the Internet, have become important channels of marketing to children. We showed children printed copies of invented web pages that included advertisements, half of which had price information, and asked the children to point to whatever they thought was an advertisement. In two experiments we tested a total of 401 children, aged 6, 8, 10 and 12 years of age, from the United Kingdom and Indonesia. Six-year-olds recognized a quarter of the advertisements, 8-year-olds recognized half the advertisements, and the 10- and 12-year-olds recognized about three-quarters. Only the 10- and 12-year-olds were more likely to identify an advertisement when it included a price. We contrast our findings with previous results about the identification of television advertising, and discuss why children were poorer at recognizing web page advertisements. The performance of the children has implications for theories about how children develop an understanding of advertising.

  6. Appraisals of Salient Visual Elements in Web Page Design

    Directory of Open Access Journals (Sweden)

    Johanna M. Silvennoinen

    2016-01-01

    Full Text Available Visual elements in user interfaces elicit emotions in users and are, therefore, essential to users interacting with different software. Although there is research on the relationship between emotional experience and visual user interface design, the focus has been on the overall visual impression and not on visual elements. Additionally, often in a software development process, programming and general usability guidelines are considered as the most important parts of the process. Therefore, knowledge of programmers’ appraisals of visual elements can be utilized to understand the web page designs we interact with. In this study, appraisal theory of emotion is utilized to elaborate the relationship of emotional experience and visual elements from programmers’ perspective. Participants (N=50) used 3E-templates to express their visual and emotional experiences of web page designs. Content analysis of textual data illustrates how emotional experiences are elicited by salient visual elements. Eight hierarchical visual element categories were found and connected to various emotions, such as frustration, boredom, and calmness, via relational emotion themes. The emotional emphasis was on centered, symmetrical, and balanced composition, which was experienced as pleasant and calming. The results benefit user-centered visual interface design and researchers of visual aesthetics in human-computer interaction.

  7. Cluster Analysis of Customer Reviews Extracted from Web Pages

    Directory of Open Access Journals (Sweden)

    S. Shivashankar

    2010-01-01

    Full Text Available As e-commerce is gaining popularity day by day, the web has become an excellent source for gathering customer reviews / opinions by market researchers. The number of customer reviews that a product receives is growing at a very fast rate (it could be in the hundreds or thousands). Customer reviews posted on websites vary greatly in quality. The potential customer has to read all the reviews, irrespective of their quality, to make a decision on whether to purchase the product or not. In this paper, we make an attempt to assess a review based on its quality, to help the customer make a proper buying decision. The quality of a customer review is assessed as most significant, more significant, significant or insignificant. A novel and effective web mining technique is proposed for assessing a customer review of a particular product based on feature clustering techniques, namely, the k-means method and the fuzzy c-means method. This is performed in three steps: (1) identify review regions and extract reviews from them, (2) extract and cluster the features of reviews by a clustering technique and then assign weights to the features belonging to each of the clusters (groups), and (3) assess the review by considering the feature weights and group belongingness. The k-means and fuzzy c-means clustering techniques are implemented and tested on customer reviews extracted from web pages. The performance of these techniques is analyzed.
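
    Step (2) might look roughly like the sketch below, assuming scikit-learn's KMeans over a tf-idf representation of extracted feature mentions (the mentions are invented examples; fuzzy c-means would need a separate library such as scikit-fuzzy).

        # Hedged sketch: clustering product-feature mentions from reviews with k-means.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.cluster import KMeans

        feature_mentions = ["battery life", "battery drains fast", "screen brightness",
                            "display is sharp", "charging time", "screen resolution"]

        X = TfidfVectorizer().fit_transform(feature_mentions)
        km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

        for mention, cluster in zip(feature_mentions, km.labels_):
            print(cluster, mention)   # feature groups; weights could then be assigned per cluster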

  8. Exploring Cultural Variation in Eye Movements on a Web Page between Americans and Koreans

    Science.gov (United States)

    Yang, Changwoo

    2009-01-01

    This study explored differences in eye movement on a Web page between members of two different cultures to provide insight and guidelines for implementation of global Web site development. More specifically, the research examines whether differences of eye movement exist between the two cultures (American vs. Korean) when viewing a Web page, and…

  9. Socorro Students Translate NRAO Web Pages Into Spanish

    Science.gov (United States)

    2002-07-01

    Six Socorro High School students are spending their summer working at the National Radio Astronomy Observatory (NRAO) on a unique project that gives them experience in language translation, World Wide Web design, and technical communication. Under the project, called "Un puente a los cielos," the students are translating many of NRAO's Web pages on astronomy into Spanish. "These students are using their bilingual skills to help us make basic information about astronomy and radio telescopes available to the Spanish-speaking community," said Kristy Dyer, who works at NRAO as a National Science Foundation postdoctoral fellow and who developed the project and obtained funding for it from the National Aeronautics and Space Administration. The students are: Daniel Acosta, 16; Rossellys Amarante, 15; Sandra Cano, 16; Joel Gonzalez, 16; Angelica Hernandez, 16; and Cecilia Lopez, 16. The translation project, a joint effort of NRAO and the NM Tech physics department, also includes Zammaya Moreno, a teacher from Ecuador, Robyn Harrison, NRAO's education officer, and NRAO computer specialist Allan Poindexter. The students are translating NRAO Web pages aimed at the general public. These pages cover the basics of radio astronomy and frequently-asked questions about NRAO and the scientific research done with NRAO's telescopes. "Writing about science for non-technical audiences has to be done carefully. Scientific concepts must be presented in terms that are understandable to non-scientists but also that remain scientifically accurate," Dyer said. "When translating this type of writing from one language to another, we need to preserve both the understandability and the accuracy," she added. For that reason, Dyer recruited 14 Spanish-speaking astronomers from Argentina, Mexico and the U.S. to help verify the scientific accuracy of the Spanish translations. The astronomers will review the translations. The project is giving the students a broad range of experience. "They are

  10. Semantic Web Techniques for Yellow Page Service Providers

    Directory of Open Access Journals (Sweden)

    Raghu Anantharangachar

    2012-08-01

    Full Text Available Applications providing “yellow pages information” for use over the web should ideally be based on structured information. Use of web pages providing unstructured information poses a variety of problems to the user, such as use of arbitrary formats, unsuitability for machine processing and likely incompleteness of information. Structured data alleviates these problems but we require more. Capturing the semantics of a domain in the form of an ontology is necessary to ensure that unforeseen applications can easily be created at a later date. Very often yellow page systems are implemented using a centralized database. In some cases, human intermediaries accessible over the phone network examine a centralized database and use their reasoning ability to deal with the user’s need for information. Centralized operation and considerable central administration make these systems expensive to operate. Scaling up such systems is difficult. They behave like isolated systems and it is common for such systems to be highly domain specific, for instance systems dealing with accommodation and travel. This paper explores an alternative – a highly distributed system design meeting a variety of needs – considerably reducing the effort required at a central organization, enabling large numbers of vendors to enter information about their own products and services, enabling end-users to contribute information such as their own ratings, using an ontology to describe each domain of application in a flexible manner for uses foreseen and unforeseen, enabling distributed search and mashups, use of vendor-independent standards, using reasoning to find the best matches to a given query, geospatial reasoning and a simple, interactive, mobile application/interface. We view this design as one in which vendors and end-users do the bulk of the work in building large distributed collections of information in a Web 2.0 style. We give importance to geo-spatial information and

  11. CaSePer: An efficient model for personalized web page change detection based on segmentation

    OpenAIRE

    Kuppusamy, K. S.; Aghila, G.

    2014-01-01

    Users who visit a web page repeatedly at frequent intervals are more interested in knowing the recent changes that have occurred on the page than the entire contents of the web page. Because of the increased dynamism of web pages, it would be difficult for the user to identify the changes manually. This paper proposes an enhanced model for detecting changes in the pages, which is called CaSePer (Change detection based on Segmentation with Personalization). The change detection is micro-manage...
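
    The segment-level idea can be approximated with per-block hashes, as in this hedged sketch: only blocks whose hash differs between two visits need to be inspected and shown as changed (segmentation is crudely simulated here by splitting on a tag; the real CaSePer model segments the page properly and adds personalization on top).

        # Hedged sketch: detect which page segments changed between two visits.
        import hashlib

        def block_hashes(html):
            blocks = [b.strip() for b in html.split("<div>") if b.strip()]  # crude segmentation
            return {i: hashlib.md5(b.encode()).hexdigest() for i, b in enumerate(blocks)}

        old = "<div>News of Monday</div><div>Weather: sunny</div>"
        new = "<div>News of Tuesday</div><div>Weather: sunny</div>"

        old_h, new_h = block_hashes(old), block_hashes(new)
        changed = [i for i in new_h if old_h.get(i) != new_h[i]]
        print(changed)   # -> [0]: only the first segment needs to be presented as changed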

  12. Web Page Segmentation for Small Screen Devices Using Tag Path Clustering Approach

    Directory of Open Access Journals (Sweden)

    Ms. S.Aruljothi

    2013-07-01

    Full Text Available The web pages in existence these days are developed to be displayed on desktop PCs, so viewing them on mobile web browsers is extremely tough. Since mobile devices have restricted resources, small-screen device users need to scroll down and across complicated sites persistently. To address the problem of resource limitation of small-screen devices, a unique methodology of web page segmentation with tag path clustering is proposed, which reduces the memory space demands of small hand-held devices. For segmenting web pages, both a recurring key-pattern detection technique and page layout information are used, to provide better segmentation accuracy.

  13. Educational use of World Wide Web pages on CD-ROM.

    Science.gov (United States)

    Engel, Thomas P; Smith, Michael

    2002-01-01

    The World Wide Web is increasingly important for medical education. Internet served pages may also be used on a local hard disk or CD-ROM without a network or server. This allows authors to reuse existing content and provide access to users without a network connection. CD-ROM offers several advantages over network delivery of Web pages for several applications. However, creating Web pages for CD-ROM requires careful planning. Issues include file names, relative links, directory names, default pages, server created content, image maps, other file types and embedded programming. With care, it is possible to create server based pages that can be copied directly to CD-ROM. In addition, Web pages on CD-ROM may reference Internet served pages to provide the best features of both methods.

  14. The impact of visual layout factors on performance in Web pages: a cross-language study.

    Science.gov (United States)

    Parush, Avi; Shwarts, Yonit; Shtub, Avy; Chandra, M Jeya

    2005-01-01

    Visual layout has a strong impact on performance and is a critical factor in the design of graphical user interfaces (GUIs) and Web pages. Many design guidelines employed in Web page design were inherited from human performance literature and GUI design studies and practices. However, few studies have investigated the more specific patterns of performance with Web pages that may reflect some differences between Web page and GUI design. We investigated interactions among four visual layout factors in Web page design (quantity of links, alignment, grouping indications, and density) in two experiments: one with pages in Hebrew, entailing right-to-left reading, and the other with English pages, entailing left-to-right reading. Some performance patterns (measured by search times and eye movements) were similar between languages. Performance was particularly poor in pages with many links and variable densities, but it improved with the presence of uniform density. Alignment was not shown to be a performance-enhancing factor. The findings are discussed in terms of the similarities and differences in the impact of layout factors between GUIs and Web pages. Actual or potential applications of this research include specific guidelines for Web page design.

  15. Extraction of Informative Blocks from Deep Web Page Using Similar Layout Feature

    OpenAIRE

    Zeng,Jun; Flanagan, Brendan; Hirokawa, Sachio

    2013-01-01

    Due to the explosive growth and popularity of the deep web, information extraction from deep web pages has gained more and more attention. However, the HTML structure of web pages has become more complicated, making it difficult to recognize target content by analyzing the HTML source code alone. In this paper, we propose a method to extract the informative blocks from a deep web page using layout features. We consider the visual rectangular region of an HTML element as a visual block in a web page....

  16. Measuring consistency of web page design and its effects on performance and satisfaction.

    Science.gov (United States)

    Ozok, A A; Salvendy, G

    2000-04-01

    This study examines the methods for measuring the consistency levels of web pages and the effect of consistency on the performance and satisfaction of the world-wide web (WWW) user. For clarification, a home page is referred to as a single page that is the default page of a web site on the WWW. A web page refers to a single screen that indicates a specific address on the WWW. This study has tested a series of web pages that were mostly hyperlinked. Therefore, the term 'web page' has been adopted for the nomenclature while referring to the objects of which the features were tested. It was hypothesized that participants would perform better and be more satisfied using web pages that have consistent rather than inconsistent interface design; that the overall consistency level of an interface design would significantly correlate with the three elements of consistency, physical, communicational and conceptual consistency; and that physical and communicational consistencies would interact with each other. The hypotheses were tested in a four-group, between-subject design, with 10 participants in each group. The results partially support the hypothesis regarding error rate, but not regarding satisfaction and performance time. The results also support the hypothesis that each of the three elements of consistency significantly contribute to the overall consistency of a web page, and that physical and communicational consistencies interact with each other, while conceptual consistency does not interact with them.

  17. Analyzing Web pages visual scanpaths: between and within tasks variability.

    Science.gov (United States)

    Drusch, Gautier; Bastien, J M Christian

    2012-01-01

    In this paper, we propose a new method for comparing scanpaths in a bottom-up approach, and a test of the scanpath theory. To do so, we conducted a laboratory experiment in which 113 participants were invited to accomplish a set of tasks on two different websites. For each site, they had to perform two tasks, each of which had to be repeated once. The data were analyzed using a procedure similar to the one used by Duchowski et al. [8]. The first step was to automatically identify, then label, AOIs with the mean-shift clustering procedure [19]. Then, scanpaths were compared two by two with a modified version of the string-edit method, which takes into account the order in which AOIs are viewed [2]. Our results show that scanpath variability between tasks but within participants seems to be lower than the variability within a task for a given participant. In other words, participants seem to be more consistent when they perform different tasks than when they repeat the same task. In addition, participants view more of the same AOIs when they perform a different task on the same Web page than when they repeat the same task. These results are quite different from what the scanpath theory predicts.
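
    The core comparison is a string-edit (Levenshtein) distance over sequences of AOI labels, which can be sketched as follows (a plain dynamic-programming implementation; the authors' modified version additionally weights the order of AOI visits, and the scanpaths below are invented examples).

        # Hedged sketch: edit distance between two scanpaths encoded as AOI label strings.
        def edit_distance(a, b):
            dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
            for i in range(len(a) + 1):
                dp[i][0] = i
            for j in range(len(b) + 1):
                dp[0][j] = j
            for i in range(1, len(a) + 1):
                for j in range(1, len(b) + 1):
                    cost = 0 if a[i - 1] == b[j - 1] else 1
                    dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost)
            return dp[-1][-1]

        scanpath_task1 = "ABBCAD"   # each letter = one fixated AOI, in viewing order
        scanpath_task2 = "ABCAAD"
        print(edit_distance(scanpath_task1, scanpath_task2))   # smaller = more similar scanpaths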

  18. Extraction of Web Content to Adapt Web Pages for Mobile Devices

    Directory of Open Access Journals (Sweden)

    Neha Gupta

    2011-03-01

    Full Text Available Nowadays mobile phones are replacing conventional PCs as users browse and search the Internet via their mobile handsets. Web-based services and information can be accessed from any location with the help of mobile devices such as mobile phones and Personal Digital Assistants (PDAs) with relative ease. To access educational data on mobile devices, web page adaptation is needed, keeping in mind the security and quality of the data. Various researchers are working on adaptation techniques. Educational web miner aims to develop an interface for kids to use mobile devices in a secure way. This paper presents a framework for adapting web pages as part of the educational web miner so that educational data can be accessed accurately, securely and concisely. The present paper is part of a project whose aim is to develop an interface for kids, so that they can access current knowledge bases from mobile devices in a secure way and get accurate and concise information with ease. Related studies on adaptation techniques are also presented in this paper.

  19. Unlocking the Gates to the Kingdom: Designing Web Pages for Accessibility.

    Science.gov (United States)

    Mills, Steven C.

    As the use of the Web is perceived to be an effective tool for dissemination of research findings for the provision of asynchronous instruction, the issue of accessibility of Web page information will become more and more relevant. The World Wide Web consortium (W3C) has recognized a disparity in accessibility to the Web between persons with and…

  20. Study on Web page content classification technology based on semi-supervised learning

    Institute of Scientific and Technical Information of China (English)

    赵夫群

    2016-01-01

    To address the key issue of how to use labeled and unlabeled data for Web page classification, a classifier combining a generative model with a discriminative model is explored. Maximum likelihood estimation is applied to the unlabeled training set to construct a semi-supervised classifier with high classification performance. A Dirichlet-multinomial mixture distribution is used to model the text, and a hybrid model suitable for semi-supervised learning is proposed. Since the EM algorithm for semi-supervised learning converges quickly and easily falls into local optima, two intelligent optimization methods, the simulated annealing algorithm and the genetic algorithm, are introduced and analyzed. Combining these two algorithms yields a new intelligent semi-supervised classification algorithm, whose feasibility is verified.
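
    The basic labeled-plus-unlabeled EM loop can be sketched as below, using a multinomial Naive Bayes classifier as the generative component; this is the classical semi-supervised EM recipe with invented toy documents, not the paper's exact Dirichlet-multinomial mixture with simulated annealing or genetic search.

        # Hedged sketch: semi-supervised EM with Naive Bayes over labeled + unlabeled text.
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.naive_bayes import MultinomialNB
        import numpy as np

        labeled = ["football match score", "stock market shares"]
        y = np.array([0, 1])                      # 0 = sports, 1 = finance
        unlabeled = ["team wins final", "bank raises interest rates", "league cup goal"]

        vec = CountVectorizer().fit(labeled + unlabeled)
        Xl, Xu = vec.transform(labeled), vec.transform(unlabeled)

        clf = MultinomialNB().fit(Xl, y)          # initialize on labeled data only
        for _ in range(5):                        # EM iterations
            probs = clf.predict_proba(Xu)         # E-step: soft labels for unlabeled pages
            yu = probs.argmax(axis=1)             # hard assignment, for simplicity
            X_all = np.vstack([Xl.toarray(), Xu.toarray()])
            clf = MultinomialNB().fit(X_all, np.concatenate([y, yu]))  # M-step: refit

        print(clf.predict(vec.transform(["goal scored in match"])))    # expected: [0] (sports)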

  1. Review on Creation Technology for Web Page Blocks

    Institute of Scientific and Technical Information of China (English)

    吕天; 于长富

    2012-01-01

    Many different segmentation algorithms can be used to divide a Web page into blocks. The purpose of studying page segmentation is to support further research in related fields, for example block-based search that exploits the importance of page-block contents, locating the important topics or content of a page, extracting the main content or topic of a page, and Web archiving based on Web page blocks. This paper first gives a definition and classification of the Web page segmentation problem, and then analyzes the principles of several typical segmentation algorithms, so as to provide a useful reference for further study of the Web page segmentation problem.

  2. Environment: General; Grammar & Usage; Money Management; Music History; Web Page Creation & Design.

    Science.gov (United States)

    Web Feet, 2001

    2001-01-01

    Describes Web site resources for elementary and secondary education in the topics of: environment, grammar, money management, music history, and Web page creation and design. Each entry includes an illustration of a sample page on the site and an indication of the grade levels for which it is appropriate. (AEF)

  3. Social Responsibility and Corporate Web Pages: Self-Presentation or Agenda-Setting?

    Science.gov (United States)

    Esrock, Stuart L.; Leichty, Greg B.

    1998-01-01

    Examines how corporate entities use the Web to present themselves as socially responsible citizens and to advance policy positions. Samples randomly "Fortune 500" companies, revealing that, although 90% had Web pages and 82% of the sites addressed a corporate social responsibility issue, few corporations used their pages to monitor…

  4. Improving Web Page Retrieval using Search Context from Clicked Domain Names

    NARCIS (Netherlands)

    Li, Rongmei

    2009-01-01

    Search context is a crucial factor that helps to understand a user’s information need in ad-hoc Web page retrieval. A query log of a search engine contains rich information on issued queries and their corresponding clicked Web pages. The clicked data implies its relevance to the query and can be use

  6. Improving Web Page Retrieval using Search Context from Clicked Domain Names

    NARCIS (Netherlands)

    Li, R.

    Search context is a crucial factor that helps to understand a user’s information need in ad-hoc Web page retrieval. A query log of a search engine contains rich information on issued queries and their corresponding clicked Web pages. The clicked data implies its relevance to the query and can be

  7. Teaching E-Commerce Web Page Evaluation and Design: A Pilot Study Using Tourism Destination Sites

    Science.gov (United States)

    Susser, Bernard; Ariga, Taeko

    2006-01-01

    This study explores a teaching method for improving business students' skills in e-commerce page evaluation and making Web design majors aware of business content issues through cooperative learning. Two groups of female students at a Japanese university studying either tourism or Web page design were assigned tasks that required cooperation to…

  9. The Situated Aspect of Creativity in Communicative Events: How Do Children Design Web Pages Together?

    Science.gov (United States)

    Fernandez-Cardenas, Juan Manuel

    2008-01-01

    This paper looks at the collaborative construction of web pages in History by a Year-4 group of children in a primary school in the UK. The aim of this paper is to find out: (a) How did children interpret their involvement in this literacy practice? (b) How the construction of web pages was interactionally accomplished? and (c) How can creativity…

  10. Web Page Design and Graphic Use of Three U.S. Newspapers.

    Science.gov (United States)

    Li, Xigen

    1998-01-01

    Contributes to scholarship on journalism and new technology by exploring approaches to Web page design and graphic use in three Internet newspapers. Explores how they demonstrate a change from the convention of newspaper publishing to the new media age, and how Web page design and graphic use reflect interconnectedness and a shift of control from…

  12. Case and Relation (CARE) based Page Rank Algorithm for Semantic Web Search Engines

    Directory of Open Access Journals (Sweden)

    N. Preethi

    2012-05-01

    Full Text Available Web information retrieval deals with techniques for finding relevant web pages for any given query from a collection of documents. Search engines have become the most helpful tool for obtaining useful information from the Internet. The next-generation Web architecture, represented by the Semantic Web, provides a layered architecture that possibly allows data to be reused across applications. The proposed architecture uses a hybrid methodology named Case and Relation (CARE) based Page Rank algorithm, which uses past problem-solving experience maintained in the case base to form the best matching relations and then uses them for generating graphs and spanning forests to assign a relevance score to the pages.

  13. AUTOMATIC TAGGING OF PERSIAN WEB PAGES BASED ON N-GRAM LANGUAGE MODELS USING MAPREDUCE

    Directory of Open Access Journals (Sweden)

    Saeed Shahrivari

    2015-07-01

    Full Text Available Page tagging is one of the most important facilities for increasing the accuracy of information retrieval in the web. Tags are simple pieces of data that usually consist of one or several words, and briefly describe a page. Tags provide useful information about a page and can be used for boosting the accuracy of searching, document clustering, and result grouping. The most accurate solution to page tagging is using human experts. However, when the number of pages is large, humans cannot be used, and some automatic solutions should be used instead. We propose a solution called PerTag which can automatically tag a set of Persian web pages. PerTag is based on n-gram models and uses the tf-idf method plus some effective Persian language rules to select proper tags for each web page. Since our target is huge sets of web pages, PerTag is built on top of the MapReduce distributed computing framework. We used a set of more than 500 million Persian web pages during our experiments, and extracted tags for each page using a cluster of 40 machines. The experimental results show that PerTag is both fast and accurate
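
    Stripped of the MapReduce machinery and the Persian-specific language rules, the per-page tagging step amounts to picking the n-grams with the highest tf-idf scores, roughly as in this sketch (assuming a recent scikit-learn and invented example pages).

        # Hedged sketch: choose page tags as the highest tf-idf unigrams/bigrams.
        from sklearn.feature_extraction.text import TfidfVectorizer

        pages = ["persian poetry classical hafez poems",
                 "football league results and live scores",
                 "classical persian music and poetry concerts"]

        vec = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
        X = vec.fit_transform(pages)
        terms = vec.get_feature_names_out()

        for i in range(X.shape[0]):
            row = X[i].toarray().ravel()
            top = row.argsort()[::-1][:3]
            print(i, [terms[j] for j in top])   # three candidate tags per page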

  14. Predicting Web Page Accesses, using Users’ Profile and Markov Models

    OpenAIRE

    Zeynab Fazelipour

    2016-01-01

    Nowadays the web is an important source for information retrieval; the sources on the WWW are constantly increasing and the users accessing the web have different backgrounds. Consequently, finding the information that satisfies a user's personal needs is not easy. Exploration of users' behavior on the web, as a method for extracting the knowledge lying behind the way users interact with the web, is considered an important tool in the field of web mining. By identifying user's beha...

  15. Probing a Self-Developed Aesthetics Measurement Application (SDA) in Measuring Aesthetics of Mandarin Learning Web Page Interfaces

    CERN Document Server

    Zain, Jasni Mohamad; Goh, Yingsoon

    2011-01-01

    This article describes the accuracy of our application, named Self-Developed Aesthetics Measurement Application (SDA), in measuring the aesthetics aspect, by comparing the results of our application with users' perceptions of the aesthetics of the web page interfaces. For this research, the positions of objects, image elements and text elements are defined as objects in a web page interface. Mandarin learning web pages are used in this research. These learning web pages comprise main pages, learning pages and exercise pages on the first author's E-portfolio web site. The objects of the web pages were manipulated in order to produce the desired aesthetic values. The six aesthetics-related elements used are balance, equilibrium, symmetry, sequence, rhythm, as well as order and complexity. Results from the research showed that the rankings of the aesthetic values of the web page interfaces, as measured for the users, were congruent with the expected perceptions of our designed Mandarin learning web pag...

  16. Web Page Segmentation for Small Screen Devices Using Tag Path Clustering Approach

    OpenAIRE

    Ms. S.Aruljothi; Mrs. S. Sivaranjani; Dr.S.Sivakumari

    2013-01-01

    The web pages in existence these days are developed to be displayed on desktop PCs, so viewing them on mobile web browsers is extremely tough. Since mobile devices have restricted resources, small-screen device users need to scroll down and across complicated sites persistently. To address the problem of resource limitation of small-screen devices, a unique methodology of web page segmentation with tag path clustering is proposed, which reduces the memory space demand of the small hand-h...

  17. A Mobile Agent-based Web Page Constructing Framework MiPage

    Science.gov (United States)

    Fukuta, Naoki; Ozono, Tadachika; Shintani, Toramatsu

    In this paper, we present a programming framework, `MiPage', for realizing intelligent WWW applications based on the mobile agent technology. On the framework, an agent is programmed by using hyper text markup language and logic programming language. To realize the framework, we designed a new logic programming environment `MiLog', and an agent program compiler `MiPage Compiler'. The framework enables us to enhance both richness of the services and manageability of the application.

  18. A Model for Personalized Keyword Extraction from Web Pages using Segmentation

    OpenAIRE

    Kuppusamy, K. S.; Aghila, G.

    2012-01-01

    The World Wide Web caters to the needs of billions of users in heterogeneous groups. Each user accessing the World Wide Web might have his / her own specific interests and would expect the web to respond to those specific requirements. The process of making the web react in a customized manner is achieved through personalization. This paper proposes a novel model for extracting keywords from a web page with personalization incorporated into it. The keyword extraction problem is approach...

  19. JavaScript: Convenient Interactivity for the Class Web Page.

    Science.gov (United States)

    Gray, Patricia

    This paper shows how JavaScript can be used within HTML pages to add interactive review sessions and quizzes incorporating graphics and sound files. JavaScript has the advantage of providing basic interactive functions without the use of separate software applications and players. Because it can be part of a standard HTML page, it is…

  20. Project Management - Development of course materiale as WEB pages

    DEFF Research Database (Denmark)

    Thorsteinsson, Uffe; Bjergø, Søren

    1997-01-01

    Development of Internet pages with lesson plans, slideshows, links, a conference system and an interactive student section for communication between students and with the teacher as well.

  1. Review of Metadata Elements within the Web Pages Resulting from Searching in General Search Engines

    Directory of Open Access Journals (Sweden)

    Sima Shafi’ie Alavijeh

    2009-12-01

    Full Text Available The present investigation was aimed at studying the extent of the presence of Dublin Core metadata elements and HTML meta tags in web pages. Ninety web pages were chosen by searching general search engines (Google, Yahoo and MSN). The extent of the metadata elements (Dublin Core and HTML meta tags) present in these pages, as well as the existence of a significant correlation between the presence of meta elements and the type of search engine, were investigated. Findings indicated a very low presence of both Dublin Core metadata elements and HTML meta tags in the retrieved pages, which in turn illustrates the very low usage of metadata elements in web pages. Furthermore, findings indicated that there is no significant correlation between the type of search engine used and the presence of metadata elements. From the standpoint of including metadata in the retrieval of web sources, search engines do not differ significantly from one another.

  2. Design of an Interface for Page Rank Calculation using Web Link Attributes Information

    Directory of Open Access Journals (Sweden)

    Jeyalatha SIVARAMAKRISHNAN

    2010-01-01

    Full Text Available This paper deals with Web Structure Mining and different structure mining algorithms like PageRank, HITS, TrustRank and Sel-HITS. The functioning of these algorithms is discussed. An incremental algorithm for the calculation of PageRank using an interface has been formulated. This algorithm makes use of Web link attribute information as key parameters and has been implemented using the visibility and position of a link. The application of a Web structure mining algorithm in an academic search application has been discussed. The present work can be a useful input to Web users, faculty, students and Web administrators in a university environment.
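
    The flavour of the approach can be conveyed by a PageRank iteration in which each link's share of a page's rank is weighted by attributes such as visibility and position; the graph and weights below are illustrative only, not the scheme defined through the paper's interface.

        # Hedged sketch: PageRank iteration with per-link weights from link attributes.
        damping = 0.85
        # graph[u] = {v: attribute_weight}; e.g. the link A->B is more visible than A->C
        graph = {"A": {"B": 2.0, "C": 1.0},
                 "B": {"C": 1.0},
                 "C": {"A": 1.0}}

        rank = {p: 1.0 / len(graph) for p in graph}
        for _ in range(50):
            new = {}
            for p in graph:
                incoming = 0.0
                for q, out in graph.items():
                    if p in out:
                        incoming += rank[q] * out[p] / sum(out.values())  # weighted share
                new[p] = (1 - damping) / len(graph) + damping * incoming
            rank = new

        print({p: round(r, 3) for p, r in rank.items()})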

  3. Lost but not forgotten: finding pages on the unarchived web

    NARCIS (Netherlands)

    Huurdeman, H.C.; Kamps, J.; Samar, T.; Vries, A.P. de; Ben-David, A.; Rogers, R.A.

    2015-01-01

    Web archives attempt to preserve the fast changing web, yet they will always be incomplete. Due to restrictions in crawling depth, crawling frequency, and restrictive selection policies, large parts of the Web are unarchived and, therefore, lost to posterity. In this paper, we propose an approach to

  4. Ranking pages and the topology of the web

    CERN Document Server

    Arratia, Argimiro

    2011-01-01

    This paper presents our studies on the rearrangement of links in the structure of websites for the purpose of improving the valuation of a page or group of pages as established by a ranking function such as Google's PageRank. We build our topological taxonomy starting from unidirectional and bidirectional rooted trees, and up to more complex hierarchical structures such as cyclical rooted trees (obtained by closing cycles on bidirectional trees) and PR-digraph rooted trees (digraphs whose condensation digraph is a rooted tree and that behave like cyclical rooted trees). We give different modifications of the structure of these trees and their effect on the valuation given by the PageRank function. We derive closed formulas for the PageRank of the root of various types of trees, and establish a hierarchy of these topologies in terms of PageRank. We show that the PageRank of the root of cyclical and PR-digraph trees basically depends on the number of vertices per level and the number of cycles of distinct lengths among lev...
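
    For reference, the ranking function that such closed-form results build on is the standard PageRank recurrence (quoted here as the usual textbook definition, not as a formula taken from the paper), where d is the damping factor, N the total number of pages, and the sum runs over the pages q that link to p:

        PR(p) = \frac{1-d}{N} + d \sum_{q \to p} \frac{PR(q)}{\operatorname{outdeg}(q)}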

  5. HTML5 Your visual blueprint for designing rich Web pages and applications

    CERN Document Server

    McDaniel, Adam

    2011-01-01

    Use the latest version of HTML to create dynamic Web pages. HTML5 is the latest iteration of the standard markup language for creating Web pages. It boasts extensive updates from its predecessor and allows you to incorporate rich media content into a site without any dependence on extra software such as Flash. Packed with hundreds of screen shots, this visual guide introduces you to the many new features and abilities of HTML5 and shows you the many exciting new possibilities that exist for designing dynamic Web pages. Offers visual learners a solid reference on HTML5, the latest version of the

  6. Design and Validation of an Attention Model of Web Page Users

    OpenAIRE

    Ananya Jana; Samit Bhattacharya

    2015-01-01

    In this paper, we propose a model to predict the locations of the most attended pictorial information on a web page and the attention sequence of the information. We propose to divide the content of a web page into conceptually coherent units or objects, based on a survey of more than 100 web pages. The proposed model takes into account three characteristics of an image object: chromatic contrast, size, and position and computes a numerical value, the attention factor. We can predict from the...

  7. ONTOLOGY BASED WEB PAGE ANNOTATION FOR EFFECTIVE INFORMATION RETRIEVAL

    Directory of Open Access Journals (Sweden)

    S.Kalarani

    2010-11-01

    Full Text Available Today’s World Wide Web contains a large volume of data – billions of documents. It is therefore a time-consuming process to discover useful knowledge from this data. With today's keyword approach, the amount of time and effort required to find the right information is directly proportional to the amount of information on the web. The web has grown exponentially and people are forced to spend more and more time searching for the information they are looking for. Lack of personalization, as well as the inability to easily separate commercial from non-commercial searches, is among the other limitations of today's web search technologies. This paper proposes a prototype relation-based search engine, “OntoLook”, which has been designed in a virtual semantic web environment. The architecture has been proposed. The Semantic Web is well recognized as an effective infrastructure to enhance the visibility of knowledge on the Web. The core of the Semantic Web is “ontology”, which is used to explicitly represent our conceptualizations. Ontology engineering in the Semantic Web is primarily supported by languages such as RDF, RDFS and OWL. This paper discusses the requirements of ontology in the context of the Web, compares the above three languages with existing knowledge representation formalisms, and surveys tools for managing and applying ontology. Advantages of using ontology in both knowledge-base-style and database-style applications are demonstrated using one real-world application.

  8. Teaching Materials to Enhance the Visual Expression of Web Pages for Students Not in Art or Design Majors

    Science.gov (United States)

    Ariga, T.; Watanabe, T.

    2008-01-01

    The explosive growth of the Internet has made the knowledge and skills for creating Web pages into general subjects that all students should learn. It is now common to teach the technical side of the production of Web pages and many teaching materials have been developed. However teaching the aesthetic side of Web page design has been neglected,…

  10. The development of a web page for lipid science and research. Main web sites of interest

    Directory of Open Access Journals (Sweden)

    Boatella, J.

    2001-08-01

    Full Text Available The Internet provides access to a huge amount of scientific and technical information which is not validated by any committee of experts. This information needs filtering in order to optimize user access to these resources. In this paper, we describe the development of a WEB page outlining the activity of our research team Food Lipids Quality and Health. The WEB page seeks to fulfil the following objectives: to communicate the activities of the team, to use effectively the resources that the Internet offers and to promote their use among the team. We report on the methods used in achieving these objectives. Finally, a large number of WEB addresses related to lipids are presented and classified. The addresses have been selected on the basis of their usefulness and interest value. On the Internet we find a great deal of scientific and technical information whose validity is not usually controlled by review committees. To take advantage of these resources it is necessary to filter the information and to facilitate the user's access to it. This article describes the practical experience of developing a WEB page centred on the activities of the research group «Calidad Nutricional y Tecnología de los Lípidos». The objectives of this WEB page were the following: to disseminate the activities of the research group, to take advantage of the resources that the Internet offers, and to encourage and facilitate their use. This experience allowed us to present an effective working methodology for achieving these objectives. Finally, a large number of WEB addresses, grouped by sections within the field of lipids, are presented. These addresses have been rigorously selected from a large number of consulted references, following a series of criteria discussed in this work, in order to offer those of greatest practical interest.

  11. PSB goes personal: The failure of personalised PSB web pages

    DEFF Research Database (Denmark)

    Sørensen, Jannick Kirk

    2013-01-01

    Between 2006 and 2011, a number of European public service broadcasting (PSB) organisations offered their website users the opportunity to create their own PSB homepage. The web customisation was conceived by the editors as a response to developments in commercial web services, particularly social...

  12. Building single-page web apps with meteor

    CERN Document Server

    Vogelsteller, Fabian

    2015-01-01

    If you are a web developer with basic knowledge of JavaScript and want to take on Web 2.0, build real-time applications, or simply want to write a complete application using only JavaScript and HTML/CSS, this is the book for you.This book is based on Meteor 1.0.

  13. A Chinese Web Page Clustering Algorithm Based on the Suffix Tree

    Institute of Scientific and Technical Information of China (English)

    YANG Jian-wu

    2004-01-01

    In this paper, an improved algorithm, named STC-I, is proposed for Chinese Web page clustering based on Chinese language characteristics; it adopts a new unit choice principle and a novel suffix tree construction policy. The experimental results show that the new algorithm keeps the advantages of STC and is better than STC in precision and speed when they are used to cluster Chinese Web pages.

  14. Block-o-Matic: a Web Page Segmentation Tool and its Evaluation

    OpenAIRE

    Sanoja, Andrés; Gançarski, Stéphane

    2013-01-01

    National audience; In this paper we present our prototype for web page segmentation, called Block-o-matic, and its counterpart for manual segmentation, Block-o-manual. The main idea is to evaluate the correctness of the segmentation algorithm. Building a ground truth database for evaluation can take days or months depending on the collection size; we therefore address this with our manual segmentation tool, intended to minimize the time needed to annotate blocks in web pages. Both tools imp...

  15. Discovery and Classification of Bioinformatics Web Services

    Energy Technology Data Exchange (ETDEWEB)

    Rocco, D; Critchlow, T

    2002-09-02

    The transition of the World Wide Web from a paradigm of static Web pages to one of dynamic Web services provides new and exciting opportunities for bioinformatics with respect to data dissemination, transformation, and integration. However, the rapid growth of bioinformatics services, coupled with non-standardized interfaces, diminishes the potential that these Web services offer. To face this challenge, we examine the notion of a Web service class that defines the functionality provided by a collection of interfaces. These descriptions are an integral part of a larger framework that can be used to discover, classify, and wrap Web services automatically. We discuss how this framework can be used in the context of the proliferation of sites offering BLAST sequence alignment services for specialized data sets.

  16. Future Trends in Children's Web Pages: Probing Hidden Biases for Information Quality

    Science.gov (United States)

    Kurubacak, Gulsun

    2007-01-01

    As global digital communication continues to flourish, Children's Web pages become more critical for children to realize not only the surface but also breadth and deeper meanings in presenting these milieus. These pages not only are very diverse and complex but also enable intense communication across social, cultural and political restrictions…

  17. Analysis and Testing of Ajax-based Single-page Web Applications

    NARCIS (Netherlands)

    Mesbah, A.

    2009-01-01

    This dissertation has focused on better understanding the shifting web paradigm and the consequences of moving from the classical multi-page model to an Ajax-based single-page style. Specifically to that end, this work has examined this new class of software from three main software engineering pers

  18. Future Trends in Children's Web Pages: Probing Hidden Biases for Information Quality

    Science.gov (United States)

    Kurubacak, Gulsun

    2007-01-01

    As global digital communication continues to flourish, Children's Web pages become more critical for children to realize not only the surface but also breadth and deeper meanings in presenting these milieus. These pages not only are very diverse and complex but also enable intense communication across social, cultural and political restrictions…

  19. Analysis and Testing of Ajax-based Single-page Web Applications

    NARCIS (Netherlands)

    Mesbah, A.

    2009-01-01

    This dissertation has focused on better understanding the shifting web paradigm and the consequences of moving from the classical multi-page model to an Ajax-based single-page style. Specifically to that end, this work has examined this new class of software from three main software engineering

  20. CaSePer: An efficient model for personalized web page change detection based on segmentation

    Directory of Open Access Journals (Sweden)

    K.S. Kuppusamy

    2014-01-01

    Full Text Available Users who visit a web page repeatedly at frequent intervals are more interested in knowing the recent changes that have occurred on the page than the entire contents of the web page. Because of the increased dynamism of web pages, it would be difficult for the user to identify the changes manually. This paper proposes an enhanced model for detecting changes in pages, called CaSePer (Change detection based on Segmentation with Personalization). The change detection is micro-managed by introducing web page segmentation. The web page change detection process is made efficient by performing a dual-step process. The proposed method reduces the complexity of the change detection by focusing only on the segments in which the changes have occurred. User-specific personalized change detection is also incorporated into the proposed model. The model is validated with the help of a prototype implementation. The experiments conducted on the prototype implementation confirm a 77.8% improvement and a 97.45% accuracy rate.
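
    The record describes segment-level, dual-step change detection but does not reproduce the algorithm. As a rough illustration of the general idea (not CaSePer itself), the Python sketch below segments two snapshots of a page with a deliberately naive splitter, fingerprints each segment, and reports the segments whose fingerprints differ; the segmentation rule and the hash comparison are illustrative assumptions.

      # Minimal sketch of segment-level change detection between two page snapshots.
      # The <div>-based splitter is a stand-in for a real segmentation algorithm.
      import hashlib
      import re

      def segment(html: str) -> list[str]:
          # hypothetical, simplistic segmenter: treat each chunk between div tags as a segment
          return [s.strip() for s in re.split(r"(?i)</?div[^>]*>", html) if s.strip()]

      def fingerprint(segments: list[str]) -> list[str]:
          return [hashlib.md5(s.encode("utf-8")).hexdigest() for s in segments]

      def changed_segments(old_html: str, new_html: str) -> list[int]:
          old_fp, new_fp = fingerprint(segment(old_html)), fingerprint(segment(new_html))
          # compare positionally; indices where the fingerprints differ are "changed"
          return [i for i, (a, b) in enumerate(zip(old_fp, new_fp)) if a != b]

      old = "<div>news of monday</div><div>footer</div>"
      new = "<div>news of tuesday</div><div>footer</div>"
      print(changed_segments(old, new))   # -> [0]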

  1. Design of a Web Page as a complement of educative innovation through MOODLE

    Science.gov (United States)

    Mendiola Ubillos, M. A.; Aguado Cortijo, Pedro L.

    2010-05-01

    In the context of using Information Technology to impart knowledge, and to establish the MOODLE system as a support and complementary tool to on-site educational methodology (b-learning), a Web Page was designed for the subject Agronomic and Food Industry Crops (Plantas de interés Agroalimentario) during the 2006-07 course. This web was inserted into the Technical University of Madrid (Universidad Politécnica de Madrid) computer system to give the students a first contact with the contents of this subject. This page shows the objectives and methodology, personal work planning, and the subject programme together with the activities. At another web site, the evaluation criteria and recommended bibliography are located. The objective of this web page has been to make the information needed in the learning process more transparent and accessible, and to present it in a more attractive frame. This page has been updated and modified in each academic course offered since its first implementation, and in some cases new specific links have been added to increase its usefulness. At the end of each course a test is given to the students taking this subject, in which we ask which elements they would like to modify, delete or add to the web page. In this way the direct users give their point of view and help to improve the web page each course.

  2. Credibility judgments in web page design – a brief review

    OpenAIRE

    Selejan, O; Muresanu, DF; Popa, L.; Muresanu-Oloeriu, I; IUDEAN, D.; Buzoianu, A; Suciu, S.

    2016-01-01

    Today, more than ever, it is accepted that analysis of interface appearance is a crucial point in the field of human-computer interaction. As nowadays virtually anyone can publish information on the web, the role of credibility has grown increasingly important in relation to web-based content. Areas like trust, credibility, and behavior, coupled with overall impression and user expectation, are today in the spotlight of research compared to the last period, when other pragmatic areas such a...

  3. An Application of Session Based Clustering to Analyze Web Pages of User Interest from Web Log Files

    Directory of Open Access Journals (Sweden)

    C. P. Sumathi

    2010-01-01

    Full Text Available Problem statement: With the continued growth and proliferation of e-commerce, Web services and Web-based information systems, the volumes of click-stream and user data collected by Web-based organizations in their daily operations have reached astronomical proportions. Analyzing such data can help these organizations optimize the functionality of web-based applications and provide more personalized content to visitors. This type of analysis involves the automatic discovery of usage interest in the web pages, which is often stored in web and application server access logs. Approach: The usage interest in the web pages across various sessions was partitioned into clusters such that sessions with “similar” interest were placed in the same cluster, using the expectation maximization clustering technique as discussed in this study. Results: The approach results in the generation of usage profiles and automatic identification of user interest in each profile. Conclusion: The results will be helpful for organizations in improving their web sites based on users' navigational interest and in providing recommendations for page(s) not yet visited by the user.

  4. Science on the Web: Secondary School Students' Navigation Patterns and Preferred Pages' Characteristics

    Science.gov (United States)

    Dimopoulos, Kostas; Asimakopoulos, Apostolos

    2010-01-01

    This study aims to explore navigation patterns and preferred pages' characteristics of ten secondary school students searching the web for information about cloning. The students navigated the Web for as long as they wished in a context of minimum support of teaching staff. Their navigation patterns were analyzed using audit trail data software.…

  5. The Recognition of Web Pages' Hyperlinks by People with Intellectual Disabilities: An Evaluation Study

    Science.gov (United States)

    Rocha, Tania; Bessa, Maximino; Goncalves, Martinho; Cabral, Luciana; Godinho, Francisco; Peres, Emanuel; Reis, Manuel C.; Magalhaes, Luis; Chalmers, Alan

    2012-01-01

    Background: One of the most mentioned problems of web accessibility, as recognized in several different studies, is related to the difficulty regarding the perception of what is or is not clickable in a web page. In particular, a key problem is the recognition of hyperlinks by a specific group of people, namely those with intellectual…

  6. A Model for Personalized Keyword Extraction from Web Pages using Segmentation

    Science.gov (United States)

    Kuppusamy, K. S.; Aghila, G.

    2012-03-01

    The World Wide Web caters to the needs of billions of users in heterogeneous groups. Each user accessing the World Wide Web might have his / her own specific interest and would expect the web to respond to the specific requirements. The process of making the web to react in a customized manner is achieved through personalization. This paper proposes a novel model for extracting keywords from a web page with personalization being incorporated into it. The keyword extraction problem is approached with the help of web page segmentation which facilitates in making the problem simpler and solving it effectively. The proposed model is implemented as a prototype and the experiments conducted on it empirically validate the model's efficiency.
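
    The abstract describes segment-based keyword extraction with a personalization component but gives no scoring formula. A minimal Python sketch of the general idea is shown below: terms are counted per segment and terms appearing in a user-interest profile are boosted; the boost factor, the tokenisation and the profile itself are illustrative assumptions, not the paper's model.

      # Sketch: per-segment term frequency, boosted for terms in a user profile.
      from collections import Counter
      import re

      def keywords(segments, user_profile, boost=2.0, top_k=5):
          scores = Counter()
          for seg in segments:
              for term in re.findall(r"[a-z]+", seg.lower()):
                  scores[term] += boost if term in user_profile else 1.0
          return [t for t, _ in scores.most_common(top_k)]

      segments = ["Football transfer news and match reports",
                  "Cookie policy and advertising information"]
      print(keywords(segments, user_profile={"football", "match"}))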

  7. A Model for Personalized Keyword Extraction from Web Pages using Segmentation

    CERN Document Server

    Kuppusamy, K S; 10.5120/5682-7720

    2012-01-01

    The World Wide Web caters to the needs of billions of users in heterogeneous groups. Each user accessing the World Wide Web might have his / her own specific interest and would expect the web to respond to the specific requirements. The process of making the web to react in a customized manner is achieved through personalization. This paper proposes a novel model for extracting keywords from a web page with personalization being incorporated into it. The keyword extraction problem is approached with the help of web page segmentation which facilitates in making the problem simpler and solving it effectively. The proposed model is implemented as a prototype and the experiments conducted on it empirically validate the model's efficiency.

  8. Use of Freely-Available Weebly in Creating Quick and Easy Web Pages: poster presentation

    OpenAIRE

    Hyde, Denise

    2011-01-01

    Weebly is freely-available software for creating Web pages without having to know HTML. It is easy to use, with its drag-and-drop editor, and offers the ability to add documents, Web links, videos, slideshows, audio, forms, polls, etc. It is hosted by Weebly and has no limits on storage space. Many templates are available for Web page design. One can publish and update almost immediately. Combined with usage of the freely-available Google Analytics, for example, it is possible to gathe...

  9. CONSTRAINT INFORMATIVE RULES FOR GENETIC ALGORITHM-BASED WEB PAGE RECOMMENDATION SYSTEM

    Directory of Open Access Journals (Sweden)

    S. Prince Mary

    2013-01-01

    Full Text Available Predicting users' navigation using web usage mining is the primary goal of web page recommendation. Currently, researchers are trying to develop web page recommendation using pattern mining techniques. Here, we propose a technique for web page recommendation using a genetic algorithm. It consists of three phases: data preparation, mining of informative rules, and recommendation. Data preparation comprises data preprocessing and user identification. The genetic algorithm is used to mine the informative rules and involves three processes: calculating the fitness values, crossover, and mutation. We use three constraints (time duration, quality and recency of visit) to decide whether a rule proceeds to the next stage after the initial fitness calculation. These processes are repeated to find the best solution, and the best solution obtained by means of the genetic algorithm is used to form the recommendation tree.
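
    The record names fitness evaluation, crossover and mutation over navigation rules but does not specify the rule encoding. The Python sketch below assumes rules are fixed-length sequences of page identifiers and that fitness is a weighted sum of the three constraints mentioned (duration, quality, recency); the encoding, weights and operators are illustrative assumptions rather than the paper's definitions.

      # Hedged GA sketch: evolve candidate navigation rules (page-ID sequences).
      import random

      PAGES = list(range(20))          # hypothetical page identifiers
      RULE_LEN, POP, GENERATIONS = 4, 30, 50

      def fitness(rule, stats):
          # stats maps page id -> (time duration, quality, recency), each assumed in [0, 1]
          return sum(0.4 * stats[p][0] + 0.3 * stats[p][1] + 0.3 * stats[p][2] for p in rule)

      def crossover(a, b):
          cut = random.randint(1, RULE_LEN - 1)
          return a[:cut] + b[cut:]

      def mutate(rule, rate=0.1):
          return [random.choice(PAGES) if random.random() < rate else p for p in rule]

      stats = {p: (random.random(), random.random(), random.random()) for p in PAGES}
      population = [[random.choice(PAGES) for _ in range(RULE_LEN)] for _ in range(POP)]
      for _ in range(GENERATIONS):
          population.sort(key=lambda r: fitness(r, stats), reverse=True)
          parents = population[:POP // 2]                       # keep the fittest rules
          children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                      for _ in range(POP - len(parents))]
          population = parents + children
      print("best rule:", max(population, key=lambda r: fitness(r, stats)))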

  10. A Method of Eliminating Noises in Web Pages by Style Tree Model and Its Applications

    Institute of Scientific and Technical Information of China (English)

    ZHAO Cheng-li; YI Dong-yun

    2004-01-01

    A Web page typically contains many information blocks. Apart from the main content blocks, it usually has such blocks as navigation panels, copyright and privacy notices, and advertisements. We call these blocks the noisy blocks. The noises in Web pages can seriously harm Web data mining. To eliminate these noises, we introduce a new tree structure, called the Style Tree, and study an algorithm for constructing a site style tree. The Style Tree Model is employed to detect and eliminate noises in any Web page of the site. An information-based measure to determine which element node is noisy is also constructed. In addition, the applications of this method are discussed in detail. Experimental results show that our noise elimination technique is able to improve the mining results significantly.

  11. A Literature Review of Academic Library Web Page Studies

    Science.gov (United States)

    Blummer, Barbara

    2007-01-01

    In the early 1990s, numerous academic libraries adopted the web as a communication tool with users. The literature on academic library websites includes research on both design and navigation. Early studies typically focused on design characteristics, since websites initially merely provided information on the services and collections available in…

  12. A Detailed Chunk-Level Performance Study of Web Page Retrieve Latency

    Institute of Scientific and Technical Information of China (English)

    XIE Hai-guang; LI Jian-hua; LI Xiang

    2005-01-01

    Where web latency comes from is a widely discussed question. In this paper, we propose a novel chunk-level latency dependence model to give a better illustration of web latency. Based on the facts that web content is delivered as a sequence of chunks and that clients care more about whole-page retrieval latency, this paper carries out a detailed study of how the chunk sequence and chunk relations affect web retrieval latency. A series of thorough experiments is also conducted and the resulting data analysed. The result is useful for further study on how to reduce web latency.

  13. Some Things You Always Wanted to Know About Web Pages (But Were Too Busy to Ask)

    OpenAIRE

    Schubert, Simon; Zwaenepoel, Willy

    2012-01-01

    The organic growth of the web has led to web sites that exhibit a large variety of properties. We conduct a large- scale study to gain quantitative insights into the browser-side effects of the structure and behavior of thousands of the most popular web sites. We find that 50 % of web pages load more than 100 resources from more than 17 distinct hosts and make the browser process more than 730 kB of javascript. Embedded third-party advertisements are the prevailing model on the web t...

  14. Personal Web home pages of adolescents with cancer: self-presentation, information dissemination, and interpersonal connection.

    Science.gov (United States)

    Suzuki, Lalita K; Beale, Ivan L

    2006-01-01

    The content of personal Web home pages created by adolescents with cancer is a new source of information about this population of potential benefit to oncology nurses and psychologists. Individual Internet elements found on 21 home pages created by youths with cancer (14-22 years old) were rated for cancer-related self-presentation, information dissemination, and interpersonal connection. Examples of adolescents' online narratives were also recorded. Adolescents with cancer used various Internet elements on their home pages for cancer-related self-presentation (eg, welcome messages, essays, personal history and diary pages, news articles, and poetry), information dissemination (e.g., through personal interest pages, multimedia presentations, lists, charts, and hyperlinks), and interpersonal connection (eg, guestbook entries). Results suggest that various elements found on personal home pages are being used by a limited number of young patients with cancer for self-expression, information access, and contact with peers.

  15. PSB goes personal: The failure of personalised PSB web pages

    Directory of Open Access Journals (Sweden)

    Jannick Kirk Sørensen

    2013-08-01

    Full Text Available Between 2006 and 2011, a number of European public service broadcasting (PSB) organisations offered their website users the opportunity to create their own PSB homepage. The web customisation was conceived by the editors as a response to developments in commercial web services, particularly social networking and content aggregation services, but the customisation projects revealed tensions between the ideals of customer sovereignty and the editorial agenda-setting. This paper presents an overview of the PSB activities as well as reflections on the failure of the customisable PSB homepages. The analysis is based on interviews with the PSB editors involved in the projects and on studies of the interfaces and user comments. Commercial media customisation is discussed along with the PSB projects to identify similarities and differences.

  16. HIGWGET-A Model for Crawling Secure Hidden WebPages

    Directory of Open Access Journals (Sweden)

    K.F. Bharati

    2013-03-01

    Full Text Available The conventional search engines on the internet are effective at finding relevant information, but they face constraints in obtaining information sought from certain sources. Web crawlers are directed along particular paths of the web and are limited in moving along other paths because those are protected, or restricted out of concern about threats. It is possible to build a web crawler that has the ability to penetrate paths of the web not reachable by the usual web crawlers, so as to get better answers in terms of information, time and relevancy for a given search query. The proposed web crawler is designed to attend Hyper Text Transfer Protocol Secure (HTTPS) websites, including web pages that require verification to view and index.
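
    The record describes a crawler that authenticates against HTTPS sites before indexing pages behind a login. As a rough illustration only, the Python sketch below logs in once with a session, fetches a protected page and collects its links; it uses the third-party requests library, and the login URL, form field names and example endpoints are placeholders, not part of the cited work.

      # Sketch of a crawler step that authenticates before fetching protected pages.
      import requests
      from html.parser import HTMLParser

      class LinkCollector(HTMLParser):
          def __init__(self):
              super().__init__()
              self.links = []
          def handle_starttag(self, tag, attrs):
              if tag == "a":
                  self.links.extend(v for k, v in attrs if k == "href" and v)

      def crawl_protected(start_url, login_url, credentials):
          session = requests.Session()
          session.post(login_url, data=credentials, timeout=10)  # authenticate once
          response = session.get(start_url, timeout=10)          # fetch a protected page
          collector = LinkCollector()
          collector.feed(response.text)
          return collector.links

      # Example call (placeholder endpoints, left commented out):
      # links = crawl_protected("https://example.org/members/index.html",
      #                         "https://example.org/login",
      #                         {"username": "alice", "password": "secret"})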

  17. HIGWGET-A Model for Crawling Secure Hidden WebPages

    Directory of Open Access Journals (Sweden)

    K.F. Bharati

    2013-04-01

    Full Text Available The conventional search engines on the internet are effective at finding relevant information, but they face constraints in obtaining information sought from certain sources. Web crawlers are directed along particular paths of the web and are limited in moving along other paths because those are protected, or restricted out of concern about threats. It is possible to build a web crawler that has the ability to penetrate paths of the web not reachable by the usual web crawlers, so as to get better answers in terms of information, time and relevancy for a given search query. The proposed web crawler is designed to attend Hyper Text Transfer Protocol Secure (HTTPS) websites, including web pages that require verification to view and index.

  18. Lagrangian Methods Of Cosmic Web Classification

    CERN Document Server

    Fisher, J D; Johnson, M S T

    2015-01-01

    The cosmic web defines the large scale distribution of matter we see in the Universe today. Classifying the cosmic web into voids, sheets, filaments and nodes allows one to explore structure formation and the role environmental factors have on halo and galaxy properties. While existing studies of cosmic web classification concentrate on grid based methods, this work explores a Lagrangian approach where the V-web algorithm proposed by Hoffman et al. (2012) is implemented with techniques borrowed from smoothed particle hydrodynamics. The Lagrangian approach allows one to classify individual objects (e.g. particles or halos) based on properties of their nearest neighbours in an adaptive manner. It can be applied directly to a halo sample which dramatically reduces computational cost and potentially allows an application of this classification scheme to observed galaxy samples. Finally, the Lagrangian nature admits a straight forward inclusion of the Hubble flow negating the necessity of a visually defined thresh...

  19. To Overcome HITS Rank Similarity Confliction of Web Pages using Weight Calculation and Rank Improvement

    Science.gov (United States)

    Nath, Rajender; Kumar, Naresh

    2011-12-01

    A search engine gives an ordered list of web search results in response to a user query, wherein the important pages are usually displayed at the top with less important ones afterwards. The user may have to look through many screens of results to get the required documents. In the literature, many page ranking algorithms have been proposed to compute the rank of a page; PageRank is the one considered in this work. This algorithm treats all links equally when distributing rank scores, and can therefore assign the same importance to different pages. In practice this is problematic: if two pages have the same rank, there is no way to judge which page is more important than the other. This paper therefore proposes a way to organize the search results and decide which page is more important when PageRank produces a rank tie, so that the user can get more relevant and important results easily and in a short span of time.
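
    The work builds on standard PageRank, which distributes each page's score equally over its out-links. The Python sketch below runs the usual power iteration on a toy link graph and then breaks rank ties with a secondary key; the damping factor 0.85 is the conventional value, and using in-degree as the tie-breaker is my own illustrative choice rather than the paper's proposal.

      # Standard PageRank power iteration on a small link graph, with in-degree as an
      # illustrative secondary key to break ties between equally ranked pages.
      def pagerank(links, d=0.85, iters=50):
          pages = list(links)
          rank = {p: 1.0 / len(pages) for p in pages}
          for _ in range(iters):
              new = {p: (1.0 - d) / len(pages) for p in pages}
              for p, outs in links.items():
                  share = rank[p] / len(outs) if outs else 0.0
                  for q in outs:
                      new[q] += d * share
              rank = new
          return rank

      links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
      rank = pagerank(links)
      in_degree = {p: sum(p in outs for outs in links.values()) for p in links}
      # order by rank, then by in-degree when two pages share the same rank
      order = sorted(links, key=lambda p: (rank[p], in_degree[p]), reverse=True)
      print(order)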

  20. Detection of spam web page using content and link-based techniques: A combined approach

    Indian Academy of Sciences (India)

    Rajendra Kumar Roul; Shubham Rohan Asthana; Mit Shah; Dhruvesh Parikh

    2016-02-01

    Web spam is a technique through which irrelevant pages get a higher rank than relevant pages in a search engine's results. Spam pages are generally insufficient and inappropriate results for the user. Many researchers are working in this area to detect spam pages. However, no universally efficient technique has been developed so far that can detect all spam pages. This paper is an effort in that direction, where we propose a combined approach of content- and link-based techniques to identify spam pages. The content-based approach uses term density and a Part of Speech (POS) ratio test, and in the link-based approach we explore collaborative detection using personalized page ranking to classify a Web page as spam or non-spam. For experimental purposes, the WEBSPAM-UK2006 dataset has been used. The results have been compared with some of the existing approaches. A good and promising F-measure of 75.2% demonstrates the applicability and efficiency of our approach.
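
    The content-based side of the approach rests on term density and a part-of-speech ratio. The Python sketch below computes a keyword density and, as a crude stand-in for a real POS tagger, a function-word ratio; the stop-word list, the proxy and the thresholds are illustrative assumptions, not the paper's values.

      # Sketch of two content features in the spirit of the abstract: term (keyword)
      # density and a crude part-of-speech proxy based on function words.
      import re

      STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for"}

      def term_density(text: str, keyword: str) -> float:
          tokens = re.findall(r"[a-z]+", text.lower())
          return tokens.count(keyword.lower()) / len(tokens) if tokens else 0.0

      def function_word_ratio(text: str) -> float:
          tokens = re.findall(r"[a-z]+", text.lower())
          return sum(t in STOP_WORDS for t in tokens) / len(tokens) if tokens else 0.0

      def looks_like_spam(text: str, keyword: str) -> bool:
          # keyword stuffing tends to push density up and the function-word ratio down
          return term_density(text, keyword) > 0.25 and function_word_ratio(text) < 0.2

      page = "cheap tickets cheap tickets buy cheap tickets now cheap tickets"
      print(looks_like_spam(page, "cheap"))   # -> True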

  1. JavaScript and interactive web pages in radiology.

    Science.gov (United States)

    Gurney, J W

    2001-10-01

    Web publishing is becoming a more common method of disseminating information. JavaScript is an object-oriented language embedded into modern browsers and has a wide variety of uses. The use of JavaScript in radiology is illustrated by calculating the indices of sensitivity, specificity, and predictive values from a table of true positives, true negatives, false positives, and false negatives. In addition, a single line of JavaScript code can be used to annotate images, which has a wide variety of uses.
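
    The indices mentioned are the standard definitions computed from a 2x2 table. The article works in JavaScript; the sketch below uses Python only for consistency with the other examples in this section, and the counts in the example call are made up.

      # Standard diagnostic-test indices computed from a 2x2 table of counts.
      def diagnostic_indices(tp: int, fn: int, fp: int, tn: int) -> dict:
          return {
              "sensitivity": tp / (tp + fn),     # true positive rate
              "specificity": tn / (tn + fp),     # true negative rate
              "ppv": tp / (tp + fp),             # positive predictive value
              "npv": tn / (tn + fn),             # negative predictive value
          }

      print(diagnostic_indices(tp=90, fn=10, fp=20, tn=80))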

  2. Searchers' relevance judgments and criteria in evaluating Web pages in a learning style perspective

    DEFF Research Database (Denmark)

    Papaeconomou, Chariste; Zijlema, Annemarie F.; Ingwersen, Peter

    2008-01-01

    The paper presents the results of a case study of searchers' relevance criteria used for assessments of Web pages in a learning style perspective. 15 test persons participated in the experiments, based on two simulated work tasks that provided cover stories to trigger their information needs. Two learning styles were examined: Global and Sequential learners. The study applied eye-tracking for the observation of relevance hot spots on Web pages, learning style index analysis and post-search interviews to gain more in-depth information on relevance behavior. Findings reveal that with respect to the use of graded relevance scores and the number of relevance criteria applied per task and test person, the differences between the styles are statistically insignificant. When interviewed in retrospect, the resulting profiles tend to become even more similar across learning styles, but a shift occurs from instant assessments, with content features of web pages replacing topicality judgments as the predominant relevance criteria.

  3. MSoS: A Multi-Screen-Oriented Web Page Segmentation Approach

    OpenAIRE

    Sarkis, Mira; Concolato, Cyril; Dufourd, Jean-Claude

    2015-01-01

    International audience; In this paper we describe a multiscreen-oriented approach for segmenting web pages. The segmentation is an automatic and hybrid visual and structural method. It aims at creating coherent blocks which have different functions determined by the multiscreen environment. It is also characterized by a dynamic adaptation to the page content. Experiments are conducted on a set of existing applications that contain multimedia elements, in particular YouTube and video player pa...

  4. Web Pages Content Analysis Using Browser-Based Volunteer Computing

    Directory of Open Access Journals (Sweden)

    Wojciech Turek

    2013-01-01

    Full Text Available Existing solutions to the problem of finding valuable information on the Web suffer from several limitations like simplified query languages, out-of-date information or arbitrary results sorting. In this paper a different approach to this problem is described. It is based on the idea of distributed processing of Web pages' content. To provide sufficient performance, the idea of browser-based volunteer computing is utilized, which requires the implementation of text processing algorithms in JavaScript. In this paper the architecture of the Web page content analysis system is presented, details concerning the implementation of the system and the text processing algorithms are described, and test results are provided.

  5. Football Fans, Their Information, The Web And The Personal Home Page

    OpenAIRE

    Narsesian, S.

    2010-01-01

    From the early days of the Internet to the present day, the World Wide Web has developed into one of the world's largest information resources. One of the first genres of web pages, which was also one of the first information resources, was the Personal Home Page (PHP). Over this same period of time, professional football in England has created the world's richest league and by extension an abundance of football related PHPs. This study investigates the role of the PHP as an information resou...

  6. Distribution of PageRank Mass Among Principle Components of the Web

    OpenAIRE

    Avrachenkov, Konstantin; Litvak, Nelly; Pham, Kim Son

    2007-01-01

    We study the PageRank mass of principal components in a bow-tie Web graph, as a function of the damping factor c. Using a singular perturbation approach, we show that the PageRank share of the IN and SCC components remains high even for very large values of the damping factor, in spite of the fact that it drops to zero when c goes to one. However, a detailed study of the OUT component reveals the presence of "dead-ends" (small groups of pages linking only to each other) that receive an unfairly hi...

  7. Social Dynamics in Web Page through Inter-Agent Interaction

    Science.gov (United States)

    Takeuchi, Yugo; Katagiri, Yasuhiro

    Social persuasion abounds in human-human interactions. Attitudes and behaviors of people are invariably influenced by the attitudes and behaviors of other people as well as our social roles/relationships toward them. In the pedagogic scene, the relationship between teacher and learner produces one of the most typical interactions, in which the teacher makes the learner spontaneously study what he/she teaches. This study is an attempt to elucidate the nature and effectiveness of social persuasion in human-computer interaction environments. We focus on the social dynamics of multi-party interactions that involve both human-agent and inter-agent interactions. An experiment is conducted in a virtual web-instruction setting employing two types of agents: conductor agents who accompany and guide each learner throughout his/her learning sessions, and domain-expert agents who provide explanations and instructions for each stage of the instructional materials. In this experiment, subjects are assigned two experimental conditions: the authorized condition, in which an agent respectfully interacts with another agent, and the non-authorized condition, in which an agent carelessly interacts with another agent. The results indicate performance improvements in the authorized condition of inter-agent interactions. An analysis is given from the perspective of the transfer of authority from inter-agent to human-agent interactions based on social conformity. We argue for pedagogic advantages of social dynamics created by multiple animated character agents.

  8. Lagrangian methods of cosmic web classification

    Science.gov (United States)

    Fisher, J. D.; Faltenbacher, A.; Johnson, M. S. T.

    2016-05-01

    The cosmic web defines the large-scale distribution of matter we see in the Universe today. Classifying the cosmic web into voids, sheets, filaments and nodes allows one to explore structure formation and the role environmental factors have on halo and galaxy properties. While existing studies of cosmic web classification concentrate on grid-based methods, this work explores a Lagrangian approach where the V-web algorithm proposed by Hoffman et al. is implemented with techniques borrowed from smoothed particle hydrodynamics. The Lagrangian approach allows one to classify individual objects (e.g. particles or haloes) based on properties of their nearest neighbours in an adaptive manner. It can be applied directly to a halo sample which dramatically reduces computational cost and potentially allows an application of this classification scheme to observed galaxy samples. Finally, the Lagrangian nature admits a straightforward inclusion of the Hubble flow negating the necessity of a visually defined threshold value which is commonly employed by grid-based classification methods.

  9. A Dynamical Classification of the Cosmic Web

    CERN Document Server

    Forero-Romero, J E; Gottlöber, S; Klypin, A; Yepes, G

    2008-01-01

    A dynamical classification of the cosmic web is proposed. The large scale environment is classified into four web types: voids, sheets, filaments and knots. The classification is based on the evaluation of the deformation tensor, i.e. the Hessian of the gravitational potential, on a grid. The classification is based on counting the number of eigenvalues above a certain threshold, lambda_th at each grid point, where the case of zero, one, two or three such eigenvalues corresponds to void, sheet, filament or a knot grid point. The collection of neighboring grid points, friends-of-friends, of the same web attribute constitutes voids, sheets, filaments and knots as web objects. A simple dynamical consideration suggests that lambda_th should be approximately unity, upon an appropriate scaling of the deformation tensor. The algorithm has been applied and tested against a suite of (dark matter only) cosmological N-body simulations. In particular, the dependence of the volume and mass filling fractions on lambda_th a...
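
    The classification step described above is simple to state: at each grid point, count the eigenvalues of the deformation tensor above a threshold lambda_th and map 0/1/2/3 eigenvalues to void/sheet/filament/knot. A NumPy sketch of that counting step follows; the random symmetric tensors are a stand-in for a real simulation output, and lambda_th = 1 follows the abstract's suggestion of an approximately unit threshold.

      # Classify grid points by counting eigenvalues of the deformation tensor
      # (the Hessian of the gravitational potential) above a threshold lambda_th.
      import numpy as np

      WEB_TYPES = {0: "void", 1: "sheet", 2: "filament", 3: "knot"}

      def classify(hessian: np.ndarray, lambda_th: float = 1.0) -> str:
          eigenvalues = np.linalg.eigvalsh(hessian)    # real eigenvalues of a symmetric tensor
          return WEB_TYPES[int(np.sum(eigenvalues > lambda_th))]

      rng = np.random.default_rng(0)
      for _ in range(3):
          a = rng.normal(size=(3, 3))
          hessian = (a + a.T) / 2.0                    # symmetrise the toy tensor
          print(classify(hessian))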

  10. A construction scheme of web page comment information extraction system based on frequent subtree mining

    Science.gov (United States)

    Zhang, Xiaowen; Chen, Bingfeng

    2017-08-01

    Based on a frequent subtree mining algorithm, this paper proposes a construction scheme for a web page comment information extraction system, referred to as the FSM system. The paper gives a brief introduction to the overall system architecture and its modules, then describes the core of the system in detail, and finally presents the system prototype.

  11. Automatic categorization of web pages and user clustering with mixtures of hidden Markov models

    NARCIS (Netherlands)

    Ypma, A.; Heskes, T.M.

    2003-01-01

    We propose mixtures of hidden Markov models for modelling clickstreams of web surfers. Hence, the page categorization is learned from the data without the need for a (possibly cumbersome) manual categorization. We provide an EM algorithm for training a mixture of HMMs and show that additional static

  12. The aware toolbox for the detection of law infringements on web pages

    Science.gov (United States)

    Shahab, Asif; Kieninger, Thomas; Dengel, Andreas

    2010-01-01

    In the project Aware we aim to develop an automatic assistant for the detection of law infringements on web pages. The motivation for this project is that many authors of web pages are at some point infringing copyright or other laws, mostly without being aware of that fact, and are more and more often confronted with costly legal warnings. As the legal environment is constantly changing, an important requirement of Aware is that the domain knowledge can be maintained (and initially defined) by numerous legal experts working remotely without further assistance from the computer scientists. Consequently, the software platform was chosen to be a web-based generic toolbox that can be configured to suit individual analysis experts, definitions of analysis flow, information gathering and report generation. The report generated by the system summarizes all critical elements of a given web page and provides case-specific hints to the page author, and thus forms a new type of service. Regarding the analysis subsystems, Aware mainly builds on existing state-of-the-art technologies. Their usability has been evaluated for each intended task. In order to control the heterogeneous analysis components and to gather the information, a lightweight scripting shell has been developed. This paper describes the analysis technologies, ranging from text-based information extraction, through optical character recognition and phonetic fuzzy string matching, to a set of image analysis and retrieval tools, as well as the scripting language used to define the analysis flow.

  13. The Impact of Salient Advertisements on Reading and Attention on Web Pages

    Science.gov (United States)

    Simola, Jaana; Kuisma, Jarmo; Oorni, Anssi; Uusitalo, Liisa; Hyona, Jukka

    2011-01-01

    Human vision is sensitive to salient features such as motion. Therefore, animation and onset of advertisements on Websites may attract visual attention and disrupt reading. We conducted three eye tracking experiments with authentic Web pages to assess whether (a) ads are efficiently ignored, (b) ads attract overt visual attention and disrupt…

  14. Automatic categorization of web pages and user clustering with mixtures of hidden Markov models

    NARCIS (Netherlands)

    Ypma, A.; Heskes, T.M.

    2003-01-01

    We propose mixtures of hidden Markov models for modelling clickstreams of web surfers. Hence, the page categorization is learned from the data without the need for a (possibly cumbersome) manual categorization. We provide an EM algorithm for training a mixture of HMMs and show that additional static

  15. Detection and classification of Web robots with honeypots

    OpenAIRE

    McKenna, Sean F.

    2016-01-01

    Approved for public release; distribution is unlimited Web robots are automated programs that systematically browse the Web, collecting information. Although Web robots are valuable tools for indexing content on the Web, they can also be malicious through phishing, spamming, or performing targeted attacks. In this thesis, we study an approach to Web-robot detection that uses honeypots in the form of hidden resources on Web pages. Our detection model is based upon the observation that malic...

  16. The ATLAS Public Web Pages: Online Management of HEP External Communication Content

    CERN Document Server

    Goldfarb, Steven; Phoboo, Abha Eli; Shaw, Kate

    2015-01-01

    The ATLAS Education and Outreach Group is in the process of migrating its public online content to a professionally designed set of web pages built on the Drupal content management system. Development of the front-end design passed through several key stages, including audience surveys, stakeholder interviews, usage analytics, and a series of fast design iterations, called sprints. Implementation of the web site involves application of the html design using Drupal templates, refined development iterations, and the overall population of the site with content. We present the design and development processes and share the lessons learned along the way, including the results of the data-driven discovery studies. We also demonstrate the advantages of selecting a back-end supported by content management, with a focus on workflow. Finally, we discuss usage of the new public web pages to implement outreach strategy through implementation of clearly presented themes, consistent audience targeting and messaging, and th...

  17. Evaluations of User Creation Personal Portal Page Using DACS Web Service

    Directory of Open Access Journals (Sweden)

    Kazuya Odagiri

    2012-08-01

    Full Text Available A personal portal, which is an entrance wherein each user can acquire the information that s/he is interested in on a network, is often used as an alternative means of communication. However, there are a number of problems with the existing personal portals. For example, because the Web page serving as a personal portal is generated by a program located on a specific Web server managed by a system administrator, it is not always ideal for all users. To solve this kind of problem, we developed two Web Service functions, which are realized on the network by introducing the Destination Addressing Control System (DACS) Scheme. These two Web Service functions are as follows. The first is a function to extract the data for each user from a database and display it in the Web browser. The second is a function to retrieve the data for each user from a document medium and display it in the Web browser. Through these Web Service functions, each user can easily create a customized personal portal that displays personal information. In this paper, the above two functions are extended to manage information not only for each user but also for each group of users and for all users, and the extended functions are integrated as a DACS Web Service. By using the DACS Web Service, each user can create and customize a Web page as a personal portal for practical usage in an individual organization. After the prototype system's implementation, evaluations are performed.

  18. Is This Information Source Commercially Biased? How Contradictions between Web Pages Stimulate the Consideration of Source Information

    Science.gov (United States)

    Kammerer, Yvonne; Kalbfell, Eva; Gerjets, Peter

    2016-01-01

    In two experiments we systematically examined whether contradictions between two web pages--of which one was commercially biased as stated in an "about us" section--stimulated university students' consideration of source information both during and after reading. In Experiment 1 "about us" information of the web pages was…

  19. Is This Information Source Commercially Biased? How Contradictions between Web Pages Stimulate the Consideration of Source Information

    Science.gov (United States)

    Kammerer, Yvonne; Kalbfell, Eva; Gerjets, Peter

    2016-01-01

    In two experiments we systematically examined whether contradictions between two web pages--of which one was commercially biased as stated in an "about us" section--stimulated university students' consideration of source information both during and after reading. In Experiment 1 "about us" information of the web pages was…

  20. What Can Pictures Tell Us About Web Pages? Improving Document Search Using Images.

    Science.gov (United States)

    Rodriguez-Vaamonde, Sergio; Torresani, Lorenzo; Fitzgibbon, Andrew W

    2015-06-01

    Traditional Web search engines do not use the images in the HTML pages to find relevant documents for a given query. Instead, they typically operate by computing a measure of agreement between the keywords provided by the user and only the text portion of each page. In this paper we study whether the content of the pictures appearing in a Web page can be used to enrich the semantic description of an HTML document and consequently boost the performance of a keyword-based search engine. We present a Web-scalable system that exploits a pure text-based search engine to find an initial set of candidate documents for a given query. Then, the candidate set is reranked using visual information extracted from the images contained in the pages. The resulting system retains the computational efficiency of traditional text-based search engines with only a small additional storage cost needed to encode the visual information. We test our approach on one of the TREC Million Query Track benchmarks where we show that the exploitation of visual content yields improvement in accuracies for two distinct text-based search engines, including the system with the best reported performance on this benchmark. We further validate our approach by collecting document relevance judgements on our search results using Amazon Mechanical Turk. The results of this experiment confirm the improvement in accuracy produced by our image-based reranker over a pure text-based system.
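
    The system first retrieves candidates with a text engine and then reranks them using visual information extracted from the embedded images. The Python sketch below only illustrates the reranking step as a linear combination of a text score and a visual score; the 0.7/0.3 mixing weights and the toy scores are illustrative, whereas the paper learns its reranking function from data.

      # Sketch of the rerank step: reorder the text engine's candidates by a blend of
      # text score and an image-derived score (scores assumed normalised to [0, 1]).
      def rerank(candidates, alpha=0.7):
          # candidates: list of (doc_id, text_score, visual_score)
          return sorted(candidates,
                        key=lambda c: alpha * c[1] + (1.0 - alpha) * c[2],
                        reverse=True)

      candidates = [("doc1", 0.90, 0.10),
                    ("doc2", 0.85, 0.80),
                    ("doc3", 0.80, 0.95)]
      print([doc for doc, _, _ in rerank(candidates)])   # -> ['doc3', 'doc2', 'doc1']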

  1. Key word placing in Web page body text to increase visibility to search engines

    Directory of Open Access Journals (Sweden)

    W. T. Kritzinger

    2007-11-01

    Full Text Available The growth of the World Wide Web has spawned a wide variety of new information sources, which has also left users with the daunting task of determining which sources are valid. Many users rely on the Web as an information source because of the low cost of information retrieval. It is also claimed that the Web has evolved into a powerful business tool. Examples include highly popular business services such as Amazon.com and Kalahari.net. It is estimated that around 80% of users utilize search engines to locate information on the Internet. This, by implication, places emphasis on the underlying importance of Web pages being listed in search engine indices. Empirical evidence that the placement of key words in certain areas of the body text has an influence on a Web site's visibility to search engines could not be found in the literature. The results of two experiments indicated that key words should be concentrated towards the top, and diluted towards the bottom, of a Web page to increase visibility. However, care should be taken in terms of key word density, to prevent search engine algorithms from raising the spam alarm.

  2. Table Extraction from Web Pages Using Conditional Random Fields to Extract Toponym Related Data

    Science.gov (United States)

    Luthfi Hanifah, Hayyu’; Akbar, Saiful

    2017-01-01

    Table is one of the ways to visualize information on web pages. The abundant number of web pages that compose the World Wide Web has been the motivation for information extraction and information retrieval research, including research on table extraction. Besides, there is a need for a system designed specifically to handle location-related information. Based on this background, this research was conducted to provide a way to extract location-related data from web tables so that it can be used in the development of a Geographic Information Retrieval (GIR) system. The location-related data is identified by the toponym (location name). In this research, a rule-based approach with a gazetteer is used to recognize toponyms in web tables. Meanwhile, to extract data from a table, a combination of rule-based and statistical approaches is used. In the statistical approach, a Conditional Random Fields (CRF) model is used to understand the schema of the table. The result of table extraction is presented in JSON format. If a web table contains a toponym, a field is added to the JSON document to store the toponym values. This field can be used to index the table data according to the toponym, which can then be used in the development of the GIR system.
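
    Toponyms are recognised with a rule-based gazetteer lookup before the CRF handles the table schema. A minimal Python sketch of the lookup over already-extracted table cells follows; the tiny gazetteer and the example row are toy data, and the CRF schema step is not reproduced here.

      # Sketch of the gazetteer step: scan extracted table cells for known location names
      # and attach the matches as a "toponym" field on the JSON-style record.
      import json

      GAZETTEER = {"jakarta", "bandung", "surabaya"}

      def annotate_row(row: dict) -> dict:
          toponyms = [value for value in row.values()
                      if isinstance(value, str) and value.lower() in GAZETTEER]
          if toponyms:
              row["toponym"] = toponyms
          return row

      row = {"station": "Gambir", "city": "Jakarta", "passengers": "12000"}
      print(json.dumps(annotate_row(row)))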

  3. Evaluations of User Creation Personal Portal Page Using DACS Web Service

    OpenAIRE

    Kazuya Odagiri; Shogo Shimizu; Naohiro Ishii

    2012-01-01

    A personal portal, which is an entrance wherein each user can acquire the information that s/he is interested in on a network, is often used as an alternative means of communication. However, there are a number of problems with the existing personal portals. For example, because the Web page serving as a personal portal is generated by a program located on a specific Web server managed by a system administrator, it is not always ideal for all users. To solve this kind of problem, we deve...

  4. Language Identification of Web Pages Based on Improved N-gram Algorithm

    Directory of Open Access Journals (Sweden)

    Chew Yew Choong

    2011-05-01

    Full Text Available Language identification of written text in the domain of Latin-script based languages is a well-studied research field. However, new challenges arise when it is applied to non-Latin-script based languages, especially for Asian languages web pages. The objective of this paper is to propose and evaluate the effectiveness of adapting Universal Declaration of Human Rights and Biblical texts as a training corpus, together with two new heuristics to improve an n-gram based language identification algorithm for Asian languages. Extension of the training corpus produced improved accuracy. Improvement was also achieved by using byte-sequence based HTML parser and a HTML character entities converter. The performance of the algorithm was evaluated based on a written text corpus of 1,660 web pages, spanning 182 languages from Asia, Africa, the Americas, Europe and Oceania. Experimental result showed that the algorithm achieved a language identification accuracy rate of 94.04%.
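
    The abstract builds on a character n-gram profile method. The Python sketch below shows a rank-profile ("out-of-place") comparison in the Cavnar-Trenkle style that such n-gram identifiers typically use; the two short training sentences are placeholders for the UDHR and Biblical corpus described, and the HTML parsing and entity-conversion heuristics from the paper are omitted.

      # Character n-gram language identification via rank-profile comparison.
      from collections import Counter

      def profile(text: str, n: int = 3, size: int = 300) -> dict:
          grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
          return {g: rank for rank, (g, _) in enumerate(grams.most_common(size))}

      def out_of_place(doc_profile: dict, lang_profile: dict) -> int:
          penalty = len(lang_profile)                 # cost for n-grams missing from the language profile
          return sum(abs(rank - lang_profile.get(g, penalty))
                     for g, rank in doc_profile.items())

      TRAINING = {
          "en": "all human beings are born free and equal in dignity and rights",
          "fr": "tous les êtres humains naissent libres et égaux en dignité et en droits",
      }
      profiles = {lang: profile(text) for lang, text in TRAINING.items()}

      def identify(text: str) -> str:
          doc = profile(text)
          return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))

      print(identify("they are endowed with reason and conscience"))   # expected: 'en'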

  5. Effects of picture amount on preference, balance, and dynamic feel of Web pages.

    Science.gov (United States)

    Chiang, Shu-Ying; Chen, Chien-Hsiung

    2012-04-01

    This study investigates the effects of picture amount on subjective evaluation. The experiment herein adopted two variables to define picture amount: column ratio and picture size. Six column ratios were employed: 7:93,15:85, 24:76, 33:67, 41:59, and 50:50. Five picture sizes were examined: 140 x 81, 220 x 127, 300 x 173, 380 x 219, and 460 x 266 pixels. The experiment implemented a within-subject design; 104 participants were asked to evaluate 30 web page layouts. Repeated measurements revealed that the column ratio and picture size have significant effects on preference, balance, and dynamic feel. The results indicated the most appropriate picture amount for display: column ratios of 15:85 and 24:76, and picture sizes of 220 x 127, 300 x 173, and 380 x 219. The research findings can serve as the basis for the application of design guidelines for future web page interface design.

  6. Searchers' relevance judgments and criteria in evaluating Web pages in a learning style perspective

    DEFF Research Database (Denmark)

    Papaeconomou, Chariste; Zijlema, Annemarie F.; Ingwersen, Peter

    2008-01-01

    The paper presents the results of a case study of searchers' relevance criteria used for assessments of Web pages in a learning style perspective. 15 test persons participated in the experiments, based on two simulated work tasks that provided cover stories to trigger their information needs. Two learning styles were examined: Global and Sequential learners. The study applied eye-tracking for the observation of relevance hot spots on Web pages, learning style index analysis and post-search interviews to gain more in-depth information on relevance behavior. Findings reveal that with respect to the use of graded relevance scores and the number of relevance criteria applied per task and test person, there are no significant differences between the styles, although differences are detected in the use of relevance criteria between Global and Sequential learners during assessments...

  7. A Semantic Scraping Model for Web Resources - Applying Linked Data to Web Page Screen Scraping

    OpenAIRE

    Fernández Villamor, José Ignacio; Blasco Garcia, Jacobo; Iglesias Fernandez, Carlos Angel; Garijo Ayestaran, Mercedes

    2011-01-01

    In spite of the increasing presence of Semantic Web facilities, only a limited amount of the available resources on the Internet provide semantic access. Recent initiatives such as the emerging Linked Data Web are providing semantic access to available data by porting existing resources to the semantic web using different technologies, such as database-semantic mapping and scraping. Nevertheless, existing scraping solutions are based on ad-hoc solutions complemented with graphical interface...

  8. Validation and Classification of Web Services using Equalization Validation Classification

    Directory of Open Access Journals (Sweden)

    ALAMELU MUTHUKRISHNAN

    2012-12-01

    Full Text Available In the business process world, web services provide managed middleware to connect a huge number of services. A web service transaction is a mechanism to compose services with their desired quality parameters. If enormous numbers of transactions occur, the provider cannot acquire accurate data at the correct time, so it is necessary to reduce the overburden of web service transactions. In order to reduce the excess of transactions from customers to providers, this paper proposes a new method called Equalization Validation Classification. This method introduces a new weight-reducing algorithm, called the Efficient Trim Down (ETD) algorithm, to reduce the overburden of incoming client requests. When the proposed algorithm is compared with decision tree algorithms (J48, Random Tree, Random Forest, AD Tree), it produces better accuracy and validation than the existing algorithms. The proposed trimming method was analyzed against the decision tree algorithms, and the implementation results show that the ETD algorithm provides better performance in terms of improved accuracy with effective validation. Therefore, the proposed method provides a good gateway to reduce the overburden of client requests in web services. Moreover, analyzing the requests arriving from a vast number of clients and preventing the illegitimate requests saves the service provider time.

  9. Age differences in search of web pages: the effects of link size, link number, and clutter.

    Science.gov (United States)

    Grahame, Michael; Laberge, Jason; Scialfa, Charles T

    2004-01-01

    Reaction time, eye movements, and errors were measured during visual search of Web pages to determine age-related differences in performance as a function of link size, link number, link location, and clutter. Participants (15 young adults, M = 23 years; 14 older adults, M = 57 years) searched Web pages for target links that varied from trial to trial. During one half of the trials, links were enlarged from 10-point to 12-point font. Target location was distributed among the left, center, and bottom portions of the screen. Clutter was manipulated according to the percentage of used space, including graphics and text, and the number of potentially distracting nontarget links was varied. Increased link size improved performance, whereas increased clutter and links hampered search, especially for older adults. Results also showed that links located in the left region of the page were found most easily. Actual or potential applications of this research include Web site design to increase usability, particularly for older adults.

  10. A kinematic classification of the cosmic web

    CERN Document Server

    Hoffman, Yehuda; Yepes, Gustavo; Gottlöber, Stefan; Forero-Romero, Jaime E; Libeskind, Noam I; Knebe, Alexander

    2012-01-01

    A new approach for the classification of the cosmic web is presented. In extension of the previous work of Hahn et al. (2007) and Forero-Romero et al. (2009), the new algorithm is based on the analysis of the velocity shear tensor rather than the gravitational tidal tensor. The procedure consists of the construction of the shear tensor at each (grid) point in space and the evaluation of its three eigenvectors. A given point is classified as either a void, sheet, filament or knot according to the number of eigenvalues above a certain threshold: 0, 1, 2, or 3 respectively. The threshold is treated as a free parameter that defines the web. The algorithm has been applied to a dark matter only, high resolution simulation of a box of side-length 64$h^{-1}$Mpc and N = $1024^3$ particles within the framework of the WMAP5/LCDM model. The resulting velocity based cosmic web resolves structures down to <0.1$h^{-1}$Mpc scales, as opposed to the ~1$h^{-1}$Mpc scale of the tidal based web. The under-dense regions ...
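
    The decision rule sketched in this abstract reduces to counting, at each grid point, how many eigenvalues of the velocity shear tensor exceed a threshold. A minimal NumPy illustration of that rule follows; the array layout, function name and threshold value are assumptions for illustration, not taken from the paper.

    ```python
    import numpy as np

    WEB_TYPES = ["void", "sheet", "filament", "knot"]

    def classify_web(shear_tensors, threshold=0.0):
        """Classify each grid point as void/sheet/filament/knot by counting
        eigenvalues of its shear tensor that lie above `threshold`."""
        # eigvalsh works on stacked symmetric 3x3 matrices of shape (..., 3, 3)
        eigenvalues = np.linalg.eigvalsh(shear_tensors)
        counts = (eigenvalues > threshold).sum(axis=-1)  # 0, 1, 2 or 3
        return counts  # index into WEB_TYPES

    # toy example: two random symmetric tensors
    rng = np.random.default_rng(0)
    a = rng.normal(size=(2, 3, 3))
    tensors = (a + np.swapaxes(a, -1, -2)) / 2
    for c in classify_web(tensors, threshold=0.0):
        print(WEB_TYPES[c])
    ```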

  11. Domainwise Web Page Optimization Based On Clustered Query Sessions Using Hybrid Of Trust And ACO For Effective Information Retrieval

    Directory of Open Access Journals (Sweden)

    Dr. Suruchi Chawla

    2015-08-01

    Full Text Available Abstract In this paper a hybrid of Ant Colony Optimization (ACO) and trust has been used for domain-wise web page optimization in clustered query sessions for effective information retrieval. The trust of a web page identifies its degree of relevance in satisfying the specific information need of the user. When the trusted web pages are optimized using pheromone updates in ACO, trusted colonies of web pages are identified that are relevant to the user's information need in a given domain. Hence in this paper the hybrid of trust and ACO has been used on clustered query sessions to identify a larger number of relevant documents in a given domain in order to better satisfy the information need of the user. An experiment was conducted on a data set of web query sessions to test the effectiveness of the proposed approach in three selected domains (Academics, Entertainment and Sports), and the results confirm the improvement in the precision of search results.

  12. Some features of alt texts associated with images in Web pages

    Directory of Open Access Journals (Sweden)

    Timothy C. Craven

    2006-01-01

    Full Text Available Introduction. This paper extends a series on summaries of Web objects, in this case, the alt attribute of image files. Method. Data were logged from 1894 pages from Yahoo!'s random page service and 4703 pages from the Google directory; an img tag was extracted randomly from each where present; its alt attribute, if any, was recorded; and the header for the corresponding image file was retrieved if possible. Analysis. Associations were measured between image type and use of null alt values, image type and image file size, image file size and alt text length, and alt text length and number of images on the page. Results. 16.6% and 17.3% of pages respectively showed no img elements. Of 1579 and 3888 img tags randomly selected from the remainder, 47.7% and 49.4% had alt texts, of which 26.3% and 27.5% were null. Of the 1316 and 3384 images for which headers could be retrieved, 71.2% and 74.2% were GIF, 28.1% and 20.5%, JPEG; and 0.8% and 0.8% PNG. GIF images were more commonly assigned null alt texts than JPEG images, and GIF files tended to be shorter than JPEG files. Weak positive correlations were observed between image file size and alt text length, except for JPEG files in the Yahoo! set. Alt texts for images from pages containing more images tended to be slightly shorter. Conclusion. Possible explanations for the results include GIF files' being more suited to decorative images and the likelihood that many images on image-rich pages are content-poor.

  13. An Exploratory Study of Student Satisfaction with University Web Page Design

    Science.gov (United States)

    Gundersen, David E.; Ballenger, Joe K.; Crocker, Robert M.; Scifres, Elton L.; Strader, Robert

    2013-01-01

    This exploratory study evaluates the satisfaction of students with a web-based information system at a medium-sized regional university. The analysis provides a process for simplifying data interpretation in captured student user feedback. Findings indicate that student classifications, as measured by demographic and other factors, determine…

  14. The effects of link format and screen location on visual search of web pages.

    Science.gov (United States)

    Ling, Jonathan; Van Schaik, Paul

    2004-06-22

    Navigation of web pages is of critical importance to the usability of web-based systems such as the World Wide Web and intranets. The primary means of navigation is through the use of hyperlinks. However, few studies have examined the impact of the presentation format of these links on visual search. The present study used a two-factor mixed measures design to investigate whether there was an effect of link format (plain text, underlined, bold, or bold and underlined) upon speed and accuracy of visual search and subjective measures in both the navigation and content areas of web pages. An effect of link format on speed of visual search for both hits and correct rejections was found. This effect was observed in the navigation and the content areas. Link format did not influence accuracy in either screen location. Participants showed highest preference for links that were in bold and underlined, regardless of screen area. These results are discussed in the context of visual search processes and design recommendations are given.

  15. Health on the Net Foundation: assessing the quality of health web pages all over the world.

    Science.gov (United States)

    Boyer, Célia; Gaudinat, Arnaud; Baujard, Vincent; Geissbühler, Antoine

    2007-01-01

    The Internet provides a great amount of information and has become one of the communication media which is most widely used [1]. However, the problem is no longer finding information but assessing the credibility of the publishers as well as the relevance and accuracy of the documents retrieved from the web. This problem is particularly relevant in the medical area which has a direct impact on the well-being of citizens. In this paper, we assume that the quality of web pages can be controlled, even when a huge amount of documents has to be reviewed. But this must be supported by both specific automatic tools and human expertise. In this context, we present various initiatives of the Health on the Net Foundation informing the citizens about the reliability of the medical content on the web.

  16. A Dynamic Web Page Prediction Model Based on Access Patterns to Offer Better User Latency

    CERN Document Server

    Mukhopadhyay, Debajyoti; Saha, Dwaipayan; Kim, Young-Chon

    2011-01-01

    The growth of the World Wide Web has emphasized the need for improvement in user latency. One of the techniques used to improve user latency is caching; another is web prefetching. Approaches that rely solely on caching offer limited performance improvement because it is difficult for caching to handle the large number of increasingly diverse files. Studies have been conducted on prefetching models based on decision trees, Markov chains, and path analysis. However, the increased use of dynamic pages and frequent changes in site structure and user access patterns have limited the efficacy of these static techniques. In this paper, we propose a methodology to cluster related pages into different categories based on access patterns. Additionally, we use page ranking to build up our prediction model at the initial stage, when users have not yet started sending requests. In this way we try to overcome the problem of maintaining huge databases, which is needed in the case of log-based techn...

  17. The ATLAS Public Web Pages: Online Management of HEP External Communication Content

    Science.gov (United States)

    Goldfarb, S.; Marcelloni, C.; Eli Phoboo, A.; Shaw, K.

    2015-12-01

    The ATLAS Education and Outreach Group is in the process of migrating its public online content to a professionally designed set of web pages built on the Drupal [1] content management system. Development of the front-end design passed through several key stages, including audience surveys, stakeholder interviews, usage analytics, and a series of fast design iterations, called sprints. Implementation of the web site involves application of the html design using Drupal templates, refined development iterations, and the overall population of the site with content. We present the design and development processes and share the lessons learned along the way, including the results of the data-driven discovery studies. We also demonstrate the advantages of selecting a back-end supported by content management, with a focus on workflow. Finally, we discuss usage of the new public web pages to implement outreach strategy through implementation of clearly presented themes, consistent audience targeting and messaging, and the enforcement of a well-defined visual identity.

  18. The effectiveness of adapted web pages on the learning performance of students with severe mental retardation.

    Science.gov (United States)

    Li, Tien-Yu; Chen, Ming-Chung; Lin, Yun-Lung; Li, Shu-Chun

    2003-09-01

    Learning to use computers and/or using computers to learn has become a part of everyday life for most students. Unfortunately, students with mental retardation in Taiwan, especially those with moderate or severe mental retardation, are often not considered capable of utilizing computers and online learning. It is common for students with moderate or severe mental retardation to have poor vocabulary, or even be illiterate. Web pages mostly displayed in text form have therefore become the major obstacle for students with mental retardation when they try to learn online. This research focused on students with severe mental retardation by integrating picture communication symbols, voices and animation into a teaching home page, and then examined its effectiveness on learning.

  19. Pattern discovery for semi-structured web pages using bar-tree representation

    CERN Document Server

    Akbar, Z

    2011-01-01

    Many websites with an underlying database containing structured data provide the richest and most dense source of information relevant for topical data integration. Real data integration requires sustainable and reliable pattern discovery to enable accurate content retrieval and to recognize pattern changes from time to time; yet extracting structured data from web documents still lacks accuracy. This paper proposes the bar-tree representation to describe the whole pattern of web pages in an efficient way based on the reverse algorithm. While previous algorithms always trace the pattern and extract the region of interest from the top root, the reverse algorithm recognizes the pattern from the region of interest towards both the top and bottom roots simultaneously. The attributes are then extracted and labeled reversely from the region of interest of the targeted contents. Since using conventional representations for the algorithm would require more computational power, the bar-tree method is d...

  20. [Analysis of the web pages of the intensive care units of Spain].

    Science.gov (United States)

    Navarro-Arnedo, J M

    2009-01-01

    In order to determine which Intensive Care Units (ICUs) of Spanish hospitals had a web site, to analyze the information they offered and to establish what information they should offer according to a sample of ICU nurses, a cross-sectional, observational, descriptive study was carried out between January and September 2008. For each ICU website, an analysis was made of the information available on the unit and on its care, teaching and research activity and on nursing. In parallel, based on a sample of intensive care nurses, the information that should be contained on an ICU website was determined. The results, expressed in absolute numbers and percentages, showed that 66 of the 292 hospitals with an ICU (22.6%) had a web site; 50.7% of the sites showed the number of beds, 19.7% the activity report, 11.3% the published articles/studies and ongoing research lines, and 9.9% the training courses organized. Fourteen sites (19.7%) displayed images of nurses, but only one (1.4%) offered guides on the procedures followed. No site offered a navigation section for nursing, the e-mail address of the chief nurse, the nursing documentation used, or whether any nursing model of their own was used. It is concluded that only one-fourth of the Spanish hospitals with an ICU have a web site; the number of beds was the data offered by most sites, whereas information on care, educational and research activities was very limited and information on nursing was practically absent from the web pages of intensive care units.

  1. A Technology for Caching Dynamic Web Pages

    Institute of Scientific and Technical Information of China (English)

    贺琛; 陈肇雄; 黄河燕

    2002-01-01

    As a result of the drive for information diversity and the trend towards personalization, more and more dynamic pages make up the content of the Internet. However, it is difficult to cache dynamic Web pages because of their dynamic nature. In this paper, we propose a technology to solve this problem. It isolates the static and dynamic contents of the page and stores the original page of the Web server and the information of the background database locally, in order to improve the client's response time. Additionally, the technology adopts a corresponding replacement algorithm and coherency policy to track updates to the original page and the database, so that changes to dynamic pages are synchronized in time.

  2. Research on a PageRank Algorithm Based on a Web Page Segmentation Model

    Institute of Scientific and Technical Information of China (English)

    白似雪; 刘华斌

    2008-01-01

    An improved PageRank algorithm based on a page-block importance model is proposed. The algorithm recognizes that out-links belonging to different blocks of the same page have different importance, so out-links in different blocks are given corresponding weights, and the PageRank value of a page is therefore computed more reasonably, fairly and effectively. Compared with the original PageRank algorithm and its previous improvements, the algorithm takes a vision-based page segmentation algorithm as its core, better reflects the characteristics of web pages, matches users' browsing habits, and achieves good results.
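
    The abstract states the key idea (out-links from different blocks of the same page carry different weights) without giving formulas. A small sketch of a weighted PageRank iteration along those lines is shown below; the block weights, damping factor and function names are illustrative assumptions, not the authors' actual parameters.

    ```python
    import numpy as np

    def block_weighted_pagerank(out_links, n_pages, d=0.85, iters=50):
        """out_links: dict page -> list of (target, block_weight).
        Each page's out-link weights are normalized to sum to 1,
        then a standard PageRank iteration is applied."""
        M = np.zeros((n_pages, n_pages))
        for src, links in out_links.items():
            total = sum(w for _, w in links)
            for dst, w in links:
                M[dst, src] += w / total
        r = np.full(n_pages, 1.0 / n_pages)
        for _ in range(iters):
            r = (1 - d) / n_pages + d * M @ r
        return r

    # toy graph: page 0 links to 1 (important block, weight 0.8) and 2 (footer block, 0.2)
    links = {0: [(1, 0.8), (2, 0.2)], 1: [(0, 1.0)], 2: [(0, 1.0)]}
    print(block_weighted_pagerank(links, 3))
    ```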

  3. Discussion of Web Page Design Skills

    Institute of Scientific and Technical Information of China (English)

    许佳南

    2012-01-01

    DIV+CSS page layout is more and more widely used in web design. This paper describes the use of DIV+CSS in detail by building a web page with DIV+CSS technology.

  4. Page Content Extraction Based on Web Page Segmentation

    Institute of Scientific and Technical Information of China (English)

    聂卉; 张津华

    2012-01-01

    A Web page extraction method based on the layout of the Web page is proposed in this paper to implement the tasks of page cleaning and content extraction. First, a tag tree is constructed by analysing the DOM structure of the original page. The tree is then partitioned into a set of blocks from the bottom up according to the categories of tags and the information of the nodes, and the blocks are classified on the basis of the proportion of text, links and images they contain. Next, using the vector space model (VSM), a text feature vector for the page's subject is built and used to calculate the degree of correlation between each block's content and the page's subject. Based on this degree of correlation, it can be judged which blocks should be discarded and which should be kept; the content blocks with a high degree of correlation are kept to reconstruct the description of the Web page. The method has been applied in a project on talent information collection, and test results indicate its effectiveness in page cleaning and content extraction.
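
    The pipeline described above (segment the DOM into blocks, label blocks by their text/link/image proportions, then keep blocks whose content is close to the page topic) can be illustrated with a simple link-density filter plus cosine similarity. The feature names and thresholds below are illustrative assumptions, not the authors' values.

    ```python
    import math
    from collections import Counter

    def cosine(a: Counter, b: Counter) -> float:
        common = set(a) & set(b)
        num = sum(a[t] * b[t] for t in common)
        den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
        return num / den if den else 0.0

    def keep_content_blocks(blocks, topic_terms, link_density_max=0.5, sim_min=0.1):
        """blocks: list of dicts with 'text' and 'link_density' fields.
        A block is kept if it is text-dominated and close enough to the topic."""
        topic_vec = Counter(topic_terms)
        kept = []
        for b in blocks:
            if b["link_density"] > link_density_max:
                continue  # treat as a link/navigation block
            if cosine(Counter(b["text"].lower().split()), topic_vec) >= sim_min:
                kept.append(b)
        return kept

    blocks = [
        {"text": "senior java developer wanted in shanghai", "link_density": 0.1},
        {"text": "home news jobs contact login", "link_density": 0.9},
    ]
    print(keep_content_blocks(blocks, ["java", "developer", "job"]))
    ```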

  5. How Does Designing Web Pages about Science Topics Affect Pre-service Teachers’ Skills of Computer Technology?

    Directory of Open Access Journals (Sweden)

    Mustafa Metin

    2009-12-01

    Full Text Available This study aimed to determine how designing web pages on science topics affects pre-service teachers’ computer technology skills. The research was carried out in the fall semester of 2006 at Artvin Çoruh University Education Faculty with 25 junior pre-service primary classroom teachers. A qualitative research method was used, and the study was implemented in four parts: in the first part, the required software and its content were taught to the pre-service teachers; in the second part, the participants were asked to prepare web pages related to topics of the fifth-grade Science Instruction Programme; in the third part, they were asked to present their web pages; and in the last part, semi-structured interviews were carried out. It was revealed that the pre-service teachers improved skills such as searching through search engines and using some application software effectively, and that they also gained web page design skills and critical thinking skills about web pages.

  6. A Prototype for Ordering Library Materials via the Web Using Active Server Pages (ASP)

    Directory of Open Access Journals (Sweden)

    Djoni Haryadi Setiabudi

    2002-01-01

    Full Text Available Electronic commerce is one of the fastest-growing components of the Internet. In this research, a prototype library service is developed that offers ordering of library collections, especially books and articles, through the World Wide Web. To support interaction between seller and buyer, a dynamic web site is needed, which in turn requires suitable technology and software. One such programming language is Active Server Pages (ASP), combined with a database system to store the data; the interface between the application and the database is ActiveX Data Objects (ADO). ASP has advantages in its scripting method and is easy to configure with a database. The application consists of two major parts, administrator and user. The prototype provides facilities for editing, searching and viewing ordering information online, and users can also download the articles they have searched for and ordered. The payment method in this e-commerce system is essential because in Indonesia not everybody has a credit card; as a solution, the prototype includes a form for users who do not have a credit card, and once the bill has been paid the transaction can be completed online. Here one of ASP's advantages, the "session", is used: data being processed are not lost as long as the user remains in that session. Sessions are used in both the user area and the admin area, where users and the administrator carry out the various processes.

  7. Web Page Information Extraction Technology

    Institute of Scientific and Technical Information of China (English)

    邵振凯

    2013-01-01

    With the rapid development of the Internet, the amount of information in Web pages has become very large, and how to search and find valuable information quickly and efficiently has become an important aspect of Web research. A tag extraction method is proposed for this purpose. The Web page is first normalized into a well-formed HTML document with JTidy and parsed into a DOM tree. Tags whose leaf nodes contain text content are then extracted from the DOM tree, tags used only to control interaction and display are removed, and a punctuation-based information extraction method is used to remove copyright notices and similar boilerplate. Extraction experiments on pages from a number of different sites show that the tag extraction method is not only highly general but can also accurately extract the main content of a page.

  8. Analysis of co-occurrence toponyms in web pages based on complex networks

    Science.gov (United States)

    Zhong, Xiang; Liu, Jiajun; Gao, Yong; Wu, Lun

    2017-01-01

    A large number of geographical toponyms exist in web pages and other documents, providing abundant geographical resources for GIS. It is very common for toponyms to co-occur in the same documents. To investigate these relations associated with geographic entities, a novel complex network model for co-occurring toponyms is proposed. Then, 12 toponym co-occurrence networks are constructed from the toponym sets extracted from the People's Daily documents of 2010. It is found that two toponyms have a high co-occurrence probability if they are at the same administrative level or if they possess a part-whole relationship. By applying complex network analysis methods to the toponym co-occurrence networks, we find the following characteristics. (1) The navigation vertices of the co-occurrence networks can be found by degree centrality analysis. (2) The networks exhibit strong clustering, and it takes only a few steps to reach one vertex from another, implying that the networks are small-world graphs. (3) The degree distribution follows a power law with an exponent of 1.7, so the networks are scale-free. (4) The networks are disassortative and have similar assortative modes, with assortative exponents of approximately 0.18 and assortative indexes less than 0. (5) The frequency of toponym co-occurrence is weakly negatively correlated with geographic distance, but more strongly negatively correlated with administrative hierarchical distance. Considering toponym frequencies and co-occurrence relationships, a novel method based on link analysis is presented to extract the core toponyms from web pages. This method is suitable and effective for geographical information retrieval.
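
    Building such a toponym co-occurrence network and computing the measures mentioned in the abstract (degree centrality, clustering, path length) is straightforward with networkx; a toy sketch, with made-up documents, follows.

    ```python
    import itertools
    import networkx as nx

    def build_cooccurrence_network(documents):
        """documents: list of toponym lists, one per document.
        Toponyms co-occurring in the same document get (weighted) edges."""
        G = nx.Graph()
        for toponyms in documents:
            for a, b in itertools.combinations(set(toponyms), 2):
                w = G[a][b]["weight"] + 1 if G.has_edge(a, b) else 1
                G.add_edge(a, b, weight=w)
        return G

    docs = [["Beijing", "Shanghai", "China"], ["Beijing", "China"], ["Shanghai", "Pudong"]]
    G = build_cooccurrence_network(docs)
    # navigation vertices: highest degree centrality
    print(sorted(nx.degree_centrality(G).items(), key=lambda kv: -kv[1])[:3])
    # small-world indicators: clustering coefficient and average shortest path
    print(nx.average_clustering(G), nx.average_shortest_path_length(G))
    ```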

  9. Automatic web services classification based on rough set theory

    Institute of Scientific and Technical Information of China (English)

    陈立; 张英; 宋自林; 苗壮

    2013-01-01

    With the development of web services technology, the number of services available on the Internet grows day by day. In order to achieve automatic and accurate service classification, which can benefit service-related tasks, a rough set theory based method for service classification was proposed. First, the service descriptions were preprocessed and represented as vectors. Inspired by discernibility-matrix-based attribute reduction in rough set theory, and taking into account the characteristics of the decision table for service classification, a method based on continuous discernibility matrices was proposed for dimensionality reduction. Finally, services were classified automatically. In the experiments, the proposed method achieved satisfactory classification results in all five testing categories, showing that it is accurate and could be used for practical web service classification.

  10. Designing a Web Page to Improve Tutors’ Role in Nursing Students Formation

    Directory of Open Access Journals (Sweden)

    Juan Carlos Alvarado Peruyero

    2011-02-01

    Full Text Available Background: Higher Medical Education in Cuba is based on a work-related educational and pedagogical model. In this model, tutors play a key role, but their preparation to work with undergraduate nursing students is often insufficient. Objective: To design a web page containing the necessary information to improve tutors’ role in their work with undergraduate nursing students. Methods: descriptive research, conducted at the University of Medical Sciences of Cienfuegos from 2009 to February 2010. In order to identify tutors’ needs, a questionnaire and a diagnostic test were applied to 33.7% of active-teaching tutors, randomly selected from a total of 169. During the second phase of this research contents for the web site were prepared and validated through expert criteria. Results: A website containing skills for each year and subject, curricula, skills control cards, existing resolutions and circulars, training strategies for work-related education, tutor’s functions, career curriculum, regulations for nursing practice, methodological instructions, organization of nursing curriculum and training to develop work-related education typologies was created. Conclusions: This website is a useful tool for monitoring nursing students during the work-related education process. It is available for tutors in their teaching settings and working places.

  11. Do-It-Yourself: A Special Library's Approach to Creating Dynamic Web Pages Using Commercial Off-The-Shelf Applications

    Science.gov (United States)

    Steeman, Gerald; Connell, Christopher

    2000-01-01

    Many librarians may feel that dynamic Web pages are out of their reach, financially and technically. Yet we are reminded in library and Web design literature that static home pages are a thing of the past. This paper describes how librarians at the Institute for Defense Analyses (IDA) library developed a database-driven, dynamic intranet site using commercial off-the-shelf applications. Administrative issues include surveying a library users group for interest and needs evaluation; outlining metadata elements; and, committing resources from managing time to populate the database and training in Microsoft FrontPage and Web-to-database design. Technical issues covered include Microsoft Access database fundamentals, lessons learned in the Web-to-database process (including setting up Database Source Names (DSNs), redesigning queries to accommodate the Web interface, and understanding Access 97 query language vs. Standard Query Language (SQL)). This paper also offers tips on editing Active Server Pages (ASP) scripting to create desired results. A how-to annotated resource list closes out the paper.

  12. Organising Development Knowledge: Towards Situated Classification Work on the Web

    Directory of Open Access Journals (Sweden)

    Maja van der Velden

    2008-09-01

    Full Text Available This paper addresses the classification of development knowledge in web-based resources. Seven categories of a marginalised knowledge domain are mapped across eleven web resources, with additional observations of classification work in India and Kenya. The analysis discusses how technological designs for web-based classification systems can become global hegemonic structures that may limit the participation of marginalised knowledge communities. The question of a more inclusive design is further explored in two offline, indigenous approaches to classifications. They suggest that a combination of both online and offline classification work, in which localised classifications are created, using local categories and tags, may enhance the participation of marginalised communities. The results of this research point to the need to design web-based resources that support the participation of diverse knowledge communities as well as the generation and representation of the diversity of knowledge. Future research may focus on the use of tags and the visualisation of the diverse ways in which an item can be categorised, in order to make web-based classifications more meaningful to marginalised knowledge communities.

  13. Using Frames and JavaScript To Automate Teacher-Side Web Page Navigation for Classroom Presentations.

    Science.gov (United States)

    Snyder, Robin M.

    HTML provides a platform-independent way of creating and making multimedia presentations for classroom instruction and making that content available on the Internet. However, time in class is very valuable, so that any way to automate or otherwise assist the presenter in Web page navigation during class can save valuable seconds. This paper…

  14. Effects of Learning Style and Training Method on Computer Attitude and Performance in World Wide Web Page Design Training.

    Science.gov (United States)

    Chou, Huey-Wen; Wang, Yu-Fang

    1999-01-01

    Compares the effects of two training methods on computer attitude and performance in a World Wide Web page design program in a field experiment with high school students in Taiwan. Discusses individual differences, Kolb's Experiential Learning Theory and Learning Style Inventory, Computer Attitude Scale, and results of statistical analyses.…

  15. The Effects of Web Page Design Instruction on Computer Self-Efficacy of Preservice Teachers and Correlates.

    Science.gov (United States)

    Chu, Li-Li

    2003-01-01

    Tests the effects of Web page design instruction on improving computer self-efficacy of preservice teachers. Various computer experiences, including weekly computer use, weekly Internet use, and use frequencies of word processing, e-mail, games, and presentation software were significantly related to computer self-efficacy. Use frequencies of word…

  16. Web Video Mining: Metadata Predictive Analysis using Classification Techniques

    Directory of Open Access Journals (Sweden)

    Siddu P. Algur

    2016-02-01

    Full Text Available Nowadays, data engineering is becoming an emerging trend for discovering knowledge from web audio-visual data such as YouTube videos, Yahoo Screen and Facebook videos. Different categories of web video are shared on such social websites and are used by billions of users all over the world. Uploaded web videos carry different kinds of metadata as attribute information; the metadata attributes describe the contents and features/characteristics of the web videos conceptually. Hence, accomplishing web video mining by extracting features of web videos in terms of metadata is a challenging task. In this work, attempts are made to classify and predict metadata features of web videos, such as the length of the video, the number of comments, the rating information and the view count, using data mining algorithms such as the J48 decision tree and naive Bayes algorithms, as part of web video mining. The results of the J48 decision tree and naive Bayes classification models are analysed and compared as a step in the process of knowledge discovery from web videos.
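
    The abstract names the J48 decision tree and naive Bayes as the classifiers applied to video metadata. An equivalent sketch with scikit-learn (which provides CART-style decision trees rather than J48, and Gaussian naive Bayes) on a made-up toy metadata set might look like this.

    ```python
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    # toy metadata: [length_sec, n_comments, rating]; target: view-count class (0=low, 1=high)
    X = np.array([[60, 5, 3.0], [600, 200, 4.5], [30, 1, 2.0],
                  [900, 800, 4.8], [120, 10, 3.5], [700, 300, 4.2]])
    y = np.array([0, 1, 0, 1, 0, 1])

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=0)
    for model in (DecisionTreeClassifier(random_state=0), GaussianNB()):
        model.fit(X_tr, y_tr)
        print(type(model).__name__, accuracy_score(y_te, model.predict(X_te)))
    ```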

  17. Heuristic evaluation of paper-based Web pages: a simplified inspection usability methodology.

    Science.gov (United States)

    Allen, Mureen; Currie, Leanne M; Bakken, Suzanne; Patel, Vimla L; Cimino, James J

    2006-08-01

    Online medical information, when presented to clinicians, must be well-organized and intuitive to use, so that the clinicians can conduct their daily work efficiently and without error. It is essential to actively seek to produce good user interfaces that are acceptable to the user. This paper describes the methodology used to develop a simplified heuristic evaluation (HE) suitable for the evaluation of screen shots of Web pages, the development of an HE instrument used to conduct the evaluation, and the results of the evaluation of the aforementioned screen shots. In addition, this paper presents examples of the process of categorizing problems identified by the HE and the technological solutions identified to resolve these problems. Four usability experts reviewed 18 paper-based screen shots and made a total of 108 comments. Each expert completed the task in about an hour. We were able to implement solutions to approximately 70% of the violations. Our study found that a heuristic evaluation using paper-based screen shots of a user interface was expeditious, inexpensive, and straightforward to implement.

  18. Webpage Segments Classification with Incremental Knowledge Acquisition

    Science.gov (United States)

    Guo, Wei; Kim, Yang Sok; Kang, Byeong Ho

    This paper suggests an incremental information extraction method for social network analysis of web publications. For this purpose, we employed an incremental knowledge acquisition method, called MCRDR (Multiple Classification Ripple-Down Rules), to classify web page segments. Our experimental results show that our MCRDR-based web page segments classification system successfully supports easy acquisition and maintenance of information extraction rules.

  19. "Blogs" Catching on as Tool for Instruction: Teachers Use Interactive Web Pages to Hone Writing Skills

    Science.gov (United States)

    Borja, Rhea R.

    2005-01-01

    A growing number of K-12 educators are using Web logs, or "blogs" for short, to foster better writing, reading, communication, and other academic skills. Such Web sites, often open to the public, double as chronological journals and can include Web links and photographs as well as audio and video elements. Opinions on the use of blogs are shared…

  20. Improving the web site's effectiveness by considering each page's temporal information

    NARCIS (Netherlands)

    Li, ZG; Sun, MT; Dunham, MH; Xiao, YQ; Dong, G; Tang, C; Wang, W

    2003-01-01

    Improving the effectiveness of a web site is always one of its owner's top concerns. By focusing on analyzing web users' visiting behavior, web mining researchers have developed a variety of helpful methods, based upon association rules, clustering, prediction and so on. However, we have found littl

  1. Content Extraction Method Combining Web Page Structure and Text Features

    Institute of Scientific and Technical Information of China (English)

    熊忠阳; 蔺显强; 张玉芳; 牙漫

    2013-01-01

    A Web page contains both main-content information and information unrelated to the content; the irrelevant information has a negative influence on the classification, storage and retrieval of Web pages. To reduce this influence, and aiming at theme-oriented Web pages, this paper proposes a new method to extract the content of Web pages based on their structural and text features. It removes unrelated elements from the page with regular expressions, completing a first pass of filtering, and then segments the page into blocks linearly according to its structural features. Each block is classified as a link block or a text block according to its text features, and the positions where noise blocks appear consecutively are used to locate the main content, from which the content information of the page is obtained. Experimental results show that the method can extract the content of Web pages quickly and accurately.

  2. Knowledge Representation from Classification Schema to Semantic Web (I)

    Directory of Open Access Journals (Sweden)

    Silvia-Adriana Tomescu

    2014-01-01

    Full Text Available In this essay we aim to investigate knowledge as an approach to describing possible worlds through classification schemas, taxonomies, ontologies and the semantic web. We focus on the historical background and the methods of representing culture and civilization. In this regard, we studied the long-standing concern with classifying knowledge, from the biblical period when the Tree metaphor concentrated the essence of knowledge, to Francis Bacon's classification and then Paul Otlet, and we analysed the languages used in the scientific fields and then in the information science field, emphasizing the improvements brought by ICT: hypertext and the semantic web. We paid special attention to the construction of knowledge through mathematical language and exchange standards. The motivation for this approach comes from the logical and philosophical basis of knowledge representation, which underlines the idea that only properly structured scientific domains ensure the progress of society.

  3. Web Page Content Extraction Based on Visual Hot Zones

    Institute of Scientific and Technical Information of China (English)

    邵俊

    2012-01-01

    This paper studies Web page extraction and proposes a new method for extracting the main content of Web pages that uses layout features and a visual hot zone to locate the content. First, a region of the page is selected as the visual hot zone, and candidate content blocks are obtained from the Document Object Model. A significance function over the candidate content blocks is then derived and used to extract the page content. Experimental results indicate that the proposed method performs well.

  4. IPACT: Improved Web Page Recommendation System Using Profile Aggregation Based On Clustering of Transactions

    Directory of Open Access Journals (Sweden)

    Yahya AlMurtadha

    2011-01-01

    Full Text Available Problem statement: Recently, Web usage mining techniques have been widely used to build recommendation systems, especially for anonymous users. Approach: Assigning the current user to the web navigation profile with the most similar navigation activities improves the ability of the prediction engine to produce a recommendation list and present it to the user. This study presents iPACT, an improved recommendation system using Profile Aggregation based on Clustering of Transactions (PACT). Results: iPACT shows better prediction accuracy than the previous methods PACT and Hypergraph. Conclusion: Users' interests change over time; hence incremental and adaptive web navigation profiling is a key feature for future work.

  5. Web Approach for Ontology-Based Classification, Integration, and Interdisciplinary Usage of Geoscience Metadata

    Directory of Open Access Journals (Sweden)

    B Ritschel

    2012-10-01

    Full Text Available The Semantic Web is a W3C approach that integrates the different sources of semantics within documents and services using ontology-based techniques. The main objective of this approach in the geoscience domain is the improvement of understanding, integration, and usage of Earth and space science related web content in terms of data, information, and knowledge for machines and people. The modeling and representation of semantic attributes and relations within and among documents can be realized by human readable concept maps and machine readable OWL documents. The objectives for the usage of the Semantic Web approach in the GFZ data center ISDC project are the design of an extended classification of metadata documents for product types related to instruments, platforms, and projects as well as the integration of different types of metadata related to data product providers, users, and data centers. Sources of content and semantics for the description of Earth and space science product types and related classes are standardized metadata documents (e.g., DIF documents, publications, grey literature, and Web pages. Other sources are information provided by users, such as tagging data and social navigation information. The integration of controlled vocabularies as well as folksonomies plays an important role in the design of well formed ontologies.

  6. Applying Web Analytics to Online Finding Aids: Page Views, Pathways, and Learning about Users

    Directory of Open Access Journals (Sweden)

    Mark R. O'English

    2011-05-01

    Full Text Available Online finding aids, Internet search tools, and increased access to the World Wide Web have greatly changed how patrons find archival collections. By analyzing eighteen months of access data collected via Web analytics tools, this article examines how patrons discover archival materials. Contrasts are drawn between access from library catalogs and access from online search engines, with the latter outweighing the former by an overwhelming margin, and the article asks whether archival description practices should change accordingly.

  7. Evaluating the prevalence, content and readability of complementary and alternative medicine (CAM) web pages on the internet.

    Science.gov (United States)

    Sagaram, Smitha; Walji, Muhammad; Bernstam, Elmer

    2002-01-01

    Complementary and alternative medicine (CAM) use is growing rapidly. As CAM is relatively unregulated, it is important to evaluate the type and availability of CAM information. The goal of this study is to determine the prevalence, content and readability of online CAM information based on searches for arthritis, diabetes and fibromyalgia using four common search engines. Fifty-eight of 599 web pages retrieved by a "condition search" (9.6%) were CAM-oriented. Of 216 CAM pages found by the "condition" and "condition + herbs" searches, 78% were authored by commercial organizations, whose purpose involved commerce 69% of the time, and 52.3% had no references. Although 98% of the CAM information was intended for consumers, the mean readability was at grade level 11. We conclude that consumers searching the web for health information are likely to encounter consumer-oriented CAM advertising, which is difficult to read and is not supported by the conventional literature.

  8. Web Page Development and Management. SPEC Kit 246 and SPEC Flyer 246.

    Science.gov (United States)

    Liu, Yaping Peter, Comp.

    This SPEC (Systems and Procedures Exchange Center) Kit and Flyer reports results of two surveys conducted in 1996 and 1998 that examined ARL (Association of Research Libraries) member libraries' World Wide Web history, development, use, and activities. Fifty-six out of the then 119 ARL member institutions responded to the 1996 survey, and 68 out…

  9. Cosmic web-type classification using decision theory

    CERN Document Server

    Leclercq, Florent; Wandelt, Benjamin

    2015-01-01

    We propose a decision criterion for segmenting the cosmic web into different structure types (voids, sheets, filaments and clusters) on the basis of their respective probabilities and the strength of data constraints. Our approach is inspired by an analysis of games of chance where the gambler only plays if a positive expected net gain can be achieved based on some degree of privileged information. The result is a general solution for classification problems in the face of uncertainty, including the option of not committing to a class for a candidate object. As an illustration, we produce high-resolution maps of web-type constituents in the nearby Universe as probed by the Sloan Digital Sky Survey main galaxy sample. Other possible applications include the selection and labeling of objects in catalogs derived from astronomical survey data.

  10. Machine learning approach for automatic quality criteria detection of health web pages.

    Science.gov (United States)

    Gaudinat, Arnaud; Grabar, Natalia; Boyer, Célia

    2007-01-01

    The number of medical websites is constantly growing [1]. Owing to the open nature of the Web, the reliability of information available on the Web is uneven, and Internet users are overwhelmed by the quantity of information available. The situation is even more critical in the medical area, as the content proposed by health websites can have a direct impact on users' well-being. One way to control the reliability of health websites is to assess their quality and to make this assessment available to users. The HON Foundation has defined a set of eight ethical principles, and HON's experts work to determine manually whether a given website complies with the required principles. As the number of medical websites is constantly growing, manual expertise becomes insufficient and automatic systems should be used to help the medical experts. In this paper we present the design and evaluation of an automatic system conceived for the categorisation of medical and health documents according to the HONcode ethical principles. A first evaluation shows promising results: currently the system shows 0.78 micro precision and 0.73 F-measure, with 0.06 errors.

  11. An Algorithm for Duplicate Web Page Detection Based on a Meta-search Engine

    Institute of Scientific and Technical Information of China (English)

    张玉连; 王莎莎; 宋桂江

    2011-01-01

    To deal with duplicate web pages returned by a meta-search engine, an algorithm for removing duplicated web pages based on meta-search is proposed, and its effectiveness is verified through experiments. First, the URLs of the result pages returned by the member search engines are compared. Second, the titles of the result pages are processed and the thematic information of each page is extracted. Finally, the summaries are segmented into words and the similarity between summaries is calculated. By combining these three steps, the algorithm can detect duplicate web pages and remove them. Compared with previous algorithms, it has clear advantages and its results are closer to manual judgements.
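
    The three-step check described above (compare URLs, then titles, then the similarity of the segmented summaries) can be sketched as follows; the URL normalization rules, the use of Jaccard similarity and the threshold are illustrative assumptions rather than the paper's exact procedure.

    ```python
    def normalize_url(url: str) -> str:
        return url.lower().rstrip("/").replace("https://", "http://")

    def jaccard(a: set, b: set) -> float:
        return len(a & b) / len(a | b) if a or b else 1.0

    def is_duplicate(r1, r2, sim_threshold=0.7):
        """r1, r2: dicts with 'url', 'title', 'summary' from different search engines."""
        if normalize_url(r1["url"]) == normalize_url(r2["url"]):
            return True
        if r1["title"].strip().lower() == r2["title"].strip().lower():
            return True
        # whitespace splitting stands in for a real Chinese word segmenter here
        s1, s2 = set(r1["summary"].split()), set(r2["summary"].split())
        return jaccard(s1, s2) >= sim_threshold

    a = {"url": "http://example.com/page/", "title": "Web mining", "summary": "web page mining survey"}
    b = {"url": "https://example.com/page", "title": "Other", "summary": "something else"}
    print(is_duplicate(a, b))  # True, matched on normalized URL
    ```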

  12. Communicating public health preparedness information to pregnant and postpartum women: an assessment of Centers for Disease Control and Prevention web pages.

    Science.gov (United States)

    McDonough, Brianna; Felter, Elizabeth; Downes, Amia; Trauth, Jeanette

    2015-04-01

    Pregnant and postpartum women have special needs during public health emergencies but often have inadequate levels of disaster preparedness. Thus, improving maternal emergency preparedness is a public health priority. More research is needed to identify the strengths and weaknesses of various approaches to how preparedness information is communicated to these women. A sample of web pages from the Centers for Disease Control and Prevention intended to address the preparedness needs of pregnant and postpartum populations was examined for suitability for this audience. Five of the 7 web pages examined were considered adequate. One web page was considered not suitable and one the raters split between not suitable and adequate. None of the resources examined were considered superior. If these resources are considered some of the best available to pregnant and postpartum women, more work is needed to improve the suitability of educational resources, especially for audiences with low literacy and low incomes.

  13. A Survey of Web Page Reconstruction Technology for Mobile Terminals

    Institute of Scientific and Technical Information of China (English)

    史晶; 吴庆波; 杨沙洲

    2011-01-01

    Browsing traditional Web pages on mobile terminals suffers from problems such as unreasonable page layout, poor screen adaptation and large amounts of noisy information, which seriously degrade how pages are displayed. Web page reconstruction technology can solve these problems effectively by extracting and recombining page information, giving mobile users a richer page experience. This paper discusses Web page reconstruction technology in terms of page extraction, recombination and related areas, analyses the applicability and complexity of the relevant techniques, and finally summarizes the open research issues in this field.

  14. On the Wireframe Layout of Web Pages

    Institute of Scientific and Technical Information of China (English)

    李晨

    2012-01-01

    Taking the redesign of the "Judejin Chemical" website as an example, this paper discusses the core points of wireframe layout design for Web pages and the corresponding analytical methods, following the basic stages of the web design workflow. From a technical perspective, it focuses on analysing and setting screen complexity within the wireframe layout, planning and designing content sections, and calculating navigation values.

  15. A Comparative Study of Two Web Page Models

    Institute of Scientific and Technical Information of China (English)

    蔡清万

    2001-01-01

    Web technology enables cross-platform hypertext and hypermedia linking on the Internet, which makes information query and publishing convenient and fast. Web page technology has therefore developed rapidly, and two page models, static and dynamic, have emerged. This paper compares the two models in four respects: the models themselves and their underlying principles, page authoring, information (data) transmission methods, and database connection and access technology.

  16. Implementation of an Electronic Signature Control for Web Pages

    Institute of Scientific and Technical Information of China (English)

    郭腾芳; 韩建民; 李静; 罗方炜

    2011-01-01

    The paper proposes an approach to designing an electronic signature control for Web pages, which can apply a digital signature to any set of HTML elements on a page. The control can also save a Web page carrying an electronic seal as a local document for offline verification. The paper discusses the implementation of the control and the related technologies, and applies it to a Web-based electronic contract system. Practical application shows that the control guarantees the integrity and non-repudiation of the important information in a sealed Web page.

  17. Research on a Web Page Semantic Annotation Algorithm Based on the PLSA Model

    Institute of Scientific and Technical Information of China (English)

    王云英

    2013-01-01

    Efficient web page semantic annotation is the key to improving the effective use of web information resources and knowledge innovation. To address the problems of traditional annotation techniques, this paper designs a web page semantic annotation algorithm based on the PLSA model that exploits the structural features and text features present in web pages. The proposed algorithm constructs separate PLSA topic models for the structural features and the text features, adopts an adaptive asymmetric learning approach to integrate and optimize these PLSA models, and forms a new comprehensive PLSA model to semantically annotate unknown web pages automatically. Experimental results demonstrate that the algorithm dramatically improves the accuracy and efficiency of web page semantic annotation and can effectively solve the problem of large-scale web page annotation.

  18. WebCARMA: a web application for the functional and taxonomic classification of unassembled metagenomic reads

    Directory of Open Access Journals (Sweden)

    Jünemann Sebastian

    2009-12-01

    Full Text Available Abstract. Background: Metagenomics is a new field of research on natural microbial communities. High-throughput sequencing techniques like 454 or Solexa-Illumina promise new possibilities, as they are able to produce huge amounts of data in much shorter time and with less effort and cost than the traditional Sanger technique. But the data produced come in even shorter reads (35-100 basepairs with Illumina, 100-500 basepairs with 454 sequencing). CARMA is a new software pipeline for the characterisation of species composition and the genetic potential of microbial samples using short, unassembled reads. Results: In this paper, we introduce WebCARMA, a refined version of CARMA available as a web application for the taxonomic and functional classification of unassembled (ultra-short) reads from metagenomic communities. In addition, we have analysed the applicability of ultra-short reads in metagenomics. Conclusions: We show that unassembled reads as short as 35 bp can be used for the taxonomic classification of a metagenome. The web application is freely available at http://webcarma.cebitec.uni-bielefeld.de.

  19. Analysis of Croatian archives' web page from the perspective of public programmes

    Directory of Open Access Journals (Sweden)

    Goran Pavelin

    2015-04-01

    Full Text Available In order to remain relevant in society, archivists should promote collections and records that are kept in the archives. Through public programmes, archives interact with customers and various public actors and create the institutional image. This paper is concerned with the role of public programmes in the process of modernization of the archival practice, with the emphasis on the Croatian state archives. The aim of the paper is to identify what kind of information is offered to users and public in general on the web sites of the Croatian state archives. Public programmes involve two important components of archival practice: archives and users. Therefore, public programmes ensure good relations with the public. Croatian archivists still question the need for public relations in archives, while American and European archives have already integrated public relations into the basic archival functions. The key components needed for successful planning and implementation of public programs are the source of financing, compliance with the annual work plan, clear goals, defined target audience, cooperation and support from the local community, and the evaluation of results.

  20. A Novel Method for Bilingual Web Page Mining via Search Engines

    Institute of Scientific and Technical Information of China (English)

    冯艳卉; 洪宇; 颜振祥; 姚建民; 朱巧明

    2011-01-01

    A new approach has been developed for acquiring bilingual web pages from the result pages of search engines; it is composed of two challenging tasks. The first task is to detect, automatically, the web records embedded in the result pages, using a clustering method over a sample page. Identifying these useful records through clustering allows highly effective features to be generated for the second task, the acquisition of high-quality bilingual web pages, which is treated as a classification problem. One advantage of the approach is that it is independent of the search engine and the domain. The test is based on 2,516 records extracted automatically from six search engines and annotated manually, and achieves a high precision of 81.3% and a recall of 94.93%. The experimental results indicate that the approach is very effective.

  1. A Study of Deep Web Page Relationship Mining Based on the Apriori Algorithm

    Institute of Scientific and Technical Information of China (English)

    李贵; 韩子扬; 郑新录; 李征宇

    2011-01-01

    Maximal frequent associated pages in Deep Web sites are recognized using the Apriori algorithm, and non-maximal frequent pages are pruned. All maximal frequent associated pages are then obtained by traversing the website. Experimental results on data extraction from a real-estate Deep Web site demonstrate that the algorithm is feasible and effective.
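
    The core step, finding maximal frequent sets of pages that are visited together, can be sketched with a small self-contained Apriori over page-visit transactions followed by a maximality filter; the minimum support value and the toy data below are illustrative.

    ```python
    from itertools import combinations

    def apriori_max_frequent(transactions, min_support=2):
        """transactions: list of sets of page identifiers.
        Returns the maximal frequent itemsets (no frequent superset exists)."""
        items = {i for t in transactions for i in t}
        frequent, level = [], [frozenset([i]) for i in items]
        while level:
            counts = {c: sum(1 for t in transactions if c <= t) for c in level}
            survivors = [c for c, n in counts.items() if n >= min_support]
            frequent.extend(survivors)
            # candidate generation: join k-itemsets into (k+1)-itemsets
            level = list({a | b for a, b in combinations(survivors, 2) if len(a | b) == len(a) + 1})
        return [s for s in frequent if not any(s < t for t in frequent)]

    visits = [{"index", "list", "detail"}, {"index", "list"}, {"index", "detail"}, {"list", "detail"}]
    print(apriori_max_frequent(visits, min_support=2))
    ```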

  2. Discussion on the layout design skills of web pages

    Institute of Scientific and Technical Information of China (English)

    潘燕

    2016-01-01

    Layout design plays a crucial role in producing a web page with a prominent theme, creative ideas, a rational layout, and a simple, attractive interface. This paper analyses and discusses the application of layout design skills in web pages. Following the requirements of page layout design, the overall layout of the page is designed first; the three key elements of text, images and colour are then planned rationally, and the various visual elements are arranged as a whole within the limited space of the page, applying the aesthetic principles of layout design to produce better web pages.

  3. SurveyWiz and factorWiz: JavaScript Web pages that make HTML forms for research on the Internet.

    Science.gov (United States)

    Birnbaum, M H

    2000-05-01

    SurveyWiz and factorWiz are Web pages that act as wizards to create HTML forms that enable one to collect data via the Web. SurveyWiz allows the user to enter survey questions or personality test items with a mixture of text boxes and scales of radio buttons. One can add demographic questions of age, sex, education, and nationality with the push of a button. FactorWiz creates the HTML for within-subjects, two-factor designs as large as 9 x 9, or higher order factorial designs up to 81 cells. The user enters levels of the row and column factors, which can be text, images, or other multimedia. FactorWiz generates the stimulus combinations, randomizes their order, and creates the page. In both programs HTML is displayed in a window, and the user copies it to a text editor to save it. When uploaded to a Web server and supported by a CGI script, the created Web pages allow data to be collected, coded, and saved on the server. These programs are intended to assist researchers and students in quickly creating studies that can be administered via the Web.
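
    A rough Python analogue of what factorWiz does, for illustration only: cross the levels of two factors, randomize the trial order, and emit a bare-bones HTML rating form. The level names, rating scale and CGI target are made-up placeholders, not the tool's actual output:

        import random
        from itertools import product

        def factorial_form(row_levels, col_levels, scale=7):
            """Cross two factors, randomize trial order, and emit a simple HTML rating form."""
            trials = list(product(row_levels, col_levels))
            random.shuffle(trials)
            rows = []
            for i, (row, col) in enumerate(trials):
                radios = "".join(
                    f'<input type="radio" name="t{i}" value="{v}">{v} '
                    for v in range(1, scale + 1))
                rows.append(f"<p>{row} / {col}<br>{radios}</p>")
            return ("<form method='post' action='collect.cgi'>\n"
                    + "\n".join(rows) + "\n<input type='submit'></form>")

        print(factorial_form(["low price", "high price"], ["brand A", "brand B"]))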

  4. FUZZY CLUSTERING ALGORITHMS FOR WEB PAGES AND CUSTOMER SEGMENTS

    Institute of Scientific and Technical Information of China (English)

    宋擒豹; 沈钧毅

    2001-01-01

    Web log mining is broadly used in e-commerce and Web personalization. In this paper, fuzzy clustering algorithms for Web pages and customers are presented. First, the fuzzy sets of Web pages and customers are set up separately according to customers' hit information. Second, fuzzy similarity matrices are constructed on the basis of these fuzzy sets and the Max-Min similarity measure. Finally, Web page clusters and customer segments are extracted directly from the corresponding fuzzy similarity matrix. Experiments show the effectiveness of the algorithm.
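
    The Max-Min similarity step can be sketched as follows. Hit counts per (page, customer) are assumed to be given as a matrix, and the threshold-based grouping at the end is a simplified stand-in for the paper's clustering procedure:

        import numpy as np

        def max_min_similarity(hits):
            """Fuzzy similarity matrix between pages from a (pages x customers) hit matrix."""
            # normalise hit counts into fuzzy membership degrees in [0, 1]
            mu = hits / hits.max(axis=1, keepdims=True)
            n = mu.shape[0]
            sim = np.zeros((n, n))
            for i in range(n):
                for j in range(n):
                    sim[i, j] = np.minimum(mu[i], mu[j]).sum() / np.maximum(mu[i], mu[j]).sum()
            return sim

        def threshold_clusters(sim, lam=0.7):
            """Group items whose similarity exceeds lambda (simplified lambda-cut grouping)."""
            n = sim.shape[0]
            labels = list(range(n))
            for i in range(n):
                for j in range(i + 1, n):
                    if sim[i, j] >= lam:
                        labels[j] = labels[i]
            return labels

        hits = np.array([[5, 0, 2], [4, 1, 2], [0, 6, 1]], dtype=float)
        print(threshold_clusters(max_min_similarity(hits)))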

  5. A Credibility Evaluation Method for Web Information Based on Improved PageRank

    Institute of Scientific and Technical Information of China (English)

    马伟瑜; 袁方

    2011-01-01

    To address the problem of measuring the credibility of Web information, a method for computing Web information credibility is proposed. The method considers not only the link structure between Web pages but also the semantic relations between the topics of the Web information. Because pieces of Web information are published at different times, a time-decay function is introduced into the computation to reflect the time factor. Experimental results show that the proposed method is feasible and effective and can provide users with Web information of higher credibility.
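
    A hedged sketch of the kind of computation the abstract describes: ordinary PageRank over the link graph, with each link additionally weighted by an exponential time-decay factor. The decay form, half-life and tiny graph are illustrative assumptions, not the paper's exact formulas:

        import numpy as np

        def decayed_pagerank(links, ages_days, n, alpha=0.85, half_life=30.0, iters=100):
            """PageRank where each link (i -> j) is down-weighted by the age of the citing page."""
            W = np.zeros((n, n))
            for (i, j), age in zip(links, ages_days):
                W[i, j] = 0.5 ** (age / half_life)          # exponential time decay
            row_sums = W.sum(axis=1, keepdims=True)
            # row-normalise; dangling pages get a uniform distribution
            P = np.divide(W, row_sums, out=np.full_like(W, 1.0 / n), where=row_sums > 0)
            r = np.full(n, 1.0 / n)
            for _ in range(iters):
                r = alpha * r @ P + (1 - alpha) / n          # power iteration with teleportation
            return r / r.sum()

        links = [(0, 1), (1, 2), (2, 0), (0, 2)]
        ages = [2, 40, 5, 90]                                # days since each citing page appeared
        print(decayed_pagerank(links, ages, n=3))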

  6. A comparison of Web page and slide/tape for instruction in periapical and panoramic radiographic anatomy.

    Science.gov (United States)

    Ludlow, J B; Platin, E

    2000-04-01

    Self-guided slide/tape (ST) and web page (WP) instruction in normal radiographic anatomy of periapical and panoramic images is compared using objective test performance and subjective preference of freshman dental students. A class of seventy-four students was divided into a group studying anatomy in periapical images using WP and a group studying similar ST material. In a modified cross-over design the groups switched presentation technologies to learn anatomy in panoramic images. Students completed self-administered on-line quizzes covering WP materials and conventional quizzes for ST material. Students also completed a voluntary survey. Mean quiz performance identifying matched anatomic features in PA (n = 26) and panoramic images (n = 35) was excellent (96.9%) and not different between image types (p = 0.12) or presentation technologies (p = 0.81). Students preferred WP for accessibility, ease of use, freedom of navigation, and image quality (p < .01). Student comfort level with the quiz formats of the two technologies was not different (p = 0.11). Students experienced a higher rate of mechanical and logistical problems with ST than with WP technology. While 71 percent of the students preferred WP technology, this preference appears to be related to ease of use and facilitation of flexible learning styles rather than improved didactic performance.

  7. Efficient Algorithm for Crawling Ajax Web Pages

    Institute of Scientific and Technical Information of China (English)

    李华波; 吴礼发; 赖海光; 郑成辉; 黄康宇

    2013-01-01

    The generation of Ajax web pages and Ajax page navigation require executing client-side JavaScript, so it is impossible to extract the complete content of an Ajax page with traditional crawling algorithms. In this paper, the working mode of Ajax is analysed, the problems of crawling Ajax web pages are elaborated, and an effective algorithm for crawling Ajax pages is proposed. The algorithm controls a client browser to generate page content dynamically and to perform page navigation, assigns identification numbers to crawled pages, and generates corresponding static pages. Experimental results show that the number of Ajax pages crawled by the proposed algorithm is clearly larger than that of traditional methods, and that the duplicate-detection policies adopted can effectively reduce the time cost of the algorithm.

  8. RESEARCH ON VISUAL GUIDANCE DESIGN OF WEB PAGE ICON

    Institute of Scientific and Technical Information of China (English)

    杨洁

    2015-01-01

    Web page icons are an indispensable component of website design. They emphasise and highlight web information, and they guide users' actions and information retrieval. Starting from the basic functions of web page icons and drawing on cognitive psychology, this paper discusses how to build effective visual guidance with web page icons through four main design methods: metaphor guidance, classification guidance, directional guidance and dynamic-element guidance. In addition, the characteristics of different forms of web page icons and their advantages for visual guidance are described.

  9. Design of Transformation Model from Web Page to WAP Page Based on XML

    Institute of Scientific and Technical Information of China (English)

    梁霄波

    2012-01-01

    This paper designs an XML-based intermediate transformation model from Web pages to WAP pages and provides the related algorithms. The model has clear advantages for transforming plain HTML text content. Existing HTML-based pages can be retained while WAP pages for mobile users are added, eliminating a large amount of repeated work and allowing page resources to be shared.

  10. Topic information extraction from Web pages based on tree comparison

    Institute of Scientific and Technical Information of China (English)

    朱梦麟; 李光耀; 周毅敏

    2011-01-01

    To automatically extract Web page information from the Internet, which contains a massive amount of information, this paper presents an approach based on tree comparison. The approach compares the tree built from the target page with the trees built from similar pages in order to simplify the target page. Extraction rules are generated on this basis and then used to extract the topic information from the target Web page. Extraction tests on pages from major domestic websites show that the method extracts the topic information of Web pages accurately and efficiently.

  11. Trident Web page

    Energy Technology Data Exchange (ETDEWEB)

    Johnson, Randall P. [Los Alamos National Laboratory; Fernandez, Juan C. [Los Alamos National Laboratory

    2012-06-25

    An Extensive Diagnostic Suite Enables Cutting-edge Research at Trident The Trident Laser Facility at Los Alamos National Laboratory is an extremely versatile Nd:glass laser system dedicated to high energy density physics research and fundamental laser-matter interactions. Trident's Unique Laser Capabilities Provide an Ideal Platform for Many Experiments. The laser system consists of three high energy beams which can be delivered into two independent target experimental areas. The target areas are equipped with an extensive suite of diagnostics for research in ultra-intense laser matter interactions, dynamic material properties, and laser-plasma instabilities. Several important discoveries and first observations have been made at Trident including laser-accelerated MeV mono-energetic ions, nonlinear kinetic plasma waves, transition between kinetic and fluid nonlinear behavior, as well as other fundamental laser-matter interaction processes. Trident's unique long-pulse capabilities have enabled state-of-the-art innovations in laser-launched flyer-plates, and other unique loading techniques for material dynamics research.

  12. Research on the Web page adaptation method based on tree structure

    Institute of Scientific and Technical Information of China (English)

    高集荣; 田艳; 江晓妍

    2014-01-01

    This paper designs and implements a mechanism for adapting Internet pages to mobile-phone pages and proposes a Web page adaptation method based on tree-structure analysis, together with the design of the system algorithm and its C++ implementation. The method first builds the corresponding document-model tree for an Internet page and then, according to the user's hardware information, removes page noise, adapts Frameset/Iframe elements, repaginates and rearranges the content, handles intelligent caching and multi-language character-set support, and finally produces an XHTML MP page, completing the conversion from Web page to mobile page. A series of experiments verifies the feasibility of the whole page adaptation process and method.

  13. Extraction of Information from Web Pages Based on Extended DOM Tree

    Institute of Scientific and Technical Information of China (English)

    顾韵华; 田伟

    2009-01-01

    A method of information extraction from Web pages based on an extended DOM tree is presented. A Web page is first transformed into a DOM tree; the tree is then extended by adding semantic expressions to the nodes, and an influence degree is calculated for each node. According to the influence degrees, the DOM tree is pruned so that the useful, relevant content of the Web page can be extracted automatically. The approach is a universal method that does not require prior knowledge of the page structure. The extraction results can be used not only for browsing but also for further Web information processing, such as Internet data mining and topic-based search engines.
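
    One way to approximate the node "influence degree" idea with off-the-shelf tools: score each DOM node by how much plain text it carries relative to its link text, then keep only high-scoring subtrees. The scoring formula, threshold and sample HTML are assumptions for illustration, not the paper's definitions:

        from bs4 import BeautifulSoup

        def influence(tag):
            """Crude influence degree: text length penalised by link-text length."""
            text_len = len(tag.get_text(strip=True))
            link_len = sum(len(a.get_text(strip=True)) for a in tag.find_all("a"))
            return text_len - 2 * link_len

        def extract_main_content(html, min_score=50):
            soup = BeautifulSoup(html, "html.parser")
            blocks = soup.find_all(["div", "td", "p", "article"])
            kept = [b for b in blocks if influence(b) >= min_score]
            # prefer the highest-scoring block as the page's main content
            return max(kept, key=influence).get_text(" ", strip=True) if kept else ""

        html = ("<div><p>" + "Main article text. " * 10 + "</p></div>"
                "<div><a href='#'>nav</a></div>")
        print(extract_main_content(html))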

  14. Categorization and Extraction of Web Pages Based on Hierarchy

    Institute of Scientific and Technical Information of China (English)

    王振宇; 唐远华; 郭力

    2012-01-01

    Traditional web crawlers provide services for keyword-based general search engines and cannot extract the categorization information of web pages, which causes efficiency and accuracy problems for text clustering and topic detection. To solve this problem, a method of categorization and extraction of web pages based on site hierarchy is proposed. By building a virtual hierarchical categorization tree and extracting the hierarchies of real web sites, a web page is categorized as it is crawled. For sites without categorization information, a page-title-based categorization algorithm is presented, including the construction of a domain knowledge base and the calculation of semantic word similarity based on HowNet. The experimental results demonstrate that the method achieves good classification results.

  15. Web page ranking algorithm based on PCM clustering

    Institute of Scientific and Technical Information of China (English)

    刘发升; 张菊琴

    2013-01-01

    To address the problems of traditional page ranking algorithms, which tend to ignore the topic relevance of search results and suffer from topic drift, a page ranking algorithm based on PCM clustering is proposed; it improves the topic relevance of search results and reduces topic drift. First, for a given query topic, a random walk method (RWM) is used to compute the symmetric social distance (SSD) between pairs of web pages. Second, the SSD and the PCM clustering algorithm are used to cluster the pages into topic-related communities and to obtain the membership probability of each page in each community. Finally, the pages are ranked according to these probabilities and the recommendation degree of each page. Experimental results show that, compared with the PageRank algorithm, the proposed method returns search results that are more relevant to the topic, and because the ranking targets a specific topic, topic drift is reduced.

  16. Etat de l'art des méthodes d'adaptation des pages Web en situation de handicap visuel

    OpenAIRE

    Bonavero, Yoann; Huchard, Marianne; Meynard, Michel; Waffo Kouhoué, Austin

    2016-01-01

    National audience; This article surveys the technologies, tools and research work that exist around Web accessibility for visually impaired people. We begin by describing the structure of a Web page and the various standards and norms that exist to guarantee a minimum level of accessibility. We then detail the possibilities offered by the best-known assistive technologies and by specific tools...

  17. Pragmatic Presupposition of the Chinese "Jingran" in Web Page Titles

    Institute of Scientific and Technical Information of China (English)

    陈丽婉

    2011-01-01

    The Chinese "Jingran",either frequently-used modality adverb or a presupposition trigger,often appears in web page titles.Based on the theory of pragmatic presupposition,this paper analyses the pragmatic function of "Jingran" in web page title including abnormality,economy,authority and focal prominence.%"竟然"是个使用频率很高的语气副词,也是常用的预设触发语,经常出现在网页标题中。文章从语用预设理论出发,分析了"竟然"网页标题的求异性、经济性、权威性和凸显性等语用功能。

  18. Using JavaScript to Realize Dynamic Switching of Web Page Images

    Institute of Scientific and Technical Information of China (English)

    于万国

    2013-01-01

    The JavaScript scripting language is used to implement dynamic image switching on a Web page. The design allows the images to switch automatically when the page is opened, and also lets the user click the numbered labels at the lower right corner of the image to display the corresponding image manually.

  19. The Implementation of Web Page Dynamic Refresh Technology Based on ASP

    Institute of Scientific and Technical Information of China (English)

    邢筠

    2001-01-01

    This paper discusses the working mode of ASP and the process of creating ASP files. It also introduces the techniques used to access the database and the technology for dynamically refreshing web pages based on ASP.

  20. Interpretation of GUI Visual Elements in Web Page Design

    Institute of Scientific and Technical Information of China (English)

    员勃

    2012-01-01

    This paper analyses the readability of textual elements, the functional value of graphics and icons, and the functional guidance of colour elements in the GUI visual system of web page design. Combining everyday visual elements with a visual analysis of excellent websites at home and abroad, it discusses the particular requirements of the GUI visual system within web page design and then analyses the feasibility of using text, icons and colour in web page design. It argues that GUI user-interface design elements in web pages must have clear readability and unity, and that a certain visual aesthetic, a rational division of colour, and well-designed interaction and linking processes should be applied.

  1. Web Social Relation Evaluation Combining Web Page Co-occurrence and Sentence Co-occurrence

    Institute of Scientific and Technical Information of China (English)

    尹美娟; 王清贤; 刘晓楠

    2012-01-01

    Existing Web social relation evaluation methods suffer from low accuracy and poor stability. This paper proposes a Web social relation evaluation method based on both Web page co-occurrence and intra-page sentence co-occurrence. The method assigns a weight to the relation between two persons according to how often their names co-occur in Web pages and in sentences within those pages, and two corresponding weighting functions are designed. Experimental results show that the method assigns more appropriate weights to relations in Web social networks and is more stable, and that sentence co-occurrence plays a more decisive role than page co-occurrence in weighting the relations.
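
    A toy version of the weighting idea, for illustration only: combine how often two names co-occur in the same page with how often they co-occur in the same sentence, giving sentence co-occurrence the larger weight. The combination weights and sample texts are assumptions, not the paper's functions:

        import re

        def relation_strength(pages, name_a, name_b, w_sentence=0.7, w_page=0.3):
            """Score the tie between two people from page- and sentence-level co-occurrence."""
            page_hits, sentence_hits, pages_with_either = 0, 0, 0
            for text in pages:
                has_a, has_b = name_a in text, name_b in text
                if has_a or has_b:
                    pages_with_either += 1
                if has_a and has_b:
                    page_hits += 1
                    sentence_hits += sum(
                        1 for s in re.split(r"[.!?。！？]", text)
                        if name_a in s and name_b in s)
            if pages_with_either == 0:
                return 0.0
            page_score = page_hits / pages_with_either
            sentence_score = sentence_hits / max(page_hits, 1)
            return w_sentence * sentence_score + w_page * page_score

        docs = ["Alice met Bob at the workshop. Bob later thanked Alice.",
                "Alice gave a talk. Carol attended."]
        print(relation_strength(docs, "Alice", "Bob"))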

  2. Research on the Elderly's Reading and Web Page Layout Design

    Institute of Scientific and Technical Information of China (English)

    肖龙

    2015-01-01

    In the information age, older people are increasingly enthusiastic about reading on computers. Improving web page layout design can improve the efficiency of web browsing and the aesthetic reading experience of older users. This study uses questionnaire surveys combined with empirical studies of physiological and psychological behaviour to examine the effect of web page layout on older adults, investigating the relationship between their physiological characteristics and their reading in order to provide a scientific basis for web page layout design. The results show that traditional elderly-oriented media layout and web page layout share common design principles but also differ. On this basis, the study suggests that the reading efficiency of older adults and the usability of websites can be improved through font design, colour matching and layout structure. The findings can serve as a reference for elderly-oriented web page layout design.

  3. Detection And Classification Of Web Robots With Honeypots

    Science.gov (United States)

    2016-03-01

    programs has been attributed to the explosion in content and user-generated social media on the Internet. The Web search engines like Google require...large numbers of automated bots on the Web to build their indexes. Furthermore, the growth of internet has produced a market for businesses, both...played an important role in its evolution and growth. Conversely, the “bad” Web robots have been and continue to be a significant problem. Bad Web robots

  4. Computing Principal Eigenvectors of Large Web Graphs: Algorithms and Accelerations Related to PageRank and HITS

    Science.gov (United States)

    Nagasinghe, Iranga

    2010-01-01

    This thesis investigates and develops a few acceleration techniques for the search engine algorithms used in PageRank and HITS computations. PageRank and HITS methods are two highly successful applications of modern Linear Algebra in computer science and engineering. They constitute the essential technologies accounted for the immense growth and…

  6. Algorithm Study of Information Hiding Based on Multiple Web Pages

    Institute of Scientific and Technical Information of China (English)

    孙利; 张得生; 陈萍

    2011-01-01

    Web pages can be used to transmit secret information. To protect the information from attack, and to address the shortcomings of existing web page information hiding techniques, such as small hiding capacity, poor robustness and poor imperceptibility, this paper proposes a new method of hiding information across multiple web pages, embedding the hidden information in the pages as a binary image. Experiments show that the method offers good concealment and security and has high practical value.

  7. Research on Passing Values Between ASP.NET Web Pages

    Institute of Scientific and Technical Information of China (English)

    吴平贵

    2011-01-01

    In an ASP.NET application, Web pages are isolated from one another and information cannot be passed between them directly, so exchanging data efficiently is a problem worth studying. There are many methods for passing values between Web pages, but most of them consume considerable resources. Using the Microsoft Visual Studio 2010 development platform, this paper selects three efficient methods for passing values between ASP.NET Web pages.

  8. Research on Feature Extraction for Content-based Chinese Web Page Analysis

    Institute of Scientific and Technical Information of China (English)

    张义忠; 赵明生; 朱精南

    2001-01-01

    This paper presents a feature framework for content-based Chinese web page analysis and searching. The method for constructing the word-segmentation keyword dictionary is introduced first; the keywords in the dictionary are the words that represent the contents and concepts of web pages in a certain domain. Feature extraction methods for the body text, tag information and hyperlink information are then addressed. Experiments on Chinese travel web pages show that the proposed methods work well.

  9. On Page Rank

    NARCIS (Netherlands)

    Hoede, C.

    2008-01-01

    In this paper the concept of page rank for the world wide web is discussed. The possibility of describing the distribution of page rank by an exponential law is considered. It is shown that the concept is essentially equal to that of status score, a centrality measure discussed already in 1953 by Katz.

  10. The effect of new links on Google PageRank

    NARCIS (Netherlands)

    Avrachenkov, Konstatin; Litvak, Nelly

    2004-01-01

    PageRank is one of the principal criteria according to which Google ranks Web pages. PageRank can be interpreted as the frequency with which a random surfer visits a Web page, and thus it reflects the popularity of a Web page. We study the effect of newly created links on Google PageRank. We discuss to what extent a page can control its PageRank.

  11. Web security monitoring and alarm system based on page analysis

    Institute of Scientific and Technical Information of China (English)

    王宇; 刘军; 李源; 王兴伟

    2011-01-01

    Most information services in social life are now provided through Web-based information systems, and research on and application of Web security is one of the main topics in the security field. In this article, we propose a Web security monitoring technology based on page analysis and build the architecture of a Web security monitoring and alarm system based on it.

  12. WEB CONTENT EXTRACTION USING HYBRID APPROACH

    Directory of Open Access Journals (Sweden)

    K. Nethra

    2014-01-01

    Full Text Available The World Wide Web has rich source of voluminous and heterogeneous information which continues to expand in size and complexity. Many Web pages are unstructured and semi-structured, so it consists of noisy information like advertisement, links, headers, footers etc. This noisy information makes extraction of Web content tedious. Many techniques that were proposed for Web content extraction are based on automatic extraction and hand crafted rule generation. Automatic extraction technique is done through Web page segmentation, but it increases the time complexity. Hand crafted rule generation uses string manipulation function for rule generation, but generating those rules is very difficult. A hybrid approach is proposed to extract main content from Web pages. A HTML Web page is converted to DOM tree and features are extracted and with the extracted features, rules are generated. Decision tree classification and Naïve Bayes classification are machine learning methods used for rules generation. By using the rules, noisy part in the Web page is discarded and informative content in the Web page is extracted. The performance of both decision tree classification and Naïve Bayes classification are measured with metrics like precision, recall, F-measure and accuracy.
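
    A compressed sketch of the hybrid idea above: turn each DOM block into a small feature vector and let a decision tree decide content versus noise (a naive Bayes model could be swapped in the same way). The features, training vectors and labels are invented placeholders, not the paper's data:

        from bs4 import BeautifulSoup
        from sklearn.tree import DecisionTreeClassifier

        def block_features(tag):
            """Simple per-block features: text length, link density, punctuation count."""
            text = tag.get_text(" ", strip=True)
            link_chars = sum(len(a.get_text(strip=True)) for a in tag.find_all("a"))
            return [len(text),
                    link_chars / (len(text) + 1),
                    sum(text.count(c) for c in ".,;!?")]

        # toy training data: feature vectors labelled 1 = informative content, 0 = noise
        X_train = [[80, 0.05, 3], [350, 0.02, 9], [30, 0.9, 0], [15, 1.0, 0]]
        y_train = [1, 1, 0, 0]
        clf = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)

        html = "<div><p>Long informative paragraph with sentences, commas, and detail.</p></div>"
        for block in BeautifulSoup(html, "html.parser").find_all(["div", "p"]):
            print(block.name, clf.predict([block_features(block)])[0])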

  13. Research on Web page layout of the university portal website

    Institute of Scientific and Technical Information of China (English)

    蓝鹰

    2016-01-01

    With the development of the times, university portal websites play an increasingly important role in daily student recruitment, employment and teaching. The appearance of a university portal website is directly determined by the quality of its page layout. A scientific and reasonable page layout not only improves the display effect of the pages but also greatly improves page download speed and the satisfaction of website visitors.

  14. Research and Implementation of the Web Page Generation System Based on Responsive Web Design

    Institute of Scientific and Technical Information of China (English)

    臧进进; 鄂海红

    2015-01-01

    With the rise of the mobile Internet, more and more people access websites from mobile devices, so producing web pages that fit different terminals has become key to website design and development for individuals and enterprises. After discussing the technologies involved in responsive Web design, this paper designs and implements a web page generation system. The system hides the technical details of web page development so that users can create pages in a what-you-see-is-what-you-get way. Built with responsive Web design, the generated pages respond automatically to the accessing device, dynamically adjusting their layout structure and interaction style, and present the same content in different formats to users of different devices.

  15. Study on the Application of Graffiti Elements in Web Page Design

    Institute of Scientific and Technical Information of China (English)

    曹晏祯

    2015-01-01

    By analysing the application of graffiti elements in a number of web pages, this paper shows that the strong visual expressiveness of graffiti makes pages more attractive to visitors and can achieve both commercial and artistic value. Graffiti is itself a strong commercial visual element that can evoke associations with a product's functions, and graphics in this style can extend the imagery of web page design elements so as to appeal to young consumers.

  16. Web Page Topic Recognition Algorithm Based on Ensemble Learning

    Institute of Scientific and Technical Information of China (English)

    葛东谋; 张钢; 李谦

    2013-01-01

    Automatic topic recognition over large numbers of web pages is an important research direction in web information analysis and mining, with both theoretical and practical significance. This paper proposes a web page topic recognition algorithm framework based on ensemble learning. Individual maximum-margin base classifiers are constructed from heterogeneous sets of web page properties, and ensemble learning is applied to combine the results of the individual learners. The proposed algorithm is evaluated on a benchmark data set, and the results illustrate its effectiveness.
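
    The ensemble framing can be sketched with scikit-learn: train one max-margin base learner per property group (here, pretend "title" and "body" feature columns) and fuse them by majority voting. The feature groups and toy data are placeholders, not the paper's property sets:

        import numpy as np
        from sklearn.svm import LinearSVC
        from sklearn.ensemble import VotingClassifier
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import FunctionTransformer

        # toy data: columns 0-1 are "title" features, columns 2-3 are "body" features
        X = np.array([[1, 0, 3, 1], [0, 1, 0, 4], [1, 1, 2, 0], [0, 0, 1, 5]], dtype=float)
        y = np.array([1, 0, 1, 0])

        def select(cols):
            # column selector so each base learner only sees its own property group
            return FunctionTransformer(lambda A: A[:, cols])

        ensemble = VotingClassifier(
            estimators=[("title", make_pipeline(select([0, 1]), LinearSVC())),
                        ("body", make_pipeline(select([2, 3]), LinearSVC()))],
            voting="hard")
        ensemble.fit(X, y)
        print(ensemble.predict(X))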

  17. Server-side Dynamic Web Page Technique: JSP+Servlet

    Institute of Scientific and Technical Information of China (English)

    扬黎明; 董传良; 董玮文

    2001-01-01

    This paper introduces a new server-side technique, Servlet + JSP, rooted in the cross-platform nature of Java. The technique provides a convenient, flexible and universal way to create dynamic Web pages, separates HTML coding from the business logic of the Web page, and uses JDBC to communicate with the database efficiently. The approach to designing Servlets and JSP is also explained through the construction of the Web site of the Shanghai Tour Business Administration Committee.

  18. WEB LINK SPAM IDENTIFICATION INSPIRED BY ARTIFICIAL IMMUNE SYSTEM AND THE IMPACT OF TPP-FCA FEATURE SELECTION ON SPAM CLASSIFICATION

    Directory of Open Access Journals (Sweden)

    S. K. Jayanthi

    2013-10-01

    Full Text Available Search engines are the doorway for retrieving required information from the web. Web spam is a bad method for improving the ranking and visibility of web pages in search engine results. This paper addresses the problem of link spam classification through features of web sites: link-related features retrieved from each website are used to discriminate spam from non-spam sites. AIS-inspired algorithms are applied to the dataset and the results are evaluated. Artificial immune systems are machine learning systems inspired by the principles of natural immunology; they comprise supervised learning schemes that can be adapted to a wide range of classification problems. The UK-WEBSPAM-2007 dataset [8] is used for the experiments, and WEKA [9] is used to simulate the classifiers. The Artificial Immune Recognition System seems to perform better than the other classifiers; the best classification accuracy attained is 98.89% by the AIRS1 algorithm, which compares well with the classifier accuracies available in the existing literature.

  19. The influence of quality criteria on parents' evaluation of medical web-pages: an Italian randomised trial.

    Science.gov (United States)

    Currò, Vincenzo; Buonuomo, Paola Sabrina; Zambiano, Annaluce; Vituzzi, Andrea; Onesimo, Roberta; D'Atri, Alessandro

    2007-01-01

    The aim of this study is to verify the usefulness for parents of a web evaluation framework composed of ten quality criteria to improve their ability to assess the quality level of medical web sites. We conducted a randomised controlled trial that included two groups of parents who independently evaluated five paediatric web sites by filling out two distinct questionnaires: group A with the evaluation framework, group B without it. 40 volunteers were recruited from parents referring to the General Paediatrics Out-patients Department who satisfied the following eligibility criteria: Internet users, at least 1 child under 12 months old, no professional skill in Internet and medicine. The survey was taken between February 2, 2000 and March 22, 2000. Parents evaluated each web site and assigned a score, compared with a gold standard created by a group of experts. Suggesting evaluation criteria to parents seem useful for an improvement of their ability to evaluate web sites.

  20. WEB CONTENT-BASED SELF-ADAPTIVE PAGE TRANSFORMATION AGENT

    Institute of Scientific and Technical Information of China (English)

    沈向峰; 林守勋; 黄铁军

    2001-01-01

    For non-PC network terminal devices that access the Internet, such as TV set-top boxes and mobile computing devices, we put forward the basic idea of a transformation agent. We design and implement a Web content-based self-adaptive page transformation agent, which can adaptively provide the corresponding Internet page according to the request of different network terminal devices, making it convenient for users to browse Internet content.

  1. Study and Implementation of Web Mining Classification Algorithm Based on Building Tree of Detection Class Threshold

    Institute of Scientific and Technical Information of China (English)

    CHEN Jun-jie; SONG Han-tao; LU Yu-chang

    2005-01-01

    A new classification algorithm for web mining is proposed on the basis of general classification algorithms for data mining, in order to implement personalized information services. A tree-building method that detects class thresholds is used to construct the decision tree according to the concept of user expectation, so as to find classification rules at different layers. Compared with the traditional C4.5 algorithm, the problem of over-fitting in C4.5 is alleviated, so that the classification results not only have much higher accuracy but also statistical meaning.

  2. Statistical Characteristics Based Web Page Relevance Judgment Strategy for the "Type" Topics Crawled

    Institute of Scientific and Technical Information of China (English)

    乔建忠

    2012-01-01

    This paper proposes a new Web page type-relevance judgment strategy based on several statistical characteristics of Web document types, to meet the lightweight design requirements of online classification in a focused crawler. Using the API provided by WEKA, appropriate training and classification algorithms are devised for the relevance judgment strategy. Experiments on classification accuracy, efficiency and attribute selection demonstrate the validity of the strategy and identify five statistical characteristics of Web pages that play a key role in type identification.

  3. Web Page Color Preference of the Young and the Aged

    Institute of Scientific and Technical Information of China (English)

    李宏汀; 王平飞

    2012-01-01

    Nowadays more and more elderly people get information by surfing the Internet. Previous studies have shown that color preference changes with ageing, yet most web pages are designed according to the color preferences of young adults, without consideration of the elderly, even though significant differences exist. Although much research has investigated abstract color preference and found significant sex- and age-related differences, the results are too ambiguous to be applied directly to web page design for the elderly, partly because abstract color preference is not the same as preference for specific colors. Color preference is a psychological factor that greatly affects the usability and aesthetics of a web page. This paper investigates the specific web page color preferences of the young and the elderly. One hundred and ninety-eight participants took part in the research; participants suffering from illnesses such as Alzheimer's disease, Parkinson's disease or diabetic retinopathy were excluded. The paired-comparison method was employed, and the research comprised two experiments. The first experiment investigated the text and theme color preferences of the young and the old, serving as an abstract color preference investigation. In the theme color preference test, the stimuli were chosen from the 48 web-safe colors in the MS Paint program; each group chose its 6 most preferred colors and then made paired comparisons between them. In the text color preference test, 4 colors (red, blue, black and green) were used as stimuli. Experiment 1 established the preference order of theme and text colors after 42 paired comparisons. Based on these results, 18 web pages (combining 6 theme colors and 3 text colors) were designed, and Experiment 2 was carried out to investigate the

  4. Sensitivity analysis of Web page ranking algorithms

    Institute of Scientific and Technical Information of China (English)

    马莉娜; 张家健; 李立维

    2016-01-01

    Sensitivity analysis of Web page ranking algorithms gives a deeper understanding of the principles and conditions behind the popularity scores produced by these models, since different changes in the parameters lead to different degrees of sensitivity. Through a mathematical analysis of the algorithms, this article studies the sensitivity of PageRank and HITS. On the basis of analysing the dependence of the Google matrix G on the damping parameter α, the hyperlink matrix H and the personalization vector vT, it analyses the influence of these three parameters on the PageRank vector, and finally considers the sensitivity of HITS.
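
    The dependence on the damping parameter can be made concrete with a few lines of power iteration over G = αH + (1-α)1vᵀ, recomputed for several α values. The tiny three-page graph is a made-up example, not from the article:

        import numpy as np

        def pagerank(H, alpha, v=None, iters=200):
            """Power iteration for the Google matrix G = alpha*H + (1-alpha)*1*v^T."""
            n = H.shape[0]
            v = np.full(n, 1.0 / n) if v is None else v
            r = v.copy()
            for _ in range(iters):
                r = alpha * r @ H + (1 - alpha) * v
                r /= r.sum()
            return r

        # row-stochastic hyperlink matrix of a 3-page graph
        H = np.array([[0.0, 0.5, 0.5],
                      [1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]])
        for alpha in (0.5, 0.85, 0.99):
            print(alpha, np.round(pagerank(H, alpha), 3))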

  5. The Convergent Evolution of a Chemistry Project: Using Laboratory Posters as a Platform for Web Page Construction.

    Science.gov (United States)

    Rigeman, Sally

    1998-01-01

    Argues that evolution is a process that occurs within the curriculum as well as within the physical universe. Provides an example that involves student presentations. Discusses the transition from poster presentations to electronic presentations via the World Wide Web. (DDR)

  6. A Chinese Web Page Classifier Based on Support Vector Machine and Unsupervised Clustering

    Institute of Scientific and Technical Information of China (English)

    李晓黎; 刘继敏; 史忠植

    2001-01-01

    This paper presents a new algorithm that combines support vector machines (SVM) with unsupervised clustering. After analysing the characteristics of web pages, it proposes a new vector representation of web pages and applies it to web page classification. Given a training set, the algorithm clusters the positive and negative examples separately with an unsupervised clustering algorithm (UC), producing a number of positive and negative cluster centres. It then selects only some of the examples as input to the SVM, according to the ISUC algorithm, and constructs a classifier through SVM learning. Any text can be classified either by comparing its distances to the cluster centres or by the SVM: if the text is near one cluster centre of a category and far from all cluster centres of other categories, UC classifies it correctly with high probability; otherwise the SVM decides its category. The algorithm exploits the high accuracy of SVM and the speed of unsupervised clustering. Experiments show that it not only trains efficiently but also achieves high precision.
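
    A rough reconstruction of the training idea, not the paper's exact ISUC procedure: cluster the positive and negative training pages separately, keep only the examples closest to each cluster centre as SVM training input, and fall back to the SVM when a test page is not clearly closer to one class's centres. The data, cluster counts and margin are illustrative assumptions:

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.svm import SVC

        rng = np.random.default_rng(0)
        X_pos = rng.normal([2, 2], 0.8, (60, 2))      # toy "positive" page vectors
        X_neg = rng.normal([-2, -2], 0.8, (60, 2))    # toy "negative" page vectors

        def cluster_and_select(X, n_clusters=3, per_cluster=5):
            """Cluster one class and keep only the examples closest to each centre."""
            km = KMeans(n_clusters=n_clusters, n_init=10).fit(X)
            keep = []
            for c, centre in enumerate(km.cluster_centers_):
                members = X[km.labels_ == c]
                order = np.argsort(np.linalg.norm(members - centre, axis=1))
                keep.append(members[order[:per_cluster]])
            return km.cluster_centers_, np.vstack(keep)

        pos_centres, pos_sel = cluster_and_select(X_pos)
        neg_centres, neg_sel = cluster_and_select(X_neg)
        svm = SVC(kernel="rbf").fit(np.vstack([pos_sel, neg_sel]),
                                    [1] * len(pos_sel) + [0] * len(neg_sel))

        def classify(x, margin=1.0):
            """Use the cluster centres when they give a clear answer, otherwise use the SVM."""
            dp = np.linalg.norm(pos_centres - x, axis=1).min()
            dn = np.linalg.norm(neg_centres - x, axis=1).min()
            if dn - dp > margin:
                return 1
            if dp - dn > margin:
                return 0
            return int(svm.predict([x])[0])

        print(classify(np.array([1.5, 2.3])), classify(np.array([-0.1, 0.2])))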

  7. Using PHP to Parse eBook Resources from Drupal 6 to Populate a Mobile Web Page

    Directory of Open Access Journals (Sweden)

    Junior Tidal

    2012-10-01

    Full Text Available The Ursula C. Schwerin library needed to create a page for its mobile website devoted to subscribed eBooks. These resources, however, were only available through the main desktop website. These resources were organized using the Drupal 6 content management system with contributed and core modules. It was necessary to create a solution to retrieve the eBook databases from the Drupal installation to a separate mobile site.

  8. Implementation of Four Kinds of Basic Page Layout Based on Web Standards

    Institute of Scientific and Technical Information of China (English)

    包乌格德勒; 李娟

    2012-01-01

    With the recent development of website production technology, Web standards have become a trend in web page design and development. In practical applications, however, page layout sometimes needs to be designed according to the writing direction of the language's script. To address this, the paper describes how to realize four kinds of basic page layout with DIV+CSS layout techniques and gives the corresponding methods and steps.

  9. Segmentation and Extraction for Social Media Web Page Content

    Institute of Scientific and Technical Information of China (English)

    解姝; 叶施仁; 肖春

    2011-01-01

    This paper presents a segmentation and extraction method for content-rich social media pages that needs no hand-crafted rules or training examples. It identifies the frequent blocks in a page using the k-means algorithm, obtains a collection of frequent clusters, identifies the topic-related frequent clusters, and induces extraction rules from the frequent blocks in those clusters in a self-supervised manner. Experimental results show that the method is efficient and robust for social media Web pages with various styles and layouts, achieving high precision and recall.

  10. Searching the World Wide Web: How To Find the Material You Want on the Multimedia Pages of the Internet.

    Science.gov (United States)

    Turner, Mark

    1997-01-01

    Highlights some popular search engines and presents guidelines on making queries, narrowing a search, using quotation marks, and how and when to use advanced searches. Discusses special search tools for World Wide Web and CD-ROM products and homework assistance software. Lists the network locations of five popular search engines. (AEF)

  11. Technique of Active Server Page Access to Web Database

    Institute of Scientific and Technical Information of China (English)

    洪运锡

    2006-01-01

    The author describes the structure and characteristics of the ASP (Active Server Pages) and ADO (ActiveX Data Objects) technologies included in IIS (Internet Information Server) on the Windows operating system, and summarises the steps for accessing a Web database with ASP. ASP is used to write dynamic front-end pages; by encapsulating objects and calling them from programs, it simplifies programming and strengthens cooperation between programs, and it accesses the back-end Web database through ADO.

  12. Using Web-Based Key Character and Classification Instruction for Teaching Undergraduate Students Insect Identification

    Science.gov (United States)

    Golick, Douglas A.; Heng-Moss, Tiffany M.; Steckelberg, Allen L.; Brooks, David. W.; Higley, Leon G.; Fowler, David

    2013-01-01

    The purpose of the study was to determine whether undergraduate students receiving web-based instruction based on traditional, key character, or classification instruction differed in their performance of insect identification tasks. All groups showed a significant improvement in insect identifications on pre- and post-two-dimensional picture…

  13. NLSDF FOR BOOSTING THE RECITAL OF WEB SPAMDEXING CLASSIFICATION

    Directory of Open Access Journals (Sweden)

    S.K. Jayanthi

    2016-10-01

    Full Text Available Spamdexing is the art of black-hat SEO: features that are most influential for high ranking and visibility are manipulated for the SEO task. The motivation behind this work is to use state-of-the-art website optimisation features to enhance the performance of spamdexing detection. Features that play a focal role in current SEO strategies show a significant deviation between spam and non-spam samples. This paper proposes 44 features, named NLSDF (New Link Spamdexing Detection Features). Social media creates an impact on search engine result rankings, so features pertaining to social media were incorporated with the NLSDF features to boost the performance of spamdexing classification. The 44 NLSDF attributes, together with 5 social media features, improve the classification performance on the WEBSPAM-UK2007 dataset. A one-tailed paired t-test with 95% confidence, performed on the AUC values of the learning models, shows the significance of the NLSDF.
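
    The significance check mentioned above, a one-tailed paired t-test on per-fold AUC values, is straightforward with SciPy; the AUC numbers below are invented for illustration, not results from the paper:

        from scipy import stats

        # hypothetical per-fold AUC values: baseline features vs. NLSDF + social features
        auc_baseline = [0.86, 0.88, 0.84, 0.87, 0.85]
        auc_nlsdf    = [0.90, 0.91, 0.89, 0.92, 0.90]

        t, p_two_sided = stats.ttest_rel(auc_nlsdf, auc_baseline)
        # one-tailed test: improvement is only claimed if t is positive
        p_one_tailed = p_two_sided / 2 if t > 0 else 1 - p_two_sided / 2
        print(f"t = {t:.2f}, one-tailed p = {p_one_tailed:.4f}")
        print("significant at 95% confidence" if p_one_tailed < 0.05 else "not significant")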

  14. Topic crawling strategies based on Wikipedia and analysis of web-page similarity

    Institute of Scientific and Technical Information of China (English)

    栾霞; 赵晓楠

    2014-01-01

    To overcome the weaknesses of current topic crawling strategies, a topic crawling strategy based on Wikipedia and web page similarity analysis is proposed. The Wikipedia category tree structure is used to describe the topics; downloaded pages are then processed accordingly, and the priorities of candidate links are calculated by combining text relevance with Web link analysis. Experimental results indicate that the topic relevance of this crawler's results is clearly higher than that of a traditional crawler, and its coverage rate is also improved. The topic description method and crawling strategy have some value for wider use, and in the field of genetically modified organisms in particular the crawler shows a certain novelty.

  15. Page Layout Analysis of the Document Image Based on the Region Classification in a Decision Hierarchical Structure

    Directory of Open Access Journals (Sweden)

    Hossein Pourghassem

    2010-10-01

    Full Text Available The conversion of a document image to its electronic version is a very important problem for storage, search and retrieval applications in office automation systems, and it requires analysis of the document image. In this paper, a hierarchical classification structure based on a two-stage segmentation algorithm is proposed. The image is segmented using the proposed two-stage segmentation algorithm, and the type of each image region (document or non-document) is then determined using multiple classifiers arranged in a hierarchical classification structure. The segmentation algorithm combines a wavelet-transform-based algorithm with a thresholding-based one. Texture features such as correlation, homogeneity and entropy extracted from the co-occurrence matrix, together with two new features based on the wavelet transform, are used to classify and label the regions of the image. The hierarchical classifier consists of two multilayer perceptron (MLP) classifiers and a support vector machine (SVM) classifier. The proposed algorithm is evaluated on a database of document and non-document images collected from the Internet. The experimental results show the efficiency of the proposed approach in region segmentation and classification; the proposed algorithm achieves a classification accuracy of 97.5%.

  16. A new algorithm of Web page purification for data mining tools

    Institute of Scientific and Technical Information of China (English)

    孙楠; 张华伟

    2011-01-01

    In order to eliminate noise better and extract the topic content of Web pages efficiently, a Web page purification algorithm is presented. The algorithm assumes that the topic content of a Web page is mainly contained within <table> and <p> tags, and preprocesses Web noise accordingly; it then matches the content against related Web pages and obtains the topic content by calculating the importance of each node. The algorithm achieves very precise results, correctly extracting the topic content of 98.2% of a set of 6,318 portal-site pages. When used in data mining tools, the algorithm outperforms other similar algorithms and can eliminate noise effectively.

  17. Measurement of Web Usability: Web Page of Hacettepe University Department of Information Management

    Directory of Open Access Journals (Sweden)

    Nazan Özenç Uçak

    2009-06-01

    Full Text Available Today, information is increasingly produced in electronic form and retrieved via web pages. As the number of web pages has grown, many of them seem to have similar contents but different designs. In this respect, presenting information on web pages according to user expectations and specifications is important for the effective use of information. This study provides an insight into web usability studies that measure the effectiveness of web page design and use, with emphasis on a usability study of the Hacettepe University Department of Information Management web page. Seven volunteer users at different levels were recruited. Qualitative and quantitative methods were used together, and the usability study consisted of three stages. In the first stage, a pre-test was administered to determine the users' general computer and Internet skills. In the second stage, a classical usability study was carried out: fourteen questions requiring use of the web page were prepared and put to the users, click counts and retrieval times were recorded, and a think-aloud protocol was applied. In the third stage, a final test was administered to learn the users' opinions about the department's web page. As a result of the usability study, the positive and negative features of the web page were determined through analysis of the qualitative and quantitative data, and the results have been used for a revision that will make the web page more effective to use.

  18. Event based classification of Web 2.0 text streams

    CERN Document Server

    Bauer, Andreas

    2012-01-01

    Web 2.0 applications like Twitter or Facebook create a continuous stream of information. This demands new ways of analysis that offer insight into the stream at the moment the information is created, because much of this data is relevant only within a short period of time. To address this problem, real-time search engines have recently received increased attention. They take the continuous flow of information into account differently than traditional web search by incorporating temporal and social features that describe the context of the information at the time of its creation. Standard approaches, in which data is first stored and then processed from persistent storage, suffer from latency. We want to address the fluent and rapid nature of text streams by providing an event-based approach that analyses the stream of information directly. In a first step we define the difference between real-time search and traditional search to clarify the demands of modern text filtering. In a second s...

  19. The STRESA (storage of reactor safety) database (Web page: http://asa2.jrc.it/stresa)

    Energy Technology Data Exchange (ETDEWEB)

    Annunziato, A.; Addabbo, C.; Brewka, W. [Joint Research Centre, Commission of the European Communities, Ispra (Italy)

    2001-07-01

    A considerable amount of resources has been devoted at the international level during the last few decades, to the generation of experimental databases in order to provide reference information for the understanding of reactor safety relevant phenomenologies and for the development and/or assessment of related computational methodologies. The extent to which these databases are preserved and can be accessed and retrieved is an issue of major concern. This paper provides an outline of the JRC databases preservation initiative and a description of the supporting web-based computer platform STRESA. (author)

  20. Change Detection of Geographic Features Based on Web Pages

    Institute of Scientific and Technical Information of China (English)

    王曙; 吉雷静; 张雪英; 赵仁亮; 陈晓丹; 余浩

    2013-01-01

    Geographic feature change detection has become a vital component of the national geographic information 12th Five-Year Plan and the national geographic census. Web pages contain enormous amounts of geographic feature information; in particular, the pages of news, government and social platform websites are updated frequently and can provide an up-to-date data source for change detection. Considering the linguistic characteristics of descriptions of geographic feature changes in web text, this paper establishes a semantic knowledge base for expressing such changes and designs a web crawler that combines search engines with general topic crawling, enabling efficient acquisition of relevant web pages. A rule model and a conditional random field model are then used to extract change information from the web text, including geographic feature names, locations (place names), time and attributes. The experimental results show that the designed crawler acquires relevant pages effectively and that the accuracy of change information extraction exceeds 70%, although the completeness of the semantic knowledge base has a considerable impact on extraction performance. The results indicate that acquiring geographic feature change information from web text offers a fast new way to detect such changes, complements field surveying and remote sensing image detection, and can serve as a strong auxiliary means for the continuous and real-time updating of geographic features.

  1. Comparative study on commonly used Web page layout

    Institute of Scientific and Technical Information of China (English)

    郭军军

    2012-01-01

    The paper first analyzes the advantages and disadvantages of the table layout and the DIV+CSS layout. It then builds two example web pages, one with the table layout and one with the DIV+CSS layout, and compares the advantages and disadvantages of the two approaches during the process. The comparison shows that the DIV+CSS layout offers more flexibility and faster page loading than the table layout. Beginners and small sites can use the table layout, but for large sites and sites that conform to the Web 2.0 specification the DIV+CSS layout is recommended.

  2. Users page feedback

    CERN Multimedia

    2010-01-01

    In October last year the Communication Group proposed an interim redesign of the users’ web pages in order to improve the visibility of key news items, events and announcements to the CERN community. The proposed update to the users' page (right), and the current version (left, behind) This proposed redesign was seen as a small step on the way to much wider reforms of the CERN web landscape proposed in the group’s web communication plan.   The results are available here. Some of the key points: - the balance between news / events / announcements and access to links on the users’ pages was not right - many people asked to see a reversal of the order so that links appeared first, news/events/announcements last; - many people felt that we should keep the primary function of the users’ pages as an index to other CERN websites; - many people found the sections of the front page to be poorly delineated; - people do not like scrolling; - there were performance...

  3. VB Program Production Web Page Address Connector

    Institute of Scientific and Technical Information of China (English)

    张芸棕

    2011-01-01

    This paper describes the use of the fairly intuitive VB language to write a small program that generates an address-bar link connector for web pages, in order to gain a thorough understanding of VB concepts and of the object-oriented programming approach; many built-in functions are used in the process.

  4. Approach of Eliminating Web Page Noise Based on Statistical Characteristics and DOM Tree

    Institute of Scientific and Technical Information of China (English)

    何友全; 徐澄; 徐小乐; 唐华姣

    2011-01-01

    In view of extracting the information users are interested in from specific websites or web pages, this paper proposes an approach for eliminating web page noise based on statistical characteristics and the DOM tree, after analyzing the advantages and disadvantages of existing noise elimination algorithms. After pre-processing the original pages, the approach analyzes their statistical characteristics and combines them with heuristic extraction rules to remove the noise in the web pages. Experiments show that the approach achieves good extraction results with relatively little human intervention.
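
    The record above combines statistical page features with heuristic rules over the DOM tree. As a rough illustration (not the authors' exact features, rules or thresholds), the sketch below strips obvious noise containers and then keeps the block-level node with the highest ratio of plain text to link text; BeautifulSoup and the density heuristic are assumptions for the example.

```python
from bs4 import BeautifulSoup

def text_density(tag):
    """Plain-text length relative to anchor-text length inside a node."""
    text_len = len(tag.get_text(strip=True))
    link_len = sum(len(a.get_text(strip=True)) for a in tag.find_all("a"))
    return text_len / (link_len + 1)

def extract_content(html):
    soup = BeautifulSoup(html, "html.parser")
    # Drop obvious noise containers before applying the density heuristic.
    for tag in soup(["script", "style", "nav", "footer", "header"]):
        tag.decompose()
    candidates = soup.find_all(["div", "article", "section"]) or [soup]
    best = max(candidates, key=text_density)     # densest block = likely main content
    return best.get_text(" ", strip=True)
```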

  5. Research on teaching reformation of Web site design and Web page making

    Institute of Scientific and Technical Information of China (English)

    张琰

    2012-01-01

    Web site design and Web page making is one of the fundamental courses for the electronic commerce major. The teaching content and methods of this course still remain at the stage of theory and practice for dynamic ASP websites, which is seriously out of step with current web development technology. To address this problem, the paper proposes a reform of the teaching content, together with a practical plan, based on developing websites with Dreamweaver, Photoshop, Flash and PHP.

  6. Tamper detection of numerous web pages based on machine learning

    Institute of Scientific and Technical Information of China (English)

    赖清楠; 陈诗洋; 马皓; 张蓓

    2016-01-01

    A tamper detection method for numerous web pages was designed to cope with the page tampering problem. All the registered websites of a comprehensive university were studied; all the data in their home pages were crawled and classified, corresponding detection rules were built, and an overall judgment was given for each page. The proposed method includes a learning phase and a detecting phase. In the learning phase, the standard value of each detector is trained from the historical information of the web pages. In the detecting phase, each parameter is checked and the detectors' outputs are combined; if a page is judged to have been tampered with, the result is reported and the website administrator is notified to confirm it immediately, and the system retrains and adjusts its parameters when the alarm turns out to be a false positive. Tampering simulation experiments show that with a window size of 11 and an alarm threshold of 2, the false positive rate is 1.183% and the false negative rate is 0.878%, which is the optimal result.
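
    The learn-then-detect idea above can be illustrated with a small sketch: each detector learns a baseline (mean and spread) from historical crawls of a page, and an alarm is raised when at least `alarm_threshold` detectors disagree with their baselines. The feature names, the 3-sigma rule and the omission of the sliding window are assumptions for illustration, not the paper's exact detectors.

```python
from statistics import mean, pstdev

class PageTamperDetector:
    """Learn per-feature baselines from page history, then flag deviations."""
    def __init__(self, alarm_threshold=2, k=3.0):
        self.alarm_threshold = alarm_threshold
        self.k = k
        self.baselines = {}

    def train(self, feature_history):
        # feature_history: {"text_length": [...], "link_count": [...], ...}
        for name, values in feature_history.items():
            self.baselines[name] = (mean(values), pstdev(values) or 1.0)

    def detect(self, features):
        # A detector fires when the current value is more than k sigma from its baseline.
        fired = sum(
            1 for name, value in features.items()
            if abs(value - self.baselines[name][0]) > self.k * self.baselines[name][1]
        )
        return fired >= self.alarm_threshold

detector = PageTamperDetector()
detector.train({"text_length": [5230, 5198, 5250], "link_count": [120, 118, 121]})
print(detector.detect({"text_length": 900, "link_count": 45}))   # likely tampered
```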

  7. Créations graphiques réussissez vos brochures, logos, pages web, newsletters, flyers, et bien plus encore !

    CERN Document Server

    McWade, John

    2010-01-01

    In this book, which brings together a selection of the best professional projects published in his well-known graphic design magazine, Before & After, John McWade offers an in-depth presentation of the basic principles of graphic design before sharing concrete techniques and processes. In a simple, friendly style, he analyses a range of successful graphic designs, such as brochures, newsletters, websites, business cards and other visual materials, and explains why and how they work. Readers can in turn draw inspiration from these examples and apply the same techniques to improve their own documents. You will learn to: crop photos to sharpen their function and meaning; draw the reader's eye to the desired spot using eight different approaches; work with the basic components of graphic design, such as line, shape, direction, movement, scale, colour, and ...

  8. Personal and Public Start Pages in a library setting

    NARCIS (Netherlands)

    Kieft-Wondergem, Dorine

    2009-01-01

    Personal and Public Start Pages are web-based resources. With these kinds of tools it is possible to make your own free start page. A Start Page allows you to put all your web resources into one page, including blogs, email, podcasts and RSS feeds. It is possible to share the content of the page with others.

  9. Personal and Public Start Pages in a library setting

    NARCIS (Netherlands)

    Kieft-Wondergem, Dorine

    Personal and Public Start Pages are web-based resources. With these kinds of tools it is possible to make your own free start page. A Start Page allows you to put all your web resources into one page, including blogs, email, podcasts and RSS feeds. It is possible to share the content of the page with others.

  10. Research on Deep Web Crawler Based on Database Classification

    Institute of Scientific and Technical Information of China (English)

    郭少友; 赵善义; 李建平; 王斌

    2011-01-01

    On the basis of related work, this paper designs a deep web crawler based on database classification. It first identifies the entry forms of deep web databases from downloaded pages, and then classifies the deep web databases automatically by query probing. According to the classification results, some appropriate keywords are selected as query terms and automatically filled into the text boxes of these forms in order to query the deep web databases. The experimental results show that the crawling effect of the crawler based on database classification is superior to that of a crawler based on specified query terms.

  11. Web integration based on classification ontology

    Institute of Scientific and Technical Information of China (English)

    高克宁; 马安香; 张斌

    2006-01-01

    In order to eliminate semantic heterogeneity and implement semantic combination in web information integration, a classification ontology is introduced into web information integration. A standard classification ontology is constructed based on a web glossary by extracting the classified structures of websites and building mappings between them in order to obtain unified views. The mapping is defined by calculating concept subordinate matching degrees, concept associate matching degrees and concept dominate matching degrees. A web information integration system based on the classification ontology is implemented; it can effectively solve the problem of classification semantic heterogeneity and supports the integration of multiple web information sources as well as personalized configuration by users.

  12. Web pages in English Teaching

    Institute of Scientific and Technical Information of China (English)

    刘晓艳

    2015-01-01

    With the continuous development of modern technology, the emergence and development of new teaching media pose a serious challenge to traditional teaching models. In the process of English teaching, teachers can transfer knowledge to students through web page authoring, which meets the demands that the development of the times places on English teaching. Teaching English through web pages allows knowledge to be conveyed to students systematically, vividly and in an organized way, and it also suits the characteristics of students' cognitive development in the new media environment. This is of great help in stimulating students' interest in learning, can improve students' enthusiasm and initiative in learning English, and thereby raises the overall level of English teaching.

  13. ncRNA-class Web Tool: Non-coding RNA feature extraction and pre-miRNA classification web tool

    KAUST Repository

    Kleftogiannis, Dimitrios A.

    2012-01-01

    Until recently, it was commonly accepted that most genetic information is transacted by proteins. Recent evidence suggests that the majority of the genomes of mammals and other complex organisms are in fact transcribed into non-coding RNAs (ncRNAs), many of which are alternatively spliced and/or processed into smaller products. Non coding RNA genes analysis requires the calculation of several sequential, thermodynamical and structural features. Many independent tools have already been developed for the efficient calculation of such features but to the best of our knowledge there does not exist any integrative approach for this task. The most significant amount of existing work is related to the miRNA class of non-coding RNAs. MicroRNAs (miRNAs) are small non-coding RNAs that play a significant role in gene regulation and their prediction is a challenging bioinformatics problem. Non-coding RNA feature extraction and pre-miRNA classification Web Tool (ncRNA-class Web Tool) is a publicly available web tool ( http://150.140.142.24:82/Default.aspx ) which provides a user friendly and efficient environment for the effective calculation of a set of 58 sequential, thermodynamical and structural features of non-coding RNAs, plus a tool for the accurate prediction of miRNAs. © 2012 IFIP International Federation for Information Processing.

  14. Monte Carlo methods in PageRank computation: When one iteration is sufficient

    NARCIS (Netherlands)

    Avrachenkov, K.; Litvak, N.; Nemirovsky, D.; Osipova, N.

    2007-01-01

    PageRank is one of the principle criteria according to which Google ranks Web pages. PageRank can be interpreted as a frequency of visiting a Web page by a random surfer, and thus it reflects the popularity of a Web page. Google computes the PageRank using the power iteration method, which requires

  15. Monte Carlo methods in PageRank computation: When one iteration is sufficient

    NARCIS (Netherlands)

    Avrachenkov, K.; Litvak, N.; Nemirovsky, D.; Osipova, N.

    2005-01-01

    PageRank is one of the principle criteria according to which Google ranks Web pages. PageRank can be interpreted as a frequency of visiting a Web page by a random surfer and thus it reflects the popularity of a Web page. Google computes the PageRank using the power iteration method which requires ab
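
    The random-surfer interpretation mentioned in these two records can be illustrated with a tiny Monte Carlo estimate: the PageRank of a page is approximated by how often random walks visit it. The toy graph, the number of walks and the walk-termination rule below are illustrative assumptions, not the specific Monte Carlo variants analysed by the authors.

```python
import random
from collections import Counter

def monte_carlo_pagerank(graph, walks_per_node=1000, damping=0.85):
    """graph: dict mapping a page to the list of pages it links to."""
    visits = Counter()
    for start in graph:
        for _ in range(walks_per_node):
            page = start
            while True:
                visits[page] += 1
                links = graph.get(page, [])
                # The walk ends at a dangling page or, with probability 1 - damping, by leaving.
                if not links or random.random() > damping:
                    break
                page = random.choice(links)
    total = sum(visits.values())
    return {page: count / total for page, count in visits.items()}

web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
print(monte_carlo_pagerank(web))   # visit frequencies approximate the PageRank vector
```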

  16. Information extraction from massive Web pages based on node property and text content

    Institute of Scientific and Technical Information of China (English)

    王海艳; 曹攀

    2016-01-01

    To address the problem of extracting valuable information from massive Web pages in big data environments, a novel information extraction method based on node properties and text content is put forward. Web pages are converted into a document object model (DOM) tree, and a pruning and fusion algorithm is introduced to simplify the DOM tree. For each node in the DOM tree, both a density property and a vision property are defined, and Web pages are pre-processed based on these property values. A MapReduce framework is employed to realize parallel information extraction from massive Web pages. Simulation and experimental results demonstrate that the proposed extraction method not only achieves better performance but also has higher scalability compared with other methods.

  17. Clustering web pages based on their links and tags

    Institute of Scientific and Technical Information of China (English)

    李元俊; 陈俊杰; 赵涓涓

    2009-01-01

    To address the currently low efficiency and accuracy of Web clustering, a clustering method CWPBLT (clustering web pages based on their links and tags) is proposed. It compares the similarity between pages by analyzing the link structure and the important tag information in Web pages, and thereby clusters the Web pages within a Web site; the clustering process takes into account both the structure of the Web pages and the content information provided by their tags. Experimental results show that the method effectively improves the time efficiency and accuracy of clustering, and is an improvement over previous clustering methods based only on page topic content or page structure.

  18. Document object network model for extracting keywords from Web pages

    Institute of Scientific and Technical Information of China (English)

    彭浩; 蔡美玲; 王瑞龙; 余炳锐

    2012-01-01

    It is very hard to extract keywords accurately from hub Web pages because of their topic noise. A Document Object Network (DON) model and a Keywords Extraction Algorithm Based on DON (KEYDON) are proposed. The DON model clusters topic societies using the betweenness centrality of nodes and impact-fraction propagation rules based on it, which improves the model's robustness against noise. Experiments show that the accuracy of the proposed keyword extraction algorithm based on DON increases by about 20% compared with the algorithm based on the DocView model, indicating that the DON model has a strong ability to suppress topic noise.

  19. Web Page Optimal Segmentation Algorithm Based on Visual Features

    Institute of Scientific and Technical Information of China (English)

    李文昊; 彭红超; 童名文; 石俊杰

    2015-01-01

    Web page segmentation is the key to adaptive presentation of web pages. To address the problems of over-fragmented and semi-automatic segmentation in the classical vision-based page segmentation algorithm VIPS (Vision-based Page Segmentation), a novel vision-based web page optimal segmentation algorithm VWOS (Vision-based Web Optimal Segmentation) is proposed, based on the idea of optimal graph partitioning. Taking both visual features and page structure into account, a web page is modelled as a weighted undirected connected graph, so that page segmentation is transformed into an optimal partition of the graph; the VWOS algorithm is designed on the basis of Kruskal's algorithm combined with the segmentation process. Experiments show that, compared with VIPS, segmenting web pages with VWOS preserves semantic integrity better and requires no manual involvement.

  20. What snippets say about pages

    NARCIS (Netherlands)

    Demeester, Thomas; Nguyen, Dong; Trieschnigg, Dolf; Develder, Chris; Hiemstra, Djoerd

    2013-01-01

    What is the likelihood that a Web page is considered relevant to a query, given the relevance assessment of the corresponding snippet? Using a new FederatedWeb Search test collection that contains search results from over a hundred search engines on the internet, we are able to investigate such rese

  1. 57 | Page

    African Journals Online (AJOL)

    Fr. Ikenga

    information on the Web can be accessed have personal jurisdiction over the creator of the information ... obtain access to data held on systems located in foreign jurisdictions, or where the .... and eight sailors and passengers died as a result.

  2. Rapid Construction Technology of Web Page Physical Experiment Based on Python

    Institute of Scientific and Technical Information of China (English)

    宫薇薇; 祝继常; 韩煦

    2016-01-01

    How to make scientific computing software interact with a database automatically, publish experimental results as web pages quickly, and share the results and their applications is a core and much-discussed problem in the university physics experiment field. Attempts were made to solve the problem with Matlab, but its compiler, which costs tens of thousands commercially, made the scheme of combining scientific computing functions with database communication and Java web development unworkable. Python, however, with its large-scale packages covering many domains and its free, open, fast, effective and lightweight web development frameworks, solves this problem. This paper introduces the Python scientific computing language, compares the technical frameworks of Matlab and Python for building web-based experiments, and shows how to perform variable assignment, use the scientific computing libraries and generate web versions of the results, taking the equal-thickness interference fringes of the optical wedge experiment as an example. The paper can thus help Matlab-based researchers apply the rapid construction technology of web page physical experiments in Python.

  3. An Abstract Description Approach to the Discovery and Classification of Bioinformatics Web Sources

    Energy Technology Data Exchange (ETDEWEB)

    Rocco, D; Critchlow, T J

    2003-05-01

    The World Wide Web provides an incredible resource to genomics researchers in the form of dynamic data sources--e.g. BLAST sequence homology search interfaces. The growth rate of these sources outpaces the speed at which they can be manually classified, meaning that the available data is not being utilized to its full potential. Existing research has not addressed the problems of automatically locating, classifying, and integrating classes of bioinformatics data sources. This paper presents an overview of a system for finding classes of bioinformatics data sources and integrating them behind a unified interface. We examine an approach to classifying these sources automatically that relies on an abstract description format: the service class description. This format allows a domain expert to describe the important features of an entire class of services without tying that description to any particular Web source. We present the features of this description format in the context of BLAST sources to show how the service class description relates to Web sources that are being described. We then show how a service class description can be used to classify an arbitrary Web source to determine if that source is an instance of the described service. To validate the effectiveness of this approach, we have constructed a prototype that can correctly classify approximately two-thirds of the BLAST sources we tested. We then examine these results, consider the factors that affect correct automatic classification, and discuss future work.

  4. Introduction pages

    OpenAIRE

    2015-01-01

    Introduction Pages and Table of Contents Research ArticlesInsulin Requirements in Relation to Insulin Pump Indications in Type 1 DiabetesPDFGabriela GHIMPEŢEANU, Silvia Ş. IANCU, Gabriela ROMAN, Anca M. ALIONESCU259-263Comparative Antibacterial Efficacy of Vitellaria paradoxa (Shea Butter Tree) Extracts Against Some Clinical Bacterial IsolatesPDFKamoldeen Abiodun AJIJOLAKEWU, Fola Jose AWARUN264-268A Murine Effort Model for Studying the Influence of Trichinella on Muscular Activity of MicePDF...

  5. Rapid De-Duplication of Massive Web Pages Based on Counting Bloom Filter

    Institute of Scientific and Technical Information of China (English)

    刘年国; 王芬; 吴家奇; 李雪; 陶涛

    2016-01-01

    Web page de-duplication is a process which detects redundant duplicate pages in a given data collection and then removes them from the collection. Research on web de-duplication based on URL filtering has developed considerably, but there is still no ideal solution for filtering massive numbers of web pages. Based on the MD5 fingerprint database de-duplication algorithm and the properties of the Counting Bloom filter, this paper proposes a rapid de-duplication algorithm named IMP-CBFilter. It improves the efficiency of massive web page filtering by reducing frequent I/O operations. Experiments show the effectiveness of the IMP-CBFilter algorithm.
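
    A Counting Bloom filter keeps counters instead of single bits, so items can be removed as well as added, and membership tests stay in memory rather than hitting a fingerprint database on disk; that is the general property the IMP-CBFilter idea relies on. The sketch below is a minimal, generic Counting Bloom filter for URL de-duplication; the filter size, the number of hash functions and the MD5-based hashing are assumptions, not the paper's parameters.

```python
import hashlib

class CountingBloomFilter:
    def __init__(self, size=1_000_003, num_hashes=4):
        self.size = size
        self.num_hashes = num_hashes
        self.counters = [0] * size

    def _positions(self, item):
        # Derive several counter positions from salted MD5 digests of the item.
        for i in range(self.num_hashes):
            digest = hashlib.md5(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.counters[pos] += 1

    def remove(self, item):
        for pos in self._positions(item):
            if self.counters[pos] > 0:
                self.counters[pos] -= 1

    def __contains__(self, item):
        return all(self.counters[pos] > 0 for pos in self._positions(item))

seen = CountingBloomFilter()
url = "http://example.com/page"
if url not in seen:
    seen.add(url)      # fetch and process the page only if it has not been seen before
```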

  6. The Importance of Prior Probabilities for Entry Page Search

    NARCIS (Netherlands)

    Kraaij, W.; Westerveld, T.H.W.; Hiemstra, D.

    2002-01-01

    An important class of searches on the world-wide-web has the goal to find an entry page (homepage) of an organisation. Entry page search is quite different from Ad Hoc search. Indeed a plain Ad Hoc system performs disappointingly. We explored three non-content features of web pages: page length, num

  7. Deep Webpage Classification and Extraction (DWCE)

    OpenAIRE

    Supriya; Meenakshi Sharma

    2013-01-01

    As the Deep web (or Hidden web) information is hidden behind the search query forms, this information can only be accessed by interacting with these forms. Therefore, development of automated system that interacts with the search forms and extracts the hidden web pages would be of great value to human users. To accomplishthis task stated above, this paper proposes a novel method “Deep Webpage Classification and Extraction” which classifies the websites into appropriate domain, extracts their ...

  8. Extract Core Toponyms from Web Page Text Based on Link Analysis

    Institute of Scientific and Technical Information of China (English)

    钟翔; 高勇; 邬伦

    2016-01-01

    Geographical information has exploded with the growth of the Internet, which also brings new ways of acquiring geospatial data compared with traditional GIS methods. Given the abundant geospatial information on the web, we propose a toponym co-occurrence network model that extracts toponym entities from web page texts with natural language processing methods and normalizes the toponyms, in order to conduct a comprehensive analysis of the web pages. The network built in this paper is a weighted directed graph in which every vertex represents a distinct toponym and the co-occurrence of each pair of toponyms is represented as an edge. The frequency of the geographic names is taken into account as the weight of each edge, reflecting the co-occurrence relationships and transition characteristics of the toponyms. On this basis, a method for extracting core toponyms from web page texts based on link analysis is developed: the PageRank algorithm is used to calculate the link weight of every toponym in the co-occurrence network and to rank each geographic name with a PageRank score. In this way the importance of each toponym is calculated, and the core geographic names with remarkable features or navigation characteristics among the huge number of network resources can be found. A case study based on actual data extracted from People's Daily and Sina News Sport web pages verifies the technical solution and shows that it is both feasible and practically effective, and that it can also be applied to geographical information retrieval. The results show that the core toponyms of the co-occurrence network differ across different themes of web pages, and when the time sequence factor is taken into account, the core toponym results may also differ within a single theme.

  9. Automated web usage data mining and recommendation system using K-Nearest Neighbor (KNN) classification method

    Directory of Open Access Journals (Sweden)

    D.A. Adeniyi

    2016-01-01

    Full Text Available The major problem of many on-line web sites is the presentation of many choices to the client at a time; this usually results in a strenuous and time-consuming task when finding the right product or information on the site. In this work, we present a study of an automatic web usage data mining and recommendation system based on the current user's behavior, captured through his or her click-stream data on a newly developed Really Simple Syndication (RSS) reader website, in order to provide relevant information to the individual without explicitly asking for it. The K-Nearest-Neighbor (KNN) classification method has been trained to be used on-line and in real time to identify clients'/visitors' click-stream data, matching it to a particular user group and recommending a tailored browsing option that meets the need of the specific user at a particular time. To achieve this, web users' RSS address files were extracted, cleansed, formatted and grouped into meaningful sessions, and a data mart was developed. Our results show that the K-Nearest-Neighbor classifier is transparent, consistent, straightforward, simple to understand, likely to possess desirable qualities and easier to implement than most other machine learning techniques, specifically when there is little or no prior knowledge about the data distribution.
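
    As a hedged illustration of the KNN step only (not the paper's actual session encoding), sessions can be represented as simple click-count vectors and a new visitor matched to the nearest user group with scikit-learn; the feature columns, counts and group names below are invented.

```python
from sklearn.neighbors import KNeighborsClassifier

# Columns: clicks on [news, sport, tech] feeds in one session; labels: user group.
sessions = [[8, 0, 1], [7, 1, 0], [0, 9, 2], [1, 8, 0], [0, 1, 9]]
groups   = ["news_reader", "news_reader", "sport_fan", "sport_fan", "tech_fan"]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(sessions, groups)

new_session = [[6, 2, 1]]
print(knn.predict(new_session))   # recommend content tailored to the predicted group
```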

  10. Web-based system for training and dissemination of a magnification chromoendoscopy classification

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    AIM: To evaluate the use of web-based technologies to assess the learning curve and reassess reproducibility of a simplified version of a classification for gastric magnification chromoendoscopy (MC). METHODS: As part of a multicenter trial, a hybrid approach was taken using a CD-ROM, with 20 films of MC lasting 5 s each and an "autorun" file triggering a local HTML frameset referenced to a remote questionnaire through an Internet connection. Three endoscopists were asked to prospectively and independently classify 10 of these films randomly selected with at least 3 d apart. The answers were centrally stored and returned to participants together with adequate feedback with the right answer. RESULTS: For classification in 3 groups, both intra- [Cohen's kappa (κ) = 0.79-1.00 to 0.89-1.00] and inter-observer agreement increased from 1st (moderate) to 6th observation (κ = 0.94). Also, agreement with reference increased in the last observations (0.90, 1.00 and 1.00, for observers A, B and C, respectively). Validity of 100% was obtained by all observers at their 4th observation. When a 4th (sub)group was considered, inter-observer agreement was almost perfect (κ = 0.92) at 6th observation. The relation with reference clearly improved into κ (0.93-1.00) and sensitivity (75%-100%) at their 6th observations.CONCLUSION: This MC classification seems to be easily explainable and learnable as shown by excellent intra- and inter-observer agreement, and improved agreement with reference. A web system such as the one used in this study may be useful for endoscopic or other image based diagnostic procedures with respect to definition, education and dissemination.
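
    Cohen's kappa, the agreement statistic reported throughout the study above, can be computed directly for two observers rating the same set of films; the sketch below uses scikit-learn, and the group labels are invented for illustration.

```python
from sklearn.metrics import cohen_kappa_score

# Classifications of the same six films by two observers (hypothetical labels).
observer_a = ["group1", "group2", "group3", "group1", "group3", "group2"]
observer_b = ["group1", "group2", "group2", "group1", "group3", "group2"]

print(cohen_kappa_score(observer_a, observer_b))   # 1.0 would mean perfect agreement
```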

  11. Research on Web Page Categorization Technology Under Behavior Characteristic Analysis Pattern

    Institute of Scientific and Technical Information of China (English)

    汤亚玲; 崔志明

    2012-01-01

    Existing web page classification techniques ignore the differences in users' individual behavior. To address this, a web page categorization technique combined with an analysis of users' behavior characteristics is proposed, in line with current research interest in web page categorization. Using methods such as knowledge rule discovery and page feature extraction, the technique analyzes Web users' access history and personalized customization information in order to learn and grasp users' behavior and interests. By providing a page categorization pattern suited to users' cognitive characteristics, it compensates to some extent for the shortcomings of purely statistical classification methods in natural language understanding. Experimental results indicate that combining this categorization pattern with various statistical algorithms effectively improves classification accuracy and makes the classification results closer to the real situation and to users' requirements.

  12. Analysis of Public Opinion Dissemination Based on Distinguishing the Near-duplicate Web Pages

    Institute of Scientific and Technical Information of China (English)

    王君泽; 曾润喜; 杜洪涛

    2015-01-01

    Identifying near-duplicate web pages helps to determine the distribution and spread of public opinion about a target event on the Internet. In this paper we study how to identify reprint relations between news articles. First, we extract the news content from the web pages; second, we find candidate reprinted web pages; then we use a kernel function to measure the similarity between the news content and identify the reprint relations. The experimental results show that the model proposed in this paper can effectively identify reprint relations between web pages. The model is helpful for the targeted guidance, prevention and control of negative Internet public opinion on sensitive events.

  13. An Approach to Purify Web Pages Based on the Local Optimal DOM Tree

    Institute of Scientific and Technical Information of China (English)

    胡飞; 杨华千; 韦鹏程; 彭涛; 蒲昌玖

    2012-01-01

    A news web page contains a large number of paragraph tags, most of which occur in the topic zone and only a few in the noise zones. Based on this feature, a novel purification approach using a local optimal DOM tree search algorithm is proposed. By searching the sibling nodes for the one containing the largest number of paragraph tags and eliminating the other nodes, a purified DOM tree is obtained, which corresponds to the purified Web page. The approach is simple to implement and its effect is significant, especially for topic-text Web pages such as news pages.
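
    The "local optimal" search can be sketched as a simple descent through the DOM: at each level, keep the sibling container holding the most paragraph tags and discard the rest. BeautifulSoup and the particular container tags chosen below are assumptions for the example, not the authors' implementation.

```python
from bs4 import BeautifulSoup

def purify(node):
    """Descend into the child container holding the most <p> tags."""
    children = node.find_all(["div", "section", "article", "td"], recursive=False)
    if not children:
        return node
    best = max(children, key=lambda c: len(c.find_all("p")))
    # If no child contains paragraphs, the current node is already the topic block.
    if not best.find_all("p"):
        return node
    return purify(best)

def extract_main_text(html):
    soup = BeautifulSoup(html, "html.parser")
    main = purify(soup.body or soup)
    return main.get_text(" ", strip=True)
```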

  14. The Advantages and Disadvantages of DIV + CSS Distribution Mode Applied in Making Static Web Pages

    Institute of Scientific and Technical Information of China (English)

    刘效春

    2011-01-01

    In web page design, page layout is very important and directly affects how a page is displayed. At present, the DIV+CSS approach is one of the main layout technologies. It offers advantages such as the separation of structure from presentation, fast page loading, easy indexing by search engines, strong control over typography and typesetting, and good compatibility with browsers; at the same time, it has disadvantages such as greater difficulty in coding and more complex invocation.

  15. Automatic classification of Deep Web sources based on KNN algorithm

    Institute of Scientific and Technical Information of China (English)

    张智; 顾韵华

    2011-01-01

    To meet the needs of Deep Web querying, an algorithm for the classification of Deep Web sources based on KNN is put forward. The algorithm extracts form features from Web pages and normalizes the form feature vectors, and then determines the target topic of each Deep Web page by computing distances. The experimental results show that the algorithm can classify Deep Web sources effectively and achieves high recall and precision.

  16. Research and Design of User Personalization Dictionary Based on Interesting Web Pages

    Institute of Scientific and Technical Information of China (English)

    李力沛

    2012-01-01

    In most web personalization service models and systems, the user interest model is established by mining the web pages in the user's browsing history. It is therefore important for such models and systems to identify the user's interesting web pages from a large browsing history. In this paper, interesting web pages are determined through a quantitative analysis of the user's browsing actions, in order to provide accurate input for subsequent user interest modeling. On the basis of the original quantitative analysis method, the contribution values of the browsing actions are normalized and the number of parameters that need to be determined is reduced; as a result, the efficiency of the algorithm is improved and it becomes more feasible.

  17. Deep Webpage Classification and Extraction (DWCE)

    Directory of Open Access Journals (Sweden)

    Supriya

    2013-04-01

    Full Text Available As the Deep web (or Hidden web) information is hidden behind search query forms, this information can only be accessed by interacting with these forms. Therefore, the development of an automated system that interacts with the search forms and extracts the hidden web pages would be of great value to human users. To accomplish the task stated above, this paper proposes a novel method, "Deep Webpage Classification and Extraction", which classifies websites into the appropriate domain, extracts their query interfaces and retrieves all result pages of deep websites using a query building system.

  18. A new means of communication with the populations: the Extremadura Regional Government Radiological Monitoring alert WEB Page

    Energy Technology Data Exchange (ETDEWEB)

    Baeza, A.; Vasco, J.; Miralles, Y.; Torrado, L.; Gil, J. M.

    2003-07-01

    Extremadura XXI a summary sheet, relatively easy to interpret, giving the radiation levels and dosimetry detected during the immediately preceding semester. Recently, too, the challenge has been taken on of providing constantly updated information on as complex a topic as the radiological monitoring of the environment. To this end, a Web page has been developed dealing with the operation and results provided by the aforementioned Radiological Warning Network of Extremadura. The page structure consists of seven major blocks: (i) origin and objectives of the network; (ii) a description of the stations of the network; (iii) their modes of operation in normal circumstances and in the case of an operational or radiological anomaly; (iv) the results that the network provides; (v) a glossary of terms to clarify as straightforwardly as possible some of the terms and concepts that are of unavoidable use, but are unfamiliar to the population in general; (vi) information about links to other Web sites that also deal with this issue to some degree; and (vii) the option of questions and contacts between the visitor to the page and those responsible for its creation and maintenance. Actions such as that described here will doubtless contribute positively to increasing the necessary trust that the population deserves to have in the correct operation of the measures adopted to guarantee their adequate radiological protection. (Author)

  19. Analysis on the Technique of Table and Layer Typesetting in Web Page Design

    Institute of Scientific and Technical Information of China (English)

    龙敏敏

    2015-01-01

    The paper holds that the table is an important and indispensable element of web design. Page typesetting with tables can standardize the whole web page and organize its text, pictures and animations systematically. The layer is a newer CSS positioning technique; it possesses many characteristics that tables do not have, for example layers can overlap, be moved conveniently and be hidden, and many behaviors can be attached to them to enrich the effects of the page. As a result, in many circumstances tables and layers can be used together for typesetting, which preserves the overall regularity of tables while exploiting the flexibility and rich functions of layers, so that more attractive web pages can be designed.

  20. Research on the Application of DIV+CSS Technology in Web Page Making and Design

    Institute of Scientific and Technical Information of China (English)

    武海丽; 李彩玲

    2016-01-01

    With the advent of the Internet era, page layout has become one of the key points of web design, and DIV+CSS has become the dominant technique among the many page layout methods, with an increasingly important position in web page making and design. This paper gives a basic overview of DIV+CSS technology and, in combination with its application in an educational resource website platform, introduces how DIV+CSS is used to design the page layout.

  1. A Deep Web Query Interfaces Classification Method Based on RBF Neural Network

    Institute of Scientific and Technical Information of China (English)

    YUAN Fang; ZHAO Yao; ZHOU Xu

    2007-01-01

    This paper proposes a new approach for the classification of Deep Web query interfaces, which extracts features from the form text data on the query interfaces, assisted by a synonym library, and uses a radial basis function neural network (RBFNN) algorithm to classify the query interfaces. The applied RBFNN is an effective feed-forward artificial neural network which has a simple network structure yet offers excellent nonlinear approximation, fast convergence and global convergence. The TEL_8 query interface data set from the UIUC on-line database, consisting of 477 query interfaces in 8 typical domains, is used in our experiments. The experimental results show that the proposed approach can efficiently classify the query interfaces with an accuracy of 95.67%.
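
    A radial basis function network of the general kind described can be sketched in a few lines: hidden units are Gaussian functions centred by k-means, and the output layer is fitted by least squares. The feature vectors for the query forms (for example, a bag-of-words over form field labels), the number of centres and the gamma value are assumptions; this is not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

class RBFNetwork:
    """Toy RBF network: k-means centres, Gaussian hidden layer, least-squares output."""
    def __init__(self, n_centers=10, gamma=1.0):
        self.n_centers, self.gamma = n_centers, gamma

    def _hidden(self, X):
        # Gaussian activation of every sample against every centre.
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-self.gamma * d2)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        # n_centers must not exceed the number of training samples.
        self.centers = KMeans(n_clusters=self.n_centers, n_init=10).fit(X).cluster_centers_
        self.classes_, y_idx = np.unique(y, return_inverse=True)
        targets = np.eye(len(self.classes_))[y_idx]           # one-hot targets
        self.weights, *_ = np.linalg.lstsq(self._hidden(X), targets, rcond=None)
        return self

    def predict(self, X):
        scores = self._hidden(np.asarray(X, dtype=float)) @ self.weights
        return self.classes_[scores.argmax(axis=1)]
```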

  2. Study of Web page content extraction based on layout similarity

    Institute of Scientific and Technical Information of China (English)

    杨柳青; 李晓东; 耿光刚

    2015-01-01

    An appropriate Web content extraction technique can remove redundant, repetitive and useless data from massive Web pages while extracting more meaningful and useful data. Based on the observation that pages under the same Web site are very similar in content layout and style structure, this paper proposes and implements a Web content extraction method based on layout similarity. It extracts the main content by comparing the similarity of DOM node structure data among Web pages belonging to the same topic on the same site, and some related problems are also explored and implemented tentatively. Experiments prove that this method is simple, practical and general; it achieves high accuracy and can provide support for many Internet applications involving content analysis.

  3. Grass-Roots Cataloging and Classification: Food for Thought from World Wide Web Subject-Oriented Hierarchical Lists.

    Science.gov (United States)

    Dodd, David G.

    1996-01-01

    Examines the structure and principles of various hierarchical guides, or hotlists, which attempt to give subject access to World Wide Web resources on the Internet. The lists are compared to classification schemes and to Library of Congress subject headings, and browsing and search engines are compared. (Author/LRW)

  4. Combining vision information and tag information to extract Deep Web result page content

    Institute of Scientific and Technical Information of China (English)

    冯永; 唐黎

    2012-01-01

    Extracting content from deep web pages is a challenging problem due to the underlying intricate structures of such pages. A vision- and tag-based approach (DVS) is proposed. It primarily utilizes the vision information and tag information on Deep Web result pages to extract the content structure of the pages. The approach consists of two steps. First, the vision information and tag information are obtained by analyzing the Cascading Style Sheets and the DOM tree, generating an initial visual tree of the Deep Web result page. Then the Path Shingle (PS) algorithm is employed, considering both the vision and the tag information, and the blocks in the visual tree are clustered according to the result of computing their similarity to produce the final visual tree, i.e., the content structure of the page. The innovations of DVS are that it uses both vision information and tag information on Deep Web pages to extract the content structure, and that it stores the vision information as a tree, transforming the analysis of the vision information into the analysis of a vision-attribute tree. Experiments are conducted on a large set of Web databases, UIUC's TEL. The experimental results show that the vision- and tag-based approach achieves high precision compared with the WTS algorithm and the VIPS algorithm.

  5. Application of PAGE Model in the Precise Control of Web Front End Printing

    Institute of Scientific and Technical Information of China (English)

    苏亚涛

    2016-01-01

    At present, most application software is web-based, and in practical work it is very common to send documents from web applications to a printer. Current web applications, however, tend to pay more attention to screen output than to printer output, so precise printed output of documents in web applications is both useful and necessary. To address this problem, this paper studies and implements a method for controlling web front-end printing using the PAGE model.

  6. Semantic Advertising for Web 3.0

    Science.gov (United States)

    Thomas, Edward; Pan, Jeff Z.; Taylor, Stuart; Ren, Yuan; Jekjantuk, Nophadol; Zhao, Yuting

    Advertising on the World Wide Web is based around automatically matching web pages with appropriate advertisements, in the form of banner ads, interactive adverts, or text links. Traditionally this has been done by manual classification of pages, or more recently using information retrieval techniques to find the most important keywords from the page, and match these to keywords being used by adverts. In this paper, we propose a new model for online advertising, based around lightweight embedded semantics. This will improve the relevancy of adverts on the World Wide Web and help to kick-start the use of RDFa as a mechanism for adding lightweight semantic attributes to the Web. Furthermore, we propose a system architecture for the proposed new model, based on our scalable ontology reasoning infrastructure TrOWL.

  7. Introduction pages

    Directory of Open Access Journals (Sweden)

    Radu E. Sestras

    2015-09-01

    Full Text Available Introduction Pages and Table of Contents Research ArticlesInsulin Requirements in Relation to Insulin Pump Indications in Type 1 DiabetesPDFGabriela GHIMPEŢEANU,\tSilvia Ş. IANCU,\tGabriela ROMAN,\tAnca M. ALIONESCU259-263Comparative Antibacterial Efficacy of Vitellaria paradoxa (Shea Butter Tree Extracts Against Some Clinical Bacterial IsolatesPDFKamoldeen Abiodun AJIJOLAKEWU,\tFola Jose AWARUN264-268A Murine Effort Model for Studying the Influence of Trichinella on Muscular Activity of MicePDFIonut MARIAN,\tCălin Mircea GHERMAN,\tAndrei Daniel MIHALCA269-271Prevalence and Antibiogram of Generic Extended-Spectrum β-Lactam-Resistant Enterobacteria in Healthy PigsPDFIfeoma Chinyere UGWU,\tMadubuike Umunna ANYANWU,\tChidozie Clifford UGWU,\tOgbonna Wilfred UGWUANYI272-280Index of Relative Importance of the Dietary Proportions of Sloth Bear (Melursus ursinus in Semi-Arid RegionPDFTana P. MEWADA281-288Bioaccumulation Potentials of Momordica charantia L. Medicinal Plant Grown in Lead Polluted Soil under Organic Fertilizer AmendmentPDFOjo Michael OSENI,\tOmotola Esther DADA,\tAdekunle Ajayi ADELUSI289-294Induced Chitinase and Chitosanase Activities in Turmeric Plants by Application of β-D-Glucan NanoparticlesPDFSathiyanarayanan ANUSUYA,\tMuthukrishnan SATHIYABAMA295-298Present or Absent? About a Threatened Fern, Asplenium adulterinum Milde, in South-Eastern Carpathians (RomaniaPDFAttila BARTÓK,\tIrina IRIMIA299-307Comparative Root and Stem Anatomy of Four Rare Onobrychis Mill. (Fabaceae Taxa Endemic in TurkeyPDFMehmet TEKİN,\tGülden YILMAZ308-312Propagation of Threatened Nepenthes khasiana: Methods and PrecautionsPDFJibankumar S. KHURAIJAM,\tRup K. ROY313-315Alleviate Seed Ageing Effects in Silybum marianum by Application of Hormone Seed PrimingPDFSeyed Ata SIADAT,\tSeyed Amir MOOSAVI,\tMehran SHARAFIZADEH316-321The Effect of Halopriming and Salicylic Acid on the Germination of Fenugreek (Trigonella foenum-graecum under Different Cadmium

  8. PageRank of integers

    CERN Document Server

    Frahm, K M; Shepelyansky, D L

    2012-01-01

    We build up a directed network tracing links from a given integer to its divisors and analyze the properties of the Google matrix of this network. The PageRank vector of this matrix is computed numerically and it is shown that its probability is inversely proportional to the PageRank index thus being similar to the Zipf law and the dependence established for the World Wide Web. The spectrum of the Google matrix of integers is characterized by a large gap and a relatively small number of nonzero eigenvalues. A simple semi-analytical expression for the PageRank of integers is derived that allows to find this vector for matrices of billion size. This network provides a new PageRank order of integers.
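
    The divisor network described above is easy to reproduce at small scale; the sketch below builds links from each integer to its proper divisors and computes PageRank numerically with networkx (the paper itself derives a semi-analytical expression that scales to billion-size matrices). The cut-off N and damping value are illustrative.

```python
import networkx as nx

N = 1000
g = nx.DiGraph()
for n in range(2, N + 1):
    for d in range(1, n):
        if n % d == 0:
            g.add_edge(n, d)          # link from an integer to one of its divisors

rank = nx.pagerank(g, alpha=0.85)
top = sorted(rank, key=rank.get, reverse=True)[:5]
print(top)                            # small, highly divisible integers dominate the ranking
```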

  9. PBL Teaching Mode Application in Teaching of Vocational Web Page Design and Production

    Institute of Scientific and Technical Information of China (English)

    刘燕容

    2012-01-01

    The Problem-Based Learning (PBL) model emphasizes acquiring knowledge and skills in the process of solving problems. Drawing on teaching cases from the vocational course Web Page Design and Production, this paper makes an attempt to apply and explore the PBL learning model in the teaching of this course.

  10. Classification of web resident sensor resources using latent semantic indexing and ontologies

    CSIR Research Space (South Africa)

    Majavu, W

    2008-01-01

    Full Text Available Web resident sensor resource discovery plays a crucial role in the realisation of the Sensor Web. The vision of the Sensor Web is to create a web of sensors that can be manipulated and discovered in real time. A current research challenge...
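
    A minimal latent semantic indexing pipeline for classifying textual descriptions of sensor resources might look as follows; the sample descriptions, labels, number of latent components and the choice of a nearest-neighbour classifier are all assumptions for illustration rather than the authors' system.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

docs = [
    "air temperature sensor hourly observations",
    "river gauge water level station",
    "weather station wind speed and temperature",
    "stream flow discharge measurements",
]
labels = ["weather", "hydrology", "weather", "hydrology"]

# TF-IDF vectors are projected onto a low-rank latent space before classification.
model = make_pipeline(TfidfVectorizer(),
                      TruncatedSVD(n_components=2),
                      KNeighborsClassifier(n_neighbors=1))
model.fit(docs, labels)
print(model.predict(["daily wind and temperature readings"]))
```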

  11. The application of packet classification and queue management in the web QOS system

    Science.gov (United States)

    Bai, Chun; Liu, Jin

    2011-10-01

    The control mechanisms and strategies of QoS are introduced and implemented in the Web server and system to meet the growing demand for Web performance, providing service differentiation and performance guarantees for different types of users and requests; this is now an urgent problem for Web development. The paper proposes a scheme that designs Web QoS middleware software to improve the service quality of a LAN Web server.

  12. A new tool for supervised classification of satellite images available on web servers: Google Maps as a case study

    Science.gov (United States)

    García-Flores, Agustín.; Paz-Gallardo, Abel; Plaza, Antonio; Li, Jun

    2016-10-01

    This paper describes a new web platform dedicated to the classification of satellite images called Hypergim. The current implementation of this platform enables users to perform classification of satellite images from any part of the world thanks to the worldwide maps provided by Google Maps. To perform this classification, Hypergim uses unsupervised algorithms like Isodata and K-means. Here, we present an extension of the original platform in which we adapt Hypergim in order to use supervised algorithms to improve the classification results. This involves a significant modification of the user interface, providing the user with a way to obtain samples of the classes present in the images to use in the training phase of the classification process. Another main goal of this development is to improve the runtime of the image classification process. To achieve this goal, we use a parallel implementation of the Random Forest classification algorithm. This implementation is a modification of the well-known CURFIL software package. The use of this type of algorithm to perform image classification is widespread today thanks to its precision and ease of training. The actual implementation of Random Forest was developed using the CUDA platform, which enables us to exploit the potential of several models of NVIDIA graphics processing units by using them to execute general-purpose computing tasks such as image classification algorithms. As well as CUDA, we use other parallel libraries such as Intel Boost, taking advantage of the multithreading capabilities of modern CPUs. To ensure the best possible results, the platform is deployed on a cluster of commodity graphics processing units (GPUs), so that multiple users can use the tool concurrently. The experimental results indicate that this new algorithm widely outperforms the previous unsupervised algorithms implemented in Hypergim, both in runtime and in the precision of the actual classification of the images.
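
    A minimal supervised pixel classification step with a random forest, standing in for the CUDA-based CURFIL implementation used by the platform, could look like the sketch below; the RGB training samples, land-cover labels and toy image are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Training samples drawn by the user: RGB values and a land-cover label (hypothetical).
X_train = np.array([[ 30,  90,  40], [ 35, 100,  45],   # vegetation
                    [200, 200, 210], [190, 195, 205],   # built-up
                    [ 10,  30, 120], [ 15,  35, 130]])  # water
y_train = ["vegetation", "vegetation", "built", "built", "water", "water"]

clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)

# Classify every pixel of a (toy) image of shape (H, W, 3).
image = np.random.randint(0, 256, size=(4, 4, 3))
pred = clf.predict(image.reshape(-1, 3)).reshape(4, 4)
print(pred)
```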

  13. Best Practices for Searchable Collection Pages

    Science.gov (United States)

    Searchable Collection pages are stand-alone documents that do not have any web area navigation. They should not recreate existing content on other sites and should be tagged with quality metadata and taxonomy terms.

  14. Keyword extraction from Chinese news Web pages based on multi-features

    Institute of Scientific and Technical Information of China (English)

    袁津生; 毛新武

    2014-01-01

    Considering the characteristics of Chinese news Web pages, this paper uses many features, including statistical features, position features and POS (part-of-speech) features, to evaluate the weight of candidate keywords. To solve the problem that some segmentation results cannot reflect the theme, a compound word generation method based on a directed graph is proposed, which aims to find adjacent high-frequency words that form compound words. The experimental results show that this method is greatly superior to the conventional TF-IDF method in effectiveness and can extract keywords from news Web pages efficiently.
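
    The combination of a statistical weight with position features can be sketched as follows; plain TF-IDF stands in for the paper's statistical feature, the title and lead-paragraph boosts (2.0 and 1.5) are invented values, and the POS feature and compound-word generation are omitted.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

def extract_keywords(title, body, corpus, top_k=5):
    """Score candidate words by TF-IDF, then boost words seen in the title or lead."""
    vec = TfidfVectorizer()
    tfidf = vec.fit_transform(corpus + [title + " " + body]).toarray()
    scores = dict(zip(vec.get_feature_names_out(), tfidf[-1]))
    title_words = set(title.lower().split())
    lead_words = set(body.lower().split()[:50])
    for word in scores:
        if word in title_words:
            scores[word] *= 2.0      # position feature: title words weigh most
        elif word in lead_words:
            scores[word] *= 1.5      # words from the opening of the article
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

corpus = ["sports news about the local football match",
          "economy news on stock markets and banks"]
print(extract_keywords("City wins the championship match",
                       "The city football team won the championship match last night.",
                       corpus))
```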

  15. Classification model for Deep Web sources

    Institute of Scientific and Technical Information of China (English)

    姚双良; 鞠时光

    2012-01-01

    This paper studies the automatic classification of Deep Web data sources, analyses its research content and the problems faced, and puts forward a classification model for Deep Web sources. It describes form feature extraction, pre-processing and similarity calculation based on the vector space model. Finally, a classifier based on an optimized KNN algorithm assigns the Deep Web data sources to domains. The experiments show that the model achieves a good classification effect and has practical value.

  16. Instant PageSpeed optimization

    CERN Document Server

    Jaiswal, Sanjeev

    2013-01-01

    Filled with practical, step-by-step instructions and clear explanations for the most important and useful tasks. Instant PageSpeed Optimization is a hands-on guide that provides a number of clear, step-by-step exercises for optimizing your websites for better performance and improving their efficiency.Instant PageSpeed Optimization is aimed at website developers and administrators who wish to make their websites load faster without any errors and consume less bandwidth. It's assumed that you will have some experience in basic web technologies like HTML, CSS3, JavaScript, and the basics of netw

  17. An Algorithm for Community Identification and Dynamical Addition Based on Web Pages Contents Similarity and Link Relation

    Institute of Scientific and Technical Information of China (English)

    云颖; 袁方; 刘宇; 王传豹

    2011-01-01

    An algorithm for community identification based on the content similarity of Web pages and the link relations between them is proposed. The algorithm considers not only the hyperlinks between Web pages but also their content similarity, overcoming the limitation of traditional community discovery algorithms that ignore page content, so that the discovered communities are more relevant in content. In addition, the original community is extended dynamically: newly appearing Web pages that link to pages of the original community and are related to its theme are added to it. Experiments show that the method can be applied effectively to community discovery in networks and yields communities that are more coherent in content.

  18. Sleep Apnea Information Page

    Science.gov (United States)

    ... Institutes of Health (NIH) conduct research related to sleep apnea in laboratories at the NIH, and also ...

  19. A comparative study and classification on web service security testing approaches

    Directory of Open Access Journals (Sweden)

    Azadeh Esfandyari

    Full Text Available Web Services testing is essential to achieve the goal of scalable, robust and successful Web Services, especially in business environments where hundreds of Web Services may be working together. This relatively new way of software development brings ou ...

  20. MACHINE LEARNING IMPLEMENTATION FOR THE CLASSIFICATION OF ATTACKS ON WEB SYSTEMS. PART 1

    Directory of Open Access Journals (Sweden)

    K. Smirnova

    2017-08-01

    Full Text Available The possibility of applying machine learning to the classification of malicious requests to a Web application is considered. The approach avoids deterministic analysis systems (for example, expert systems) and is based on a cascade of neural networks or perceptrons built on an approximate model of the human brain. The main idea of the work is to describe complex attack vectors consisting of feature sets and abstract terms for compiling a training sample, to control the quality of recognition, and to classify with each of the layers (networks) participating in the work, with the ability to retrain not the entire network but only the small part of it into which a mistake or inaccuracy crept. The design of the developed network can be described as a cascaded, scalable neural network. The developed intrusion detection system uses a three-layer neural network, and the layers can be built independently of each other in cascades. In the first layer there is a corresponding network for each attack class, and correctness is checked on that network. To train this layer, we chose classes that can be classified unambiguously as yes or no, that is, classes that are linearly separable. A layer is thus obtained not just of neurons but of small sets of them, which can best determine whether some data class is present in the query or not. The following layers are not trained to recognise the attacks themselves; they are trained to recognise that a set of attacks creates certain threats. This makes it possible to recognise an attacker's attempts to bypass the defence system more accurately and to classify the target of the attack rather than merely its occurrence. Simple layering minimises the percentage of false positives.
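
    A minimal sketch of the first layer of such a cascade: one small binary classifier per attack class, trained on character n-grams of the raw request string. This uses plain scikit-learn perceptrons rather than the paper's cascaded network, and the sample requests and labels are illustrative assumptions.

```python
# Minimal sketch: a "first layer" of per-attack-class binary classifiers over
# character-n-gram features of raw HTTP request lines (toy data, not the paper's model).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Perceptron

raw_requests = [
    "GET /index.php?id=1",
    "GET /index.php?id=1 OR 1=1 --",
    "GET /search?q=<script>alert(1)</script>",
    "GET /about.html",
]
labels = {"sqli": [0, 1, 0, 0], "xss": [0, 0, 1, 0]}   # one yes/no target per attack class

vectorizer = CountVectorizer(analyzer="char", ngram_range=(2, 3))
X = vectorizer.fit_transform(raw_requests)

# One independent binary classifier per class, as in the first cascade layer.
layer1 = {cls: Perceptron(max_iter=1000).fit(X, y) for cls, y in labels.items()}

query = vectorizer.transform(["GET /item?id=2 OR 1=1 --"])
print({cls: int(clf.predict(query)[0]) for cls, clf in layer1.items()})
```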

  1. Visual representation model and automatic keywords extraction algorithm for hub Web pages

    Institute of Scientific and Technical Information of China (English)

    彭浩; 蔡美玲; 陈继锋; 刘炽; 余炳锐

    2012-01-01

    It is very hard to extract keywords exactly from hub Web pages because of their topic noise. To resolve this problem, a new Web page representation model, PIX-PAGE, and an automatic keyword extraction algorithm for hub pages, P-KEA, are proposed. First, the model segments a Web page into regions of suitable granularity using a block composition (region merging) algorithm. Then, following the characteristics of human visual recognition, the model quantifies the visual "salience" of each region, and a salience transmission rule further strengthens the salience of the regions related to keywords. Using the visual quantification produced by the PIX-PAGE model, P-KEA can accurately find the keywords located in the most salient regions. The experimental results show that the proposed algorithm improves accuracy by 20.9% on average compared with the DVM keyword extraction algorithm based on the DocView model.

  2. Exploring Visual Search and Browsing Strategies on Web Pages Using Eye-tracking

    Institute of Scientific and Technical Information of China (English)

    栗觅; 钟宁; 吕胜富

    2011-01-01

    This study uses eye tracking to investigate the visual characteristics of search and browsing on Web pages and to analyse the differences between the two corresponding strategies. When participants search on Web pages, fixation duration and fixation count in the peripheral area are significantly higher than in the central area; when participants browse, there is no significant difference between the peripheral and central areas. Moreover, the average pupil diameter during visual search is significantly larger than during browsing, which indicates that the mental load of visual search is significantly greater than that of browsing. The results show that visual search on Web pages follows a peripheral-area search strategy, whereas visual browsing tends to be a free, random strategy without an obvious pattern, covering the peripheral and central areas equally. The differences between the search and browsing strategies are mainly due to goal-driven behaviour and the level of mental load.

  3. AN STUDY OF SIMILARITY MEASUREMENT BETWEEN PHISHING AND LEGITIMATE WEBSITES USING BAYESIAN CLASSIFICATION AND ITS PERFORMANCE EVALUATION

    OpenAIRE

    Dr. Rajendra Gupta

    2017-01-01

    Safe web browsing and feeding confidential information into websites require the use of protected and secured websites. For web security, a number of anti-phishing tools have been proposed which provide the web user with a dynamic system of warning and protection against potential phishing attacks. Earlier studies show that no anti-phishing tool gives satisfactory results in identifying phishing web pages. To solve this problem, in this paper a Bayesian classification ap...

  4. Measuring the Utilization of On-Page Search Engine Optimization in Selected Domain

    National Research Council Canada - National Science Library

    Goran Matošević

    2015-01-01

    Search engine optimization (SEO) techniques involve "on-page" and "off-page" actions taken by web developers and SEO specialists with the aim of increasing the ranking of web pages in search engine results pages (SERP...

  5. Metrics of on-page factors in the ranking of web pages in organic search engines

    OpenAIRE

    Bécares Pérez, Manuel

    2013-01-01

    Nowadays the Internet has become the medium most used by people, institutions and companies to make themselves known. In addition, people generally turn to the network of networks to look for practically any kind of information. Owing to the growth of the web, search engines are becoming increasingly important, since thanks to their optimised algorithms they are able to classify information and order it by relevance. Generally, when one wants to ...

  6. Analysis on the teaching practice of Web page layout technology

    Institute of Scientific and Technical Information of China (English)

    李敏

    2016-01-01

    With the World Wide Web Consortium (W3C) actively promoting Internet standards, Div+CSS layout has become an important technical component of Web page design and development. In teaching, the problem to be solved is how to give students a thorough understanding of the difficult points of Web page layout and enable them to apply what they have learned. This paper discusses the teaching methods and content of Div+CSS layout technology.

  7. Security Design for Web Pages Constructed by PowerBuilder

    Institute of Scientific and Technical Information of China (English)

    张少敏; 王保义

    2001-01-01

    PowerBuilder is a powerful tool for developing management information systems in the client/server model. This paper discusses the two ways of developing Web applications with PowerBuilder (the Plug-ins approach and the Web.PB approach), together with their characteristics and execution processes, and designs the security of the Web pages constructed with these two approaches.

  8. Research and Design on Web Page Printout Technology Based on the Network Environment

    Institute of Scientific and Technical Information of China (English)

    康苏明; 傅文博

    2011-01-01

    Web page printout in information systems differs from that in client/server applications and is hard to control: the existing control methods cannot position the output precisely. This paper proposes a solution based on XSLT, XML and related techniques that separates the content of a text from its presentation, forms a print output format for Web pages and positions elements precisely. The approach is illustrated in detail with the example of a criminal investigation automation information system.

  9. Web-page Resources Targeted Harvesting System of Logistics Information Platform Based on Nutch

    Institute of Scientific and Technical Information of China (English)

    刘兴邦; 赵晓娇

    2012-01-01

    In view of the inadequacy of the information resource harvesting systems in logistics information platforms, this paper proposes building a Web-page resources targeted harvesting system based on Nutch, and discusses in detail key modules such as Chinese word segmentation, topic relevance analysis, ranking of query results and text parsing. Finally, an experiment is carried out under given conditions and the results are analysed.

  10. Research on Identification Method of Data Table in Web Page

    Institute of Scientific and Technical Information of China (English)

    车成逸; 马宗民; 焦晓龙

    2012-01-01

    To improve the accuracy of Web data table identification, this paper proposes an identification method based on a Support Vector Machine (SVM) with a mixed kernel function. Structural features, content features and row (column) similarity features of tables are defined, and a mixed kernel composed of a polynomial kernel and a linear kernel is used to recognise meaningful Web tables automatically. Experimental results on seven sites show that the average precision and recall of the method are 95.14% and 95.69%.
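
    A minimal sketch of an SVM with a mixed (linear plus polynomial) kernel like the one described above; the toy feature values, the weighting alpha and the polynomial degree are illustrative assumptions, not the paper's settings.

```python
# Minimal sketch: SVM with a mixed kernel (weighted sum of a linear and a polynomial
# kernel) for deciding whether a <table> element is a data table or a layout table.
import numpy as np
from sklearn.svm import SVC

def mixed_kernel(X, Y, alpha=0.5, degree=2, coef0=1.0):
    linear = X @ Y.T
    poly = (X @ Y.T + coef0) ** degree
    return alpha * linear + (1.0 - alpha) * poly

# Toy feature vectors (structure / content / row-column similarity); 1 = data table.
X_train = np.array([[0.9, 0.8, 0.7], [0.8, 0.9, 0.6], [0.1, 0.2, 0.1], [0.2, 0.1, 0.3]])
y_train = np.array([1, 1, 0, 0])

clf = SVC(kernel=mixed_kernel)     # scikit-learn accepts a callable kernel
clf.fit(X_train, y_train)
print(clf.predict(np.array([[0.85, 0.75, 0.65]])))   # expected: [1]
```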

  11. Development of Visual Design Process of Clothing Sales Web Page Based on Perceptual Engineering

    Institute of Scientific and Technical Information of China (English)

    侯倩

    2015-01-01

    Network sales have increasingly become an important way for enterprises to display their image and sell their products, and enhancing the visual appeal of web pages quickly and accurately has become a design issue of wide concern. By collecting and analysing existing research results, this article forms the development ideas of a visual design process for clothing sales web pages based on perceptual engineering. Starting from the demands of both consumers and sellers, it sets up the principles of the process development, establishes the key points of its construction and finally completes the building of the process, in order to offer methodological and theoretical references for related design practice and further research.

  12. Research on characteristic description of Web pages in personalized searching engine

    Institute of Scientific and Technical Information of China (English)

    韩立毛; 鞠时光; 羊晶璟

    2011-01-01

    In order to describe accurately the Web pages that users have visited and are interested in, this paper analyses the scope of feature extraction and the method used to compute the weights of feature words in page characteristic description. Based on "a nonlinear weighted method of handling topic-related words", an improved method for computing the weights of feature words is proposed. The new method considers the importance of feature words appearing in the title and uses a nonlinear function to process the frequency of feature words, making the weight calculation more precise and the page characteristic description more accurate. As a result, the efficiency of the user's personalised search can be enhanced.

  13. The Discussion of Data Paging Display of Web Application Based on Stored Procedure

    Institute of Scientific and Technical Information of China (English)

    徐好芹; 蒙皓兵

    2013-01-01

    Developing Web applications frequently involves designing data query and browsing functions. To cope with constantly growing data volumes, a paging display strategy has to be adopted. This paper discusses a paging display strategy in which the data processing is encapsulated in a stored procedure; the stored procedure has advantages such as generality and high execution efficiency.

  14. An approach to Web page segmentation and text block extraction based on the CURE algorithm

    Institute of Scientific and Technical Information of China (English)

    王超; 徐杰锋

    2012-01-01

    This paper discusses an approach to Web page segmentation and text extraction rules based on the CURE clustering algorithm. The main idea is to add attributes to the nodes of a normalised DOM tree, converting it into an extended DOM tree carrying information-node offsets. The CURE algorithm is then used to cluster the information nodes, and each resulting cluster represents a different block of the page. Finally, three main features of text blocks are extracted and an information weight formula is constructed to identify the text blocks.

  15. Optimization of Lucene Full-text Search Ranking Algorithm Based on the Web

    Institute of Scientific and Technical Information of China (English)

    李臣龙; 张伟; 汪婧

    2015-01-01

    The ranking algorithm of Lucene full-text search, based on the vector space model, lacks the ability to understand the semantics of natural language. A direct and effective remedy is to weight the scores of selected documents according to each user's preferences for the retrieved documents. On this basis, a Download-through Rank algorithm is proposed to improve the original ranking algorithm, and a personalised search engine is designed and implemented. Experiments show that the improved ranking algorithm can effectively improve the accuracy of information retrieval.

  16. Web page translating software program The Hon-yaku; Hon'yaku software 'The hon'yaku

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2000-03-01

    A translating software program The Hon-yaku (the translator) Internet V4.0 has been developed, which provides an optimum means to help those who have to perform English-Japanese and Japanese-English translation on the Internet in collecting information from abroad, e-mailing to other countries, and chatting (exchanging of messages or dialogues) with friends overseas. When the Shortcut Read function is used, the user is instantly provided with the gist of the page on the screen even if it is full of long sentences. The program, furthermore, is equipped with a function of translating documents prepared using the state-of-the-art XML (eXtensible Markup Language) and PDF (Portable Document Format) technologies. Another translation software program The Hon-yaku Office V2.0 has also been developed, which adds a translating function to Microsoft® Office 2000. Using this program, translation is completed upon clicking on the translation button on a Word, Excel, or PowerPoint® screen. A template etc. are also provided, which simplifies the preparation of business letters in English. (translated by NEDO)

  17. Structured Web Pages Management and Data Retrieving Based on XML

    Institute of Scientific and Technical Information of China (English)

    黄晓; 钟琴

    2004-01-01

    In recent years, the widespread use of the World Wide Web has provided an open way for people to access a large number of data sources, but a major factor hindering Web data access is the lack of structure both between and within Web pages. To retrieve Web data more effectively, it is necessary to manage Web pages in a structured way. The structured management of Web pages proposed in this paper consists of two steps: (1) converting Hypertext Markup Language (HTML) into Extensible Markup Language (XML); (2) hierarchical navigation and retrieval.

  18. PMD2HD--a web tool aligning a PubMed search results page with the local German Cancer Research Centre library collection.

    Science.gov (United States)

    Bohne-Lang, Andreas; Lang, Elke; Taube, Anke

    2005-06-27

    Web-based searching is the accepted contemporary mode of retrieving relevant literature, and retrieving as many full text articles as possible is a typical prerequisite for research success. In most cases only a proportion of references will be directly accessible as digital reprints through displayed links. A large number of references, however, have to be verified in library catalogues and, depending on their availability, are accessible as print holdings or by interlibrary loan request. The problem of verifying local print holdings from an initial retrieval set of citations can be solved using Z39.50, an ANSI protocol for interactively querying library information systems. Numerous systems include Z39.50 interfaces and therefore can process Z39.50 interactive requests. However, the programmed query interaction command structure is non-intuitive and inaccessible to the average biomedical researcher. For the typical user, it is necessary to implement the protocol within a tool that hides and handles Z39.50 syntax, presenting a comfortable user interface. PMD2HD is a web tool implementing Z39.50 to provide an appropriately functional and usable interface to integrate into the typical workflow that follows an initial PubMed literature search, providing users with an immediate asset to assist in the most tedious step in literature retrieval, checking for subscription holdings against a local online catalogue. PMD2HD can facilitate literature access considerably with respect to the time and cost of manual comparisons of search results with local catalogue holdings. The example presented in this article is related to the library system and collections of the German Cancer Research Centre. However, the PMD2HD software architecture and use of common Z39.50 protocol commands allow for transfer to a broad range of scientific libraries using Z39.50-compatible library information systems.

  19. A new algorithm to create a profile for users of web site benefiting from web usage mining

    Directory of Open Access Journals (Sweden)

    masomeh khabazfazli

    2015-11-01

    Full Text Available With the integration of the Internet and its various applications and the growing number of Internet pages, finding information through search engines has become difficult. To solve this problem, web page recommendation systems are used. In this paper, the recommendation engine is improved using web usage mining methods. In the recommendation system, clustering is used to classify users' behaviour: usage mining is applied to the data of each user to build that user's movement pattern, and web pages are then recommended using a neural network and a Markov model. The performance of the recommendation engine is thus improved by combining users' movement patterns, clustering, the neural network and the Markov model, and it obtains better results than other methods. To assess the quality of data retrieval on the web, two measures, accuracy and coverage, are used.
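
    A minimal sketch of the Markov-model component of such a recommendation engine (the clustering and neural-network parts are omitted); the session data and page names are illustrative assumptions.

```python
# Minimal sketch: build a first-order Markov transition model from users' page
# sequences and recommend the most likely next pages.
from collections import defaultdict, Counter

sessions = [
    ["home", "products", "cart", "checkout"],
    ["home", "products", "product_42", "cart"],
    ["home", "blog", "products", "cart"],
]

# Count transitions page -> next page over all sessions.
transitions = defaultdict(Counter)
for session in sessions:
    for current_page, next_page in zip(session, session[1:]):
        transitions[current_page][next_page] += 1

def recommend(current_page, top_k=2):
    counts = transitions[current_page]
    total = sum(counts.values())
    return [(page, count / total) for page, count in counts.most_common(top_k)]

print(recommend("products"))   # e.g. [('cart', 0.67), ('product_42', 0.33)]
```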

  20. An Improved PageRank Algorithm

    Institute of Scientific and Technical Information of China (English)

    王钟斐

    2011-01-01

    Aiming at the problems of topic drift and over-emphasis on old web pages in the PageRank algorithm, this article presents an improved algorithm, STPR, which combines anchor-text similarity with a time feedback factor, and analyses it experimentally. First, the traditional PageRank algorithm is compared with a PageRank algorithm that adds anchor-text similarity; the results show that adding anchor-text similarity helps reduce the occurrence of topic drift. Second, the PageRank algorithm with anchor-text similarity is compared with STPR; the results show that STPR not only reduces topic drift but also compensates the PageRank values of new web pages.

  1. FPC: Fast Incremental Clustering for Large Scale Web Pages

    Institute of Scientific and Technical Information of China (English)

    2016-01-01

    Clustering structurally similar web pages is an important technique in Web data mining. Traditional web page clustering gives no representation for cluster centres, so computing point-cluster and cluster-cluster similarities requires computing the similarities of many point pairs; such algorithms are generally slower than centre-based clustering algorithms and cannot meet the needs of large-scale, fast incremental clustering. To address this problem, this paper proposes a fast incremental web page clustering method, FPC (Fast Page Clustering). First, a new method for computing web page similarity is proposed, which is 500 times faster than the simple tree matching algorithm. Second, a representation of cluster centres is given, on the basis of which a variant of the K-means algorithm, MKmeans (Merge-Kmeans), is used for clustering, improving efficiency at the algorithm level. Third, locality-sensitive hashing is used to find the most similar cluster quickly among a very large number of page clusters, improving efficiency at the incremental merging level.

  2. On Structure Characteristics of Educational Web-page Based on Text-picture and Its Design Principles: An Eye-movement Experiment Study at Ningbo University

    Institute of Scientific and Technical Information of China (English)

    刘世清; 周鹏

    2011-01-01

    An eye-movement experiment on visual parameters such as fixation duration and fixation count while viewing text-picture educational web pages found that the layout with the picture on the left and the text on the right attracts the longest fixation duration and the most fixation points in the text area, whereas the layout with the picture above and the text below attracts the longest fixation duration and the most fixation points in the picture area. Therefore, as the interface design of educational web pages moves from an experience-based to a science-based approach, a text-dominated page should prefer the left-picture/right-text layout and avoid the top-picture/bottom-text layout; a picture-dominated page should prefer the top-picture/bottom-text layout and avoid the left-picture/right-text layout; and when a page gives equal weight to pictures and text, the principle of balancing picture and text should be observed.

  3. Modified Weighted PageRank Algorithm using Time Spent on Links

    Directory of Open Access Journals (Sweden)

    Priyanka Bauddha

    2014-09-01

    Full Text Available With the dynamic growth of data on the web, it is very difficult for a user to find relevant information. Large numbers of pages are returned by a search engine in response to a user's query, so ranking algorithms have been developed to prioritise the results and display the more relevant pages at the top. Various ranking algorithms based on web structure mining and web usage mining, such as PageRank, Weighted PageRank, PageRank with VOL and Weighted PageRank with VOL, have been developed, but they do not take into account the time a user spends on a particular web page. If a user spends more time on a web page, this signifies that the page is more relevant to that user. The proposed algorithm combines the time spent with Weighted PageRank using Visits of Links.

  4. An atlas of classification. Signage between open shelves, the Web and the catalogue

    Directory of Open Access Journals (Sweden)

    Andrea Fabbrizzi

    2014-05-01

    This signage is based on cross-media communication and integrates the library's modes of communication at various levels, both within the same medium and between different media: between the signs on the shelf ends, between these signs and the library's website, and between the website and the catalogue. Mobile devices such as tablets and smartphones are particularly well suited to this integrated system, because they make it possible to access the Web while moving among the shelves. The direct link between the classified open shelves and the catalogue is made possible by the QR codes printed on the signs.

  5. Colleges and universities' web page translation under the perspective of schema

    Institute of Scientific and Technical Information of China (English)

    潘彬彬

    2015-01-01

    Schemata are formed in the processing of information about the external world and constitute the mental structure of world knowledge. Because Chinese and Western audiences hold different schemata, university web page translation should be audience-centred: preserving corresponding schemata, resolving schema conflicts and filling schema defaults, so that the information carried by the source language is conveyed into the target language to the greatest possible extent and influences the audience effectively, thereby achieving the purpose of the university's publicity-oriented translation.

  6. Web Topic Classification Based on Modified Multi-classifier Integration AdaBoost Algorithm

    Institute of Scientific and Technical Information of China (English)

    伍杰华; 倪振声

    2013-01-01

    Current Web topic classification algorithms are generally built on a single model, or merely superimpose several single models for decision-making. To address this problem, we propose a Web topic classification method based on a modified multi-classifier integration AdaBoost algorithm. The method first uses the VIPS algorithm to obtain page blocks together with their visual and text features, and trains a weak classifier for each class of features according to its dimensionality; it then computes the corresponding error rates and modifies the rejection strategy for misclassification, producing an optimal classifier for each type of feature; finally, the two optimal classifiers are cascaded for decision-making. Experimental results demonstrate that the method improves the classification accuracy of AdaBoost on complex Web topic information and provides a new scheme for research in the field of Web topic classification.
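
    A minimal sketch of the general idea of training one boosted classifier per feature view (visual features and text features of a page block) and cascading their decisions. This uses standard scikit-learn AdaBoost, not the modified rejection-strategy algorithm of the paper, and all feature values are toy assumptions.

```python
# Minimal sketch: one AdaBoost classifier per feature view, cascaded for the final decision.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)                        # 1 = block belongs to the topic
visual = y[:, None] * 0.6 + rng.random((200, 4))        # toy visual features (position, size, ...)
text = y[:, None] * 0.6 + rng.random((200, 6))          # toy text features (term weights)

clf_visual = AdaBoostClassifier(n_estimators=50, random_state=0).fit(visual, y)
clf_text = AdaBoostClassifier(n_estimators=50, random_state=0).fit(text, y)

def cascade_predict(v, t):
    # Simple cascade: accept a block only if both views agree it is on-topic.
    return int(clf_visual.predict(v)[0] == 1 and clf_text.predict(t)[0] == 1)

print(cascade_predict(visual[:1], text[:1]), y[0])
```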

  7. Classification

    Science.gov (United States)

    Clary, Renee; Wandersee, James

    2013-01-01

    In this article, Renee Clary and James Wandersee describe the beginnings of "Classification," which lies at the very heart of science and depends upon pattern recognition. Clary and Wandersee approach patterns by first telling the story of the "Linnaean classification system," introduced by Carl Linnacus (1707-1778), who is…

  8. Web Personalization of Indian e-Commerce Websites using Classification Methodologies

    Directory of Open Access Journals (Sweden)

    Agarwal Devendera

    2010-11-01

    Full Text Available The paper highlights classification methodologies using the Bayesian rule for Indian e-commerce websites. It deals with generating clusters of users having fraudulent intentions, and also focuses on the Bayesian ontology requirement for efficient possibilistic outcomes.

  9. Research and Practice on Teaching Reformation of Web-site Design and Web-page Making Based on Project Driving

    Institute of Scientific and Technical Information of China (English)

    吴海生

    2013-01-01

    Based on an analysis of the current teaching situation of the Web-site Design and Web-page Making course in our college, this paper puts forward related reforms of the teaching methods and concrete measures, discusses in detail the application of the project-driven teaching mode in this course, and reports the good results achieved.

  10. Classification

    DEFF Research Database (Denmark)

    Hjørland, Birger

    2017-01-01

    This article presents and discusses definitions of the term “classification” and the related concepts “Concept/conceptualization,”“categorization,” “ordering,” “taxonomy” and “typology.” It further presents and discusses theories of classification including the influences of Aristotle...... and Wittgenstein. It presents different views on forming classes, including logical division, numerical taxonomy, historical classification, hermeneutical and pragmatic/critical views. Finally, issues related to artificial versus natural classification and taxonomic monism versus taxonomic pluralism are briefly...

  11. Information Collection and Extraction of Web Pages with Public Opinion Search Engine

    Institute of Scientific and Technical Information of China (English)

    王兰成

    2011-01-01

    An Internet public opinion search engine differs from ordinary information search: its final results must reach into sites and pages to collect and extract valid data, which raises many new research questions and methods for the information field. Web information extraction can follow a template-based or a page-analysis approach, and can use methods based on natural language processing, ontology extraction or wrapper induction. Building on an analysis of these approaches, a sample-learning news extraction method is designed that uses wrapper induction with an expert model in the rule-generation module: extraction rules are formulated and revised by manually analysing web page source code, and information is then extracted automatically according to these rules, improving the precision and quality of the public opinion search engine.

  12. Web Content Classification Using Artificial Neural Networks

    Directory of Open Access Journals (Sweden)

    Esra Nergis Güven

    2008-04-01

    Full Text Available The rapid development and spread of the Internet has accelerated and simplified business and transactions in electronic media. The ever-growing volume of information stored, transferred and processed in electronic environments, however, has brought with it many problems concerning access to information. Because users need to reach the information presented electronically quickly and accurately, approaches for classifying and categorising the information held in these environments are needed, and the search engines, which number over a million, must be supported with new approaches every day so that users can reach the right information in a short time. In this study, a multilayer perceptron (MLP) artificial neural network model is used to classify web pages according to specified topics. Software was developed for selecting the content of the feature vector, training the neural network and, finally, categorising web pages correctly. This intelligent approach is expected to contribute to classifying information in electronic environments easily and with high accuracy, to reaching the right content on the Web and to closing many security gaps.

  13. The matrix method to calculate page rank

    Directory of Open Access Journals (Sweden)

    H. Barboucha, M. Nasri

    2014-06-01

    Full Text Available Choosing the right keywords is relatively easy, whereas getting a high PageRank is more complicated. The PageRank index is what defines the position in the result pages of search engines (for Google of course, but the other engines now use more or less the same kind of algorithm). It is therefore very important to understand how this type of algorithm works in order to hope to appear on the first page of results (the only page read in 95% of cases) or at least to be among the first. In this paper we propose to clarify the operation of this algorithm using a matrix method and a JavaScript program that makes it possible to experiment with this type of analysis. It is of course a simplified version, but it can add value to a website, help it achieve a high ranking in the search results and reach a larger customer base. The interest is to disclose an algorithm that calculates the relevance of each page. It is in fact a mathematical algorithm based on a web graph: the graph is formed of all the web pages, which are modelled by nodes, and of the hyperlinks, which are modelled by arcs.
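
    A minimal sketch of the matrix formulation and power iteration the article refers to (written in Python rather than JavaScript); the four-page link graph and the damping factor of 0.85 are illustrative assumptions.

```python
# Minimal sketch: build the column-stochastic Google matrix G = d*M + (1-d)/N
# from a small link graph and iterate until the PageRank vector converges.
import numpy as np

links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}   # page -> pages it links to
N = len(links)
d = 0.85                                       # damping factor

M = np.zeros((N, N))
for src, outs in links.items():
    for dst in outs:
        M[dst, src] = 1.0 / len(outs)          # column src distributes its rank to dst

G = d * M + (1.0 - d) / N * np.ones((N, N))

rank = np.full(N, 1.0 / N)
for _ in range(100):
    new_rank = G @ rank
    if np.abs(new_rank - rank).sum() < 1e-10:
        break
    rank = new_rank

print(rank / rank.sum())   # page 2, then page 0, get the highest scores
```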

  14. Automatic Generation of Data Types for Classification of Deep Web Sources

    Energy Technology Data Exchange (ETDEWEB)

    Ngu, A H; Buttler, D J; Critchlow, T J

    2005-02-14

    A Service Class Description (SCD) is an effective meta-data based approach for discovering Deep Web sources whose data exhibit some regular patterns. However, it is tedious and error prone to create an SCD description manually. Moreover, a manually created SCD is not adaptive to the frequent changes of Web sources. It requires its creator to identify all the possible input and output types of a service a priori. In many domains, it is impossible to exhaustively list all the possible input and output data types of a source in advance. In this paper, we describe machine learning approaches for automatic generation of the data types of an SCD. We propose two different approaches for learning data types of a class of Web sources. The Brute-Force Learner is able to generate data types that can achieve high recall, but with low precision. The Clustering-based Learner generates data types that have a high precision rate, but with a lower recall rate. We demonstrate the feasibility of these two learning-based solutions for automatic generation of data types for citation Web sources and presented a quantitative evaluation of these two solutions.

  15. Chinese Web Page Classification Based on Statistical Word Segmentation

    Institute of Scientific and Technical Information of China (English)

    黄科; 马少平

    2002-01-01

    This paper applies a statistics-based bigram word segmentation method to Chinese web page classification: without a pre-existing lexicon, a table of two-character words is constructed statistically, the text of web pages is segmented according to this table, and the pages are then classified. Texts of different types and origins on the Internet differ considerably in wording style and type, new words appear constantly, and large amounts of same-type text are easily obtained as training corpora, all of which makes statistical word segmentation feasible. Experiments test the effect of using the statistically constructed bigram table for Chinese web page classification and show that, when the statistical threshold is chosen appropriately, segmenting with the constructed table and then classifying can effectively improve classification precision. The paper also analyses the different effects of single characters and segmented words on text classification and the reasons for them.
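
    A minimal sketch of the statistical construction of a two-character word table described above; the tiny corpus and the frequency threshold are illustrative assumptions.

```python
# Minimal sketch: count adjacent character pairs (bigrams) in a training corpus and
# keep those whose frequency reaches a threshold as the two-character word table.
from collections import Counter

corpus = ["网页分类方法", "网页分类研究", "中文网页分类", "分类方法比较"]
threshold = 2

bigram_counts = Counter()
for text in corpus:
    for a, b in zip(text, text[1:]):
        bigram_counts[a + b] += 1

word_table = {w for w, c in bigram_counts.items() if c >= threshold}
print(sorted(word_table))   # e.g. ['分类', '方法', '网页', ...] depending on the threshold
```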

  16. Web Similarity

    NARCIS (Netherlands)

    Cohen, A.R.; Vitányi, P.M.B.

    2015-01-01

    Normalized web distance (NWD) is a similarity or normalized semantic distance based on the World Wide Web or any other large electronic database, for instance Wikipedia, and a search engine that returns reliable aggregate page counts. For sets of search terms the NWD gives a similarity on a scale fr
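
    A minimal sketch of the normalized web distance in the normalized Google distance form commonly given for this family of distances; the page counts below are made-up illustrative numbers, not real search-engine results.

```python
# Minimal sketch: NWD(x, y) = (max(log f(x), log f(y)) - log f(x, y))
#                             / (log N - min(log f(x), log f(y))),
# where f(.) are aggregate page counts returned by a search engine for the terms
# (and for both terms together) and N estimates the total number of indexed pages.
from math import log

def nwd(fx, fy, fxy, n_pages):
    num = max(log(fx), log(fy)) - log(fxy)
    den = log(n_pages) - min(log(fx), log(fy))
    return num / den

# Toy example: two terms that co-occur often yield a small distance (illustrative counts).
print(nwd(fx=46_700_000, fy=12_200_000, fxy=2_630_000, n_pages=8_000_000_000))
```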

  17. Design and Implementation of IP Queue-based Real-time Web Page Filtration System

    Institute of Scientific and Technical Information of China (English)

    周聚; 朱巧明; 李培峰; 刘钊

    2011-01-01

    This paper analyses the implementation of the IP Queue mechanism, HTTP request and response packets, and the relevant characteristics of IP packets. On this basis, a real-time web page filtering system is implemented that filters request packets by IP address and URL and filters response packets by keywords. The system runs within a concrete gateway billing system, improves the monitoring capability of the gateway, and provides a useful reference for enhancing the network security of similar products, especially user-mode firewalls and gateway monitoring.

  18. ONTOPARK: ONTOLOGY BASED PAGE RANKING FRAMEWORK USING RESOURCE DESCRIPTION FRAMEWORK

    Directory of Open Access Journals (Sweden)

    S. Yasodha

    2014-01-01

    Full Text Available Traditional search engines like Google and Yahoo fail to rank the relevant information for users' queries. This is because such search engines rely on keywords for searching and fail to consider the semantics of the query. More sophisticated methods that do provide the relevant information for the query are therefore needed. The Semantic Web, which stores metadata as ontology, could be used to solve this problem. The major drawback of Google's PageRank algorithm is that ranking is based not only on the page ranks produced but also on the number of hits to the Web page. This paved the way for illegitimate means of boosting page ranks; as a result, Web pages whose page rank is zero are also ranked at the top. This drawback of the PageRank algorithm motivated us to contribute to the Web community by providing semantic search results. We therefore propose ONTOPARK, an ontology-based framework for ranking Web pages. The proposed framework combines the vector space model of information retrieval with ontology. It constructs semantically annotated Resource Description Framework (RDF) files, which form the RDF knowledge base for each query. The framework has been evaluated by two measures, precision and recall, and improves the precision of both single-word and multi-word queries, which indicates that replacing the Web database with a semantic knowledge base will definitely improve the quality of search. The surfing time of the surfers will also be minimised.

  19. Full page insight

    DEFF Research Database (Denmark)

    Cortsen, Rikke Platz

    2014-01-01

    Alan Moore and his collaborating artists often manipulate time and space by drawing upon the formal elements of comics and making alternative constellations. This article looks at an element that is used frequently in comics of all kinds – the full page – and discusses how it helps shape spatio......, something that it shares with the full page in comics. Through an analysis of several full pages from Moore titles like Swamp Thing, From Hell, Watchmen and Promethea, it is made clear why the full page provides an apt vehicle for an apocalypse in comics.

  1. Remote Sensing Image Analysis Without Expert Knowledge - A Web-Based Classification Tool On Top of Taverna Workflow Management System

    Science.gov (United States)

    Selsam, Peter; Schwartze, Christian

    2016-10-01

    Providing software solutions via the Internet has been known for quite some time and is now an increasing trend marketed as "software as a service". A lot of business units accept the new methods and streamlined IT strategies by offering web-based infrastructures for external software usage - but geospatial applications featuring very specialized services or functionalities on demand are still rare. Originally applied in desktop environments, the ILMSimage tool for remote sensing image analysis and classification was modified in its communicating structures and enabled to run on a high-power server, benefiting from the Taverna software. On top, a GIS-like, web-based user interface guides the user through the different steps in ILMSimage. ILMSimage combines object-oriented image segmentation with pattern recognition features. Basic image elements form a construction set to model large image objects with diverse and complex appearance. There is no need for the user to set up detailed object definitions. Training is done by delineating one or more typical examples (templates) of the desired object using a simple vector polygon. The template can be large and does not need to be homogeneous. The template is completely independent from the segmentation. The object definition is done completely by the software.

  2. Larry Page on Google

    Institute of Scientific and Technical Information of China (English)

    Miguel Helft

    2012-01-01

    Last month, Larry Page sat down with Fortune Senior Writer Miguel Helft for a lengthy interview for a forthcoming Fortune magazine article. It was only Page's second wide-ranging conversation with a print publication since becoming CEO of Google in April 2011.

  3. Cuckoo Hashing with Pages

    CERN Document Server

    Dietzfelbinger, Martin; Rink, Michael

    2011-01-01

    Although cuckoo hashing has significant applications in both theoretical and practical settings, a relevant downside is that it requires lookups to multiple locations. In many settings, where lookups are expensive, cuckoo hashing becomes a less compelling alternative. One such standard setting is when memory is arranged in large pages, and a major cost is the number of page accesses. We propose the study of cuckoo hashing with pages, advocating approaches where each key has several possible locations, or cells, on a single page, and additional choices on a second backup page. We show experimentally that with k cell choices on one page and a single backup cell choice, one can achieve nearly the same loads as when each key has k+1 random cells to choose from, with most lookups requiring just one page access, even when keys are placed online using a simple algorithm. While our results are currently experimental, they suggest several interesting new open theoretical questions for cuckoo hashing with pages.

  4. Page Styles on steroids

    DEFF Research Database (Denmark)

    Madsen, Lars

    2008-01-01

    Designing a page style has long been a pain for novice users. Some parts are easy; others need strong LATEX knowledge. In this article we will present the memoir way of dealing with page styles, including new code added to the recent version of memoir that will reduce the pain to a mild annoyance...

  5. Modeling clicks beyond the first result page

    NARCIS (Netherlands)

    Chuklin, A.; Serdyukov, P.; de Rijke, M.

    2013-01-01

    Most modern web search engines yield a list of documents of a fixed length (usually 10) in response to a user query. The next ten search results are usually available in one click. These documents either replace the current result page or are appended to the end. Hence, in order to examine more

  7. New WWW Pages

    CERN Multimedia

    Pommes, K

    New WWW pages have been created in order to provide easy access to the many activities and pertaining information of the ATLAS Technical Coordination. The main entry point is available on the ATLAS Collaboration page by clicking the Technical Coordination link which leads to the page shown in the following picture. Each button links to a page listing all tasks of the corresponding activity, the responsible task leaders, schedules, work-packages, and action lists, etc... The "ATLAS Documentation Center" button will present the pop-up window shown in the next figure: Besides linking to the Technical Coordination Activities, this page provides direct access to the tools for Project Progress Tracking (PPT) and Engineering Data Management (EDMS), as well as to the main topics being coordinated by the Technical Coordination.

  8. Value of Information Web Application

    Science.gov (United States)

    2015-04-01

    2.1 Demographics The initial page that the user encounters when accessing the VoI web application is Demographics (Fig. 1). On this page, the user...deck is empty, the web application sets up the next deck for the user or sends the user to the Results page if all decks have been played. The user...confirmation that the submittal was successful or not successful. This ends the users' interaction with the web application.

  9. Geographic Information Systems and Web Page Development

    Science.gov (United States)

    Reynolds, Justin

    2004-01-01

    The Facilities Engineering and Architectural Branch is responsible for the design and maintenance of buildings, laboratories, and civil structures. In order to improve efficiency and quality, the FEAB has dedicated itself to establishing a data infrastructure based on Geographic Information Systems, GIS. The value of GIS was explained in an article dating back to 1980 entitled "Need for a Multipurpose Cadastre" which stated, "There is a critical need for a better land-information system in the United States to improve land-conveyance procedures, furnish a basis for equitable taxation, and provide much-needed information for resource management and environmental planning." Scientists and engineers both point to GIS as the solution. What is GIS? According to most textbooks, Geographic Information Systems is a class of software that stores, manages, and analyzes mappable features on, above, or below the surface of the earth. GIS software is basically database management software applied to the management of spatial data and information. Simply put, Geographic Information Systems manage, analyze, chart, graph, and map spatial information. GIS can be broken down into two main categories, urban GIS and natural resource GIS. Further still, natural resource GIS can be broken down into six sub-categories: agriculture, forestry, wildlife, catchment management, archaeology, and geology/mining. Agriculture GIS has several applications, such as agricultural capability analysis, land conservation, market analysis, or whole farming planning. Forestry GIS can be used for timber assessment and management, harvest scheduling and planning, environmental impact assessment, and pest management. GIS when used in wildlife applications enables the user to assess and manage habitats, identify and track endangered and rare species, and monitor impact assessment.

  10. Native Mobile Ap or Mobile Web Page?

    OpenAIRE

    2013-01-01

    This project is designed to greatly enhance the lines of communication between the pollinator industry and the pesticide applicator community. It builds on the DriftWatch.org Pesticide Sensitive Crop Registry (www.driftwatch.org), which currently covers nine states, with a customized map for each state (Colorado, Indiana, Illinois, Michigan, Missouri, Minnesota, Montana, Nebraska, and Wisconsin).

  11. Processing Web Pages for College English Reading

    Institute of Scientific and Technical Information of China (English)

    WangLixin; WangYang; YangMuyun

    2004-01-01

    ELT in the university aims to further cultivate students' abilities in listening, speaking, reading, writing and translation, so that they can communicate effectively in English. Among the five, reading is now giving way to speaking: the field is placing more and more emphasis on the ability to communicate face to face.

  12. Improvement of PageRank Algorithm for Search Engine

    Institute of Scientific and Technical Information of China (English)

    杨劲松; 凌培亮

    2009-01-01

    In order to solve the information retrieval problems enterprises face when making rapid decisions, this paper proposes an improved PageRank algorithm. It takes the creation time of a Web page into account and, based on an analysis of the similarity between anchor text and the Web page's topic, distributes different PageRank values to the forward links in proportion to that similarity. The resulting PageRank values better fit the requirements of topic-specific search engines while keeping the algorithm simple. Experimental results show that the improved algorithm can effectively reduce topic drift and appropriately raise the PageRank values of new Web pages.

  13. EPA Web Taxonomy

    Data.gov (United States)

    U.S. Environmental Protection Agency — EPA's Web Taxonomy is a faceted hierarchical vocabulary used to tag web pages with terms from a controlled vocabulary. Tagging enables search and discovery of EPA's...

  14. 16 CFR 1130.8 - Requirements for Web site registration or alternative e-mail registration.

    Science.gov (United States)

    2010-01-01

    ... registration. (a) Link to registration page. The manufacturer's Web site, or other Web site established for the... web page that goes directly to “Product Registration.” (b) Purpose statement. The registration page... registration page. The Web site registration page shall request only the consumer's name, address,...

  15. Usare WebDewey

    OpenAIRE

    Baldi, Paolo

    2016-01-01

    This presentation shows how to use the WebDewey tool. Features of WebDewey. Italian WebDewey compared with American WebDewey. Querying Italian WebDewey. Italian WebDewey and MARC21. Italian WebDewey and UNIMARC. Numbers, captions, "equivalente verbale": Dewey decimal classification in Italian catalogues. Italian WebDewey and Nuovo soggettario. Italian WebDewey and LCSH. Italian WebDewey compared with printed version of Italian Dewey Classification (22. edition): advantages and disadvantages o...

  16. Universal Emergence of PageRank

    CERN Document Server

    Frahm, K M; Shepelyansky, D L

    2011-01-01

    The PageRank algorithm enables us to rank the nodes of a network through a specific eigenvector of the Google matrix, using a damping parameter $\alpha \in ]0,1[$. Using extensive numerical simulations of large web networks, we determine numerically and analytically the universal features of the PageRank vector at its emergence when $\alpha \rightarrow 1$. The whole network can be divided into a core part and a group of invariant subspaces. For $\alpha \rightarrow 1$ the PageRank converges to a universal power law distribution on the invariant subspaces whose size distribution also follows a universal power law. The convergence of PageRank at $\alpha \rightarrow 1$ is controlled by eigenvalues of the core part of the Google matrix which are exponentially close to unity, leading to large relaxation times as for example in spin glasses.

  17. Universal emergence of PageRank

    Energy Technology Data Exchange (ETDEWEB)

    Frahm, K M; Georgeot, B; Shepelyansky, D L, E-mail: frahm@irsamc.ups-tlse.fr, E-mail: georgeot@irsamc.ups-tlse.fr, E-mail: dima@irsamc.ups-tlse.fr [Laboratoire de Physique Theorique du CNRS, IRSAMC, Universite de Toulouse, UPS, 31062 Toulouse (France)

    2011-11-18

    The PageRank algorithm enables us to rank the nodes of a network through a specific eigenvector of the Google matrix, using a damping parameter α ∈ ]0, 1[. Using extensive numerical simulations of large web networks, with a special accent on British University networks, we determine numerically and analytically the universal features of the PageRank vector at its emergence when α → 1. The whole network can be divided into a core part and a group of invariant subspaces. For α → 1, PageRank converges to a universal power-law distribution on the invariant subspaces whose size distribution also follows a universal power law. The convergence of PageRank at α → 1 is controlled by eigenvalues of the core part of the Google matrix, which are extremely close to unity, leading to large relaxation times as, for example, in spin glasses. (paper)

  18. A Method for Searching Physical Devices Based on Internet Protocols and Web Page Features

    Institute of Scientific and Technical Information of China (English)

    冯健飞; 张毅; 马迪; 张京京

    2016-01-01

    There are many physical devices on the Internet, including webcams, PLCs, sensors and so on. Discovering these devices automatically helps to understand their distribution and deployment, and describing them in a social-cyber-physical multi-domain model helps to characterise them fully and to analyse possible cross-domain threats. A method for finding physical devices on the Internet based on protocol datagrams and Web page features is proposed. The method mainly uses the handshake datagrams of HTTP, SNMP and PPTP and the structural features of the devices' access-control Web pages to find physical devices and obtain their basic information. It then enriches the hardware information of the devices through a preset product information base, and obtains social-domain attributes such as physical location through an IP information base, achieving a multi-domain analysis of the physical objects. Finally, a prototype system named NetThing is developed using the proposed method, and the data obtained in experiments are analysed and verified.
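
    A minimal sketch of the header-and-page-feature idea for the HTTP case only: request a host and match the Server header and page title against a small signature table. The signature keywords and the example address are illustrative assumptions, not the paper's signature base, and a real system would also use the SNMP and PPTP handshakes and a product information base.

```python
# Minimal sketch: fingerprint a host as a device type from its HTTP Server header
# and page title, using an illustrative (made-up) signature table.
import re
import requests

SIGNATURES = {
    "webcam": ["ipcamera", "netwave", "hikvision"],
    "router": ["routeros", "mini_httpd", "dd-wrt"],
    "plc": ["simatic", "modicon"],
}

def identify(url, timeout=5):
    resp = requests.get(url, timeout=timeout)
    server = resp.headers.get("Server", "").lower()
    title_match = re.search(r"<title>(.*?)</title>", resp.text, re.I | re.S)
    title = (title_match.group(1) if title_match else "").lower()
    for device_type, keywords in SIGNATURES.items():
        if any(k in server or k in title for k in keywords):
            return device_type
    return "unknown"

# identify("http://192.0.2.1/")   # example call against a test address
```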

  19. Economic page turners

    OpenAIRE

    Frank, Björn

    2011-01-01

    Economic page turners like Freakonomics are well written and there is much to be learned from them - not only about economics, but also about writing techniques. Their authors know how to build up suspense, i.e., they make readers want to know what comes. An uncountable number of pages in books and magazines are filled with advice on writing reportages or suspense novels. While many of the tips are specific to the respective genres, some carry over to economic page turners in an instructive w...

  20. An Efficient PageRank Approach for Urban Traffic Optimization

    Directory of Open Access Journals (Sweden)

    Florin Pop

    2012-01-01

    A reinforcement-learning mechanism based on a cost function is introduced to determine optimal decisions for each traffic light, based on the solution given by Larry Page for page ranking in the Web environment (Page et al., 1999). Our approach is similar to the work presented by Sheng-Chung et al. (2009) and Yousef et al. (2010). We consider that the traffic lights are controlled by servers; a score for each road is computed with an efficient PageRank approach and used in a cost function to determine optimal decisions. We demonstrate that the cumulative contribution of each car in the traffic respects the main constraint of the PageRank approach, preserving all the properties of the matrix considered in our model.
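
    As a hedged sketch of the general idea (not the authors' implementation), the snippet below scores the intersections of a hypothetical road graph with the standard PageRank implementation in networkx and feeds the scores into a toy cost function for green-light timing; the graph, weights and the green_time formula are invented for illustration.

```python
import networkx as nx

# Hypothetical directed road graph: nodes are intersections, edges are road
# segments; edge weight approximates how many cars flow along the segment.
roads = nx.DiGraph()
roads.add_weighted_edges_from([
    ("A", "B", 30), ("B", "C", 20), ("C", "A", 10),
    ("B", "D", 40), ("D", "A", 25), ("C", "D", 15),
])

# PageRank score per intersection, weighted by traffic flow.
score = nx.pagerank(roads, alpha=0.85, weight="weight")

def green_time(node, base=20, scale=60):
    """Toy cost-function stand-in: busier intersections get longer green phases."""
    return base + scale * score[node]

for node in roads.nodes:
    print(f"{node}: score={score[node]:.3f}, green={green_time(node):.1f}s")
```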

  1. Personal home pages as an information resource

    Directory of Open Access Journals (Sweden)

    Shant Narsesian

    2004-12-01

    Full Text Available Nowadays, for many people, the World Wide Web (WWW) is the first place to go to look something up, to find that bit of information. However, even though people have their favourite sites and their favourite search engines, they often seem to miss that bit of information. This could very well be because it is hiding on a small, unpopular, enthusiast's Personal Home Page. The author believes that there is more information on the Web than that which one will find on the major, "commercial-style" sites. Hence, this paper looks at the possibility of using Personal Home Pages (PHPs) as an information resource, not only for the academic world but for the web-surfing world in general.

  2. Automatic acquisition and classification system for agricultural network information based on Web data

    Institute of Scientific and Technical Information of China (English)

    段青玲; 魏芳芳; 张磊; 肖晓琰

    2016-01-01

    The purpose of this study is to obtain agricultural Web information efficiently and to provide users with a personalized service by integrating agricultural resources scattered across different sites and fusing heterogeneous environmental data. The research in this paper improves several key information technologies: agricultural Web data acquisition and extraction, text classification based on a support vector machine (SVM), and heterogeneous data collection based on the Internet of Things (IOT). We first add high-quality target seed sites into the system and obtain the website URL (uniform resource locator) and category information. The Web crawler program saves the original pages. De-noised Web pages are obtained through an HTML parser and regular expressions that create custom Node Filter objects, and the system builds a document object model (DOM) tree before mining the data area. According to the filtering rules, the target data area can be identified among multiple data regions with repeated patterns, and the structured data can be extracted after property segmentation. Secondly, we construct a linear SVM classification model and classify agricultural texts automatically. Our model involves four steps: we use the segmentation tool ICTCLAS to carry out word segmentation and part-of-speech (POS) tagging; we combine an agricultural keyword dictionary and a document frequency adjustment rule to choose feature words; we build a feature vector and calculate an inverse document frequency (IDF) weight value for each feature word; and we design an adaptive SVM classifier. Finally, the sensing data of different formats collected by the sensors are transmitted to the designated server as source data through a wireless sensor network, and are written to a relational database at the specified acquisition frequency after data conversion and filtering. The key step of
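
    The TF-IDF plus linear SVM portion of the pipeline described above can be sketched in a few lines of scikit-learn. This is only an illustrative stand-in: the paper's Chinese word segmentation with ICTCLAS, the agricultural keyword dictionary and the document-frequency adjustment rule are not reproduced, and the documents and labels below are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical pre-segmented documents (a real pipeline would segment Chinese
# text with ICTCLAS and filter terms with a domain keyword dictionary).
train_docs = [
    "wheat rust control fungicide spray schedule",
    "greenhouse tomato drip irrigation sensor data",
    "pig farm disease outbreak vaccination notice",
    "corn futures market price forecast report",
]
train_labels = ["planting", "facility", "breeding", "market"]

# TF-IDF feature vectors feeding a linear SVM classifier.
model = make_pipeline(TfidfVectorizer(sublinear_tf=True), LinearSVC())
model.fit(train_docs, train_labels)

print(model.predict(["drip irrigation controller for greenhouse cucumbers"]))
```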

  3. Automatic classification of Deep Web sources

    Institute of Scientific and Technical Information of China (English)

    金灵芝; 王小玲; 朱守中

    2009-01-01

    With the rapid development of the World Wide Web (WWW), the Deep Web contains a huge and rapidly growing amount of accessible information. Most of the Deep Web is structured, and classifying these structured Deep Web sources by domain is a very important step in generating integrated query interfaces for the Deep Web. This paper proposes a classification method based on naive Bayes and demonstrates its effectiveness through experiments.
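
    A minimal naive Bayes sketch of the classification step, assuming each Deep Web source is represented by the field labels of its query interface; the sources, domains and feature choice are hypothetical and not taken from the paper.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Each Deep Web source is represented by the field labels of its query form.
interfaces = [
    "title author isbn publisher year",
    "departure arrival date passengers airline",
    "make model year mileage price",
    "author title journal volume issue",
]
domains = ["books", "flights", "cars", "books"]

# Bag-of-words features feeding a multinomial naive Bayes classifier.
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(interfaces, domains)

print(clf.predict(["origin destination travel date seat class"]))
```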

  4. Prototype of web-based database of surface wave investigation results for site classification

    Science.gov (United States)

    Hayashi, K.; Cakir, R.; Martin, A. J.; Craig, M. S.; Lorenzo, J. M.

    2016-12-01

    As active and passive surface wave methods are becoming popular for evaluating the site response of earthquake ground motion, demand for a database of investigation results is also increasing. Seismic ground motion depends not only on the 1D velocity structure but also on 2D and 3D structures, so spatial information on S-wave velocity must be considered in ground motion prediction; such a database can support the construction of 2D and 3D underground models. Inversion in surface wave processing is essentially non-unique, so other information must be combined into the processing, and a database of existing geophysical, geological and geotechnical investigation results can provide indispensable information to improve the accuracy and reliability of investigations. Most investigations, however, are carried out by individual organizations, and their results are rarely stored in a unified and organized database. To study and discuss an appropriate database and digital standard format for surface wave investigations, we developed a prototype of a web-based database to store the observed data and processing results of surface wave investigations that we have performed at more than 400 sites in the U.S. and Japan. The database was constructed on a web server using MySQL and PHP, so users can access it through the internet from anywhere with any device. All data are registered in the database with location information, and users can search geophysical data through Google Maps. The database stores dispersion curves, horizontal-to-vertical spectral ratios and S-wave velocity profiles for each site, saved as digital data in XML files, so users can review and reuse them. The database also stores a published 3D deep basin and crustal structure, and users can refer to it during the processing of surface wave data.

  5. Web Fuzzy Clustering and a Case Study

    Institute of Scientific and Technical Information of China (English)

    LIU Mao-fu; HE Jing; HE Yan-xiang; HU Hui-jun

    2004-01-01

    We combine web usage mining and fuzzy clustering to introduce the concept of web fuzzy clustering, and then put forward a web fuzzy clustering processing model, which is discussed in detail. Web fuzzy clustering can be used for clustering both web users and web pages. Finally, a case study is given, and its result demonstrates the feasibility of using web fuzzy clustering for web page clustering.
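
    For concreteness, here is a minimal numpy sketch of the standard fuzzy c-means update applied to toy page feature vectors. It illustrates how each page receives a fuzzy membership in every cluster, but it does not reproduce the paper's processing model; all data and parameters are invented.

```python
import numpy as np

def fuzzy_cmeans(X, c=2, m=2.0, iters=100, seed=0):
    """Standard fuzzy c-means: returns cluster centers and membership matrix U (n x c)."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))        # random initial fuzzy memberships
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # membership-weighted cluster centers
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))                # standard FCM membership update
        U /= U.sum(axis=1, keepdims=True)             # normalize memberships per sample
    return centers, U

# Toy page feature vectors (e.g., visit frequency and average time on page).
pages = np.array([[0.9, 0.8], [0.85, 0.9], [0.1, 0.2], [0.2, 0.1], [0.15, 0.25]])
centers, U = fuzzy_cmeans(pages, c=2)
print(np.round(U, 2))   # fuzzy membership of each page in the two clusters
```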

  6. Web search: how the Web has changed information retrieval

    Directory of Open Access Journals (Sweden)

    Brooks Terrence A.

    2003-01-01

    Full Text Available Topical metadata are simultaneously hailed as building blocks of the semantic Web and derogated as spam. The significance of the metadata controversy depends on the technological appropriateness of adding them to Web pages. A survey of Web technology suggests that Web pages are both transient and volatile: poor hosts of topical metadata. A more supportive environment exists in the closed Web. The vast majority of Web pages, however, exist in the open Web, an environment that challenges the application of legacy information retrieval concepts and methods.

  7. The ICAP (Interactive Course Assignment Pages) Publishing System

    Directory of Open Access Journals (Sweden)

    Kim Griggs

    2008-03-01

    Full Text Available The ICAP publishing system is an open source custom content management system that enables librarians to easily and quickly create and manage library help pages for course assignments (ICAPs, without requiring knowledge of HTML or other web technologies. The system's unique features include an emphasis on collaboration and content reuse and an easy-to-use interface that includes in-line help, simple forms and drag and drop functionality. The system generates dynamic, attractive course assignment pages that blend Web 2.0 features with traditional library resources, and makes the pages easier to find by providing a central web page for the course assignment pages. As of December 2007, the code is available as free, open-source software under the GNU General Public License.

  8. An Improved Approach to the PageRank Problems

    Directory of Open Access Journals (Sweden)

    Yue Xie

    2013-01-01

    Full Text Available We introduce a partition of the web pages particularly suited to PageRank problems in which the web link graph has a nested block structure. Based on this partition of the web pages into dangling nodes, common nodes, and general nodes, the hyperlink matrix can be reordered into a simpler block structure. Then, based on a parallel computation method, we propose an algorithm for the PageRank problem. In this algorithm, the dimension of the linear system becomes smaller, and the vector for the general nodes in each block can be calculated separately in every iteration. Numerical experiments show that this approach speeds up the computation of PageRank.
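
    As a hedged sketch of the kind of saving such a partition allows (not the authors' algorithm), the snippet below keeps the hyperlink matrix sparse by treating dangling nodes with a rank-one correction inside the power iteration instead of storing their dense rows; the toy graph and names are hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix

def pagerank_with_dangling(links, n, alpha=0.85, tol=1e-12):
    """Sparse power iteration that handles dangling nodes with a rank-one
    correction instead of storing their (dense) rows in the matrix."""
    rows, cols = zip(*links)                       # each link is (i -> j)
    out_deg = np.bincount(rows, minlength=n).astype(float)
    data = [1.0 / out_deg[i] for i in rows]
    # Column-stochastic matrix restricted to non-dangling rows.
    S = csr_matrix((data, (cols, rows)), shape=(n, n))
    dangling = (out_deg == 0)
    p = np.full(n, 1.0 / n)
    while True:
        dangling_mass = p[dangling].sum()          # redistributed uniformly
        p_new = alpha * (S @ p + dangling_mass / n) + (1 - alpha) / n
        if np.abs(p_new - p).sum() < tol:
            return p_new
        p = p_new

# Toy graph: node 3 is dangling (no out-links).
links = [(0, 1), (0, 2), (1, 2), (2, 0), (2, 3)]
print(np.round(pagerank_with_dangling(links, n=4), 3))
```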

  9. Het WEB leert begrijpen

    CERN Multimedia

    Stroeykens, Steven

    2004-01-01

    The Web could be much more useful if computers understood something of the information on Web pages. That is the goal of the "semantic Web", a project in which, amongst others, Tim Berners-Lee, the inventor of the original Web, takes part.

  10. Instant responsive web design

    CERN Document Server

    Simmons, Cory

    2013-01-01

    A step-by-step tutorial approach which will teach readers what responsive web design is and how it is used in designing a responsive web page. If you are a web designer looking to expand your skill set by learning the quickly growing industry standard of responsive web design, this book is ideal for you. Knowledge of CSS is assumed.

  11. Web Document Classification Algorithm Based on Manifold Learning and SVM

    Institute of Scientific and Technical Information of China (English)

    王自强; 钱旭

    2009-01-01

    To efficiently solve the Web document classification problem, a novel Web document classification algorithm based on manifold learning and a Support Vector Machine (SVM) is proposed. The high-dimensional Web document space of the training sets is non-linearly reduced to a lower-dimensional space with the manifold learning algorithm LPP, so that the meaningful low-dimensional structure hidden in the high-dimensional observational data can be discovered. Classification and prediction in the lower-dimensional feature space are then carried out with an SVM optimized by multiplicative update rules. Experimental results show that the algorithm achieves higher classification accuracy with less running time.
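
    A minimal "reduce then classify" pipeline in this spirit can be sketched with scikit-learn. Note that scikit-learn ships no LPP implementation, so TruncatedSVD is used below purely as a linear stand-in for the paper's non-linear dimensionality reduction step; the documents and labels are invented.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical miniature Web document collection with two categories.
docs = [
    "football match score goal league season",
    "tennis open final set champion",
    "stock market shares trading index drop",
    "bank interest rate inflation economy",
]
labels = ["sports", "sports", "finance", "finance"]

# Reduce the high-dimensional TF-IDF space before the SVM, as the paper does
# with LPP; TruncatedSVD here is only a linear stand-in for that step.
model = make_pipeline(TfidfVectorizer(), TruncatedSVD(n_components=2), SVC())
model.fit(docs, labels)
print(model.predict(["champions league final goal"]))
```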

  12. Upgrade of CERN OP Webtools IRRAD Page

    CERN Document Server

    Vik, Magnus Bjerke

    2017-01-01

    CERN Beams Department maintains a website with various tools for the Operations Group, one of them being specific to the Proton Irradiation Facility (IRRAD). The IRRAD team uses the tool to follow up and optimize the operation of the facility. The original version of the tool was difficult to maintain, and adding new features to the page was challenging. This summer student project therefore aimed to upgrade the web page by rewriting it with maintainability and flexibility in mind. The new application uses a server-client architecture with a REST API on the back end, which is used by the front end to request data for visualization. PHP is used on the back end to implement the APIs, and Swagger is used to document them. Vue, Semantic UI, Webpack, Node and ECMAScript 5 are used on the front end to visualize and administer the data. The result is a new IRRAD operations web application with extended functionality, improved structure and an improved user interface. It includes a new Status Panel page th...

  13. Web Personalization Using Web Mining

    Directory of Open Access Journals (Sweden)

    Ms. Kavita D. Satokar

    2010-03-01

    Full Text Available The information on the web is growing dramatically. Users have to spend a lot of time on the web finding the information they are interested in. Today, traditional search engines do not give users enough personalized help but instead provide them with lots of irrelevant information. In this paper, we present a personalized Web search system which helps users get relevant web pages based on their selection from a domain list. Thus, users can obtain a set of domains of interest and the corresponding web pages from the system. The system is based on features extracted from hyperlinks, such as anchor terms or URL tokens. Our methodology uses an innovative weighted URL Rank algorithm based on the user's domains of interest and query.

  14. An atlas of classification. Signage between open shelves, the Web and the catalogue

    Directory of Open Access Journals (Sweden)

    Andrea Fabbrizzi

    2014-07-01

    Full Text Available This paper presents the in-progress project for the signage system of the Dewey-classified shelves in the Library of Social Sciences at the University of Florence. To make the classified arrangement effective, a signage system must clarify complexity: it must orient users towards the logic behind the shelf arrangement, presenting in a visible and understandable way the entities and the relationships which appear in class indexing. This signage is based on cross-media communication and integrates the library's communication means at various levels, both in the context of the same medium and between different media: between the information signs on the end-caps of the shelves, between these information signs and the library website, and between the library website and the catalogue. Mobile devices such as tablets and smartphones are particularly suitable for this integrated system, because they make it possible to access the Web while moving from shelf to shelf. QR codes provide a link from the information signs on the Dewey-classified shelves directly to the catalogue.

  15. An Efficient PageRank Approach for Urban Traffic Optimization

    OpenAIRE

    2012-01-01

    Cities are not static environments; they change constantly. When we talk about traffic in the city, the evolution of traffic lights is a journey from mindless automation to increasingly intelligent, fluid traffic management. In our approach, presented in this paper, a reinforcement-learning mechanism based on a cost function is introduced to determine optimal decisions for each traffic light, based on the solution given by Larry Page for page ranking in the Web environment (Page et al., 1999)...

  16. A Model for Contrasting the English Web Pages of the Chinese Government with Those of the U.S. Government and Its Application in English Adaptation

    Institute of Scientific and Technical Information of China (English)

    冯琰

    2016-01-01

    Taking Werlich's text grammar as the theoretical framework and the official English Web portals of the Central People's Government of the People's Republic of China and of the U.S. Government as the corpora, this paper proposes a model for comparing these web page texts in terms of textual structure, textual content and cultural norms. The model is applied to the English adaptation of the Chinese government website, and a new English version of its home page text is provided for reference; the new version highlights the Chinese spirit of collectivism and the daily life of Chinese people while accommodating the target readers' expectations and reading habits.

  17. A link and Content Hybrid Approach for Arabic Web Spam Detection

    Directory of Open Access Journals (Sweden)

    Heider A. Wahsheh

    2012-12-01

    Full Text Available Some Web site developers act as spammers and try to mislead search engines by using illegal Search Engine Optimization (SEO) tricks to increase the rank of their Web documents and make them more visible in the top 10 of the SERP, in order to gain more visitors for marketing and commercial goals. This study is a continuation of a series of Arabic Web spam studies conducted by the authors, and is dedicated to building the first Arabic content/link Web spam detection system. This novel system is capable of extracting the set of content and link features of Web pages in order to build the largest Arabic Web spam dataset. The constructed dataset contains three groups with the following percentages of spam content: 2%, 30%, and 40%. These three groups with varying percentages of spam content were collected through the crawler embedded in the proposed system. The automated classification of spam Web pages is based on the features in the benchmark dataset. The proposed system uses decision tree rules, which proved to be the best classifier for detecting Arabic content/link Web spam. The proposed system helps to clean the SERP of all URLs referring to Arabic spam Web pages. Based on the collected dataset and the conducted analysis, it achieves an accuracy of 90.1099% for Arabic content-based detection, 93.1034% for Arabic link-based detection, and 89.011% when detecting both Arabic content and link Web spam.
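
    A hedged sketch of the classification step only (the record does not publish the actual feature set or dataset): a decision tree trained on a few invented content and link features per page.

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical content/link features per page:
# [keyword density, fraction of hidden text, out-link count, fraction of reciprocal links]
pages = [
    [0.02, 0.00,  12, 0.60],
    [0.35, 0.40, 480, 0.02],
    [0.04, 0.01,  25, 0.55],
    [0.50, 0.20, 900, 0.01],
    [0.03, 0.00,   8, 0.70],
    [0.28, 0.33, 350, 0.05],
]
labels = ["ham", "spam", "ham", "spam", "ham", "spam"]

# Hold out a third of the pages, fit a shallow tree, and report accuracy.
X_train, X_test, y_train, y_test = train_test_split(pages, labels, test_size=0.33, random_state=0)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```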

  18. A Web-Based, Hospital-Wide Health Care-Associated Bloodstream Infection Surveillance and Classification System: Development and Evaluation.

    Science.gov (United States)

    Tseng, Yi-Ju; Wu, Jung-Hsuan; Lin, Hui-Chi; Chen, Ming-Yuan; Ping, Xiao-Ou; Sun, Chun-Chuan; Shang, Rung-Ji; Sheng, Wang-Huei; Chen, Yee-Chun; Lai, Feipei; Chang, Shan-Chwen

    2015-09-21

    Surveillance of health care-associated infections is an essential component of infection prevention programs, but conventional systems are labor intensive and performance dependent. The aim was to develop an automatic surveillance and classification system for health care-associated bloodstream infection (HABSI), and to evaluate its performance by comparing it with a conventional infection control personnel (ICP)-based surveillance system. We developed a Web-based system that was integrated into the medical information system of a 2200-bed teaching hospital in Taiwan and automatically detects and classifies HABSIs. In this study, the number of computer-detected HABSIs correlated closely with the number of HABSIs detected by ICP, by department (n=20; r=.999). The system performed excellently with regard to sensitivity (98.16%), specificity (99.96%), positive predictive value (95.81%), and negative predictive value (99.98%), and it reduced the delay in confirmation of HABSI cases by 29 days on average. This system provides reliable and objective HABSI data for quality indicators, avoiding the delay caused by a conventional surveillance system.
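
    For reference, the four reported performance measures follow directly from a confusion matrix of detected versus confirmed cases; the sketch below shows the computation with invented counts, not the study's data.

```python
def surveillance_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Hypothetical counts: computer-detected vs. ICP-confirmed HABSI episodes.
print(surveillance_metrics(tp=480, fp=21, fn=9, tn=52_000))
```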

  19. Research on Web log statistical testing based on user classification

    Institute of Scientific and Technical Information of China (English)

    俞金松; 高建华

    2012-01-01

    To address the limitations of low efficiency and inflexibility in traditional statistical testing, a method of Web log statistical testing based on user classification is proposed, taking into account the characteristics of Web applications. According to the differing complexity of Web applications, user groups are classified and Web information is extracted from usage and failure logs. Using this information, models for Web log statistical testing are built and the reliability of the Web application is estimated. The approach was applied to an actual Web application. The results demonstrate that this approach supports higher testing coverage of the main functions of a Web application than traditional statistical testing does, and that its reliability assessment is more viable and realistic.

  20. New PageRank optimization algorithm

    Institute of Scientific and Technical Information of China (English)

    蒋永辉; 吴洪丽

    2012-01-01

    Search engines repeatedly return currently popular pages at the top of search results; popular pages thus tend to get even more popular, while unpopular pages get ignored by the average user. To escape this problem, an improved ranking function and an effective Web user model are employed, and a New PageRank Optimization (NPRO) algorithm is proposed. Experimental data show that the proposed algorithm attains an unbiased Web ranking.