WorldWideScience

Sample records for preinterview web search

  1. Web Search Engines

    OpenAIRE

    Rajashekar, TB

    1998-01-01

    The World Wide Web is emerging as an all-in-one information source. Tools for searching Web-based information include search engines, subject directories and meta search tools. We take a look at key features of these tools and suggest practical hints for effective Web searching.

  2. Chemical Search Web Utility

    Data.gov (United States)

    U.S. Environmental Protection Agency — The Chemical Search Web Utility is an intuitive web application that allows the public to easily find the chemical that they are interested in using, and which...

  3. Distributed Deep Web Search

    NARCIS (Netherlands)

    Tjin-Kam-Jet, Kien

    2013-01-01

    The World Wide Web contains billions of documents (and counting); hence, it is likely that some document will contain the answer or content you are searching for. While major search engines like Bing and Google often manage to return relevant results to your query, there are plenty of situations in

  5. The Evolution of Web Searching.

    Science.gov (United States)

    Green, David

    2000-01-01

    Explores the interrelation between Web publishing and information retrieval technologies and lists new approaches to Web indexing and searching. Highlights include Web directories; search engines; portalisation; Internet service providers; browser providers; meta search engines; popularity based analysis; natural language searching; links-based…

  6. Supporting Web Search with Visualization

    Science.gov (United States)

    Hoeber, Orland; Yang, Xue Dong

    One of the fundamental goals of Web-based support systems is to promote and support human activities on the Web. The focus of this Chapter is on the specific activities associated with Web search, with special emphasis given to the use of visualization to enhance the cognitive abilities of Web searchers. An overview of information retrieval basics, along with a focus on Web search and the behaviour of Web searchers is provided. Information visualization is introduced as a means for supporting users as they perform their primary Web search tasks. Given the challenge of visualizing the primarily textual information present in Web search, a taxonomy of the information that is available to support these tasks is given. The specific challenges of representing search information are discussed, and a survey of the current state-of-the-art in visual Web search is introduced. This Chapter concludes with our vision for the future of Web search.

  7. Web Search Engines: Search Syntax and Features.

    Science.gov (United States)

    Ojala, Marydee

    2002-01-01

    Presents a chart that explains the search syntax, features, and commands used by the 12 most widely used general Web search engines. Discusses Web standardization, expanded types of content searched, size of databases, and search engines that include both simple and advanced versions. (LRW)

  9. Measuring Personalization of Web Search

    DEFF Research Database (Denmark)

    Hannak, Aniko; Sapiezynski, Piotr; Kakhki, Arash Molavi

    2013-01-01

    Web search is an integral part of our daily lives. Recently, there has been a trend of personalization in Web search, where different users receive different results for the same search query. The increasing personalization is leading to concerns about Filter Bubble effects, where certain users… are simply unable to access information that the search engines' algorithm decides is irrelevant. Despite these concerns, there has been little quantification of the extent of personalization in Web search today, or the user attributes that cause it. In light of this situation, we make three contributions.… First, we develop a methodology for measuring personalization in Web search results. While conceptually simple, there are numerous details that our methodology must handle in order to accurately attribute differences in search results to personalization. Second, we apply our methodology to 200 users…

  10. Location-based Web Search

    Science.gov (United States)

    Ahlers, Dirk; Boll, Susanne

    In recent years, the relation of Web information to a physical location has gained much attention. However, Web content today often carries only an implicit relation to a location. In this chapter, we present a novel location-based search engine that automatically derives spatial context from unstructured Web resources and allows for location-based search: our focused crawler applies heuristics to crawl and analyze Web pages that have a high probability of carrying a spatial relation to a certain region or place; the location extractor identifies the actual location information from the pages; our indexer assigns a geo-context to the pages and makes them available for a later spatial Web search. We illustrate the usage of our spatial Web search for location-based applications that provide information not only right-in-time but also right-on-the-spot.
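
    The pipeline described above (crawl, extract locations, assign a geo-context, index for spatial search) can be sketched roughly as follows. This is a minimal illustration, not the authors' system: the gazetteer entries, page texts, and URLs are all invented assumptions.

```python
# Minimal sketch: derive a geo-context for a page by matching its text
# against a small gazetteer, then index pages by location for spatial search.
# Gazetteer entries, page texts, and URLs are illustrative assumptions.

GAZETTEER = {
    "oldenburg": (53.14, 8.21),
    "bremen": (53.08, 8.80),
    "hamburg": (53.55, 9.99),
}

def extract_locations(text):
    """Return gazetteer places mentioned in the page text."""
    tokens = text.lower().split()
    return {place: GAZETTEER[place] for place in GAZETTEER if place in tokens}

def build_spatial_index(pages):
    """Map each place name to the URLs of pages that mention it."""
    index = {}
    for url, text in pages.items():
        for place in extract_locations(text):
            index.setdefault(place, []).append(url)
    return index

pages = {
    "http://example.org/a": "Harbour festival in Oldenburg this weekend",
    "http://example.org/b": "Train connections between Bremen and Hamburg",
}
index = build_spatial_index(pages)
print(index["oldenburg"])  # ['http://example.org/a']
```

    A real location extractor would of course need disambiguation heuristics (the paper's crawler applies such heuristics); simple token matching conflates place names with ordinary words.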

  11. Web Search Studies: Multidisciplinary Perspectives on Web Search Engines

    Science.gov (United States)

    Zimmer, Michael

    Perhaps the most significant tool of our internet age is the web search engine, providing a powerful interface for accessing the vast amount of information available on the world wide web and beyond. While still in its infancy compared to the knowledge tools that precede it - such as the dictionary or encyclopedia - the impact of web search engines on society and culture has already received considerable attention from a variety of academic disciplines and perspectives. This article aims to organize a meta-discipline of “web search studies,” centered around a nucleus of major research on web search engines from five key perspectives: technical foundations and evaluations; transaction log analyses; user studies; political, ethical, and cultural critiques; and legal and policy analyses.

  12. Credibility in Web Search Engines

    OpenAIRE

    Lewandowski, Dirk

    2012-01-01

    Web search engines apply a variety of ranking signals to achieve user satisfaction, i.e., results pages that provide the best-possible results to the user. While these ranking signals implicitly consider credibility (e.g., by measuring popularity), explicit measures of credibility are not applied. In this chapter, credibility in Web search engines is discussed in a broad context: credibility as a measure for including documents in a search engine's index, credibility as a ranking signal, cred...

  13. Semantic Search of Web Services

    Science.gov (United States)

    Hao, Ke

    2013-01-01

    This dissertation addresses semantic search of Web services using natural language processing. We first survey various existing approaches, focusing on the fact that the expensive costs of current semantic annotation frameworks result in limited use of semantic search for large scale applications. We then propose a vector space model based service…
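
    A vector space model over service descriptions, as mentioned above, can be sketched as term-frequency vectors ranked by cosine similarity. The service names and descriptions are illustrative assumptions, not examples from the dissertation.

```python
# Minimal sketch of a vector-space model for service search: services and the
# query become term-frequency vectors, ranked by cosine similarity.
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

services = {  # hypothetical service descriptions
    "WeatherLookup": "get current weather forecast for a city",
    "CurrencyConvert": "convert an amount between two currencies",
}
query = "weather forecast city"

vectors = {name: Counter(desc.split()) for name, desc in services.items()}
qvec = Counter(query.split())
ranked = sorted(vectors, key=lambda n: cosine(qvec, vectors[n]), reverse=True)
print(ranked[0])  # WeatherLookup
```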

  15. Personalized Spiders for Web Search and Analysis.

    Science.gov (United States)

    Chau, Michael; Zeng, Daniel; Chen, Hsinchun

    Searching for useful information on the World Wide Web has become increasingly difficult. While Internet search engines have been helping people to search on the Web, low recall rate and outdated indexes have become more and more problematic as the Web grows. In addition, search tools usually present to the user only a list of search results,…

  16. Supporting reflective web searching in elementary schools

    NARCIS (Netherlands)

    Vries, de Bregje; Meij, van der Hans; Lazonder, Ard W.

    2008-01-01

    In this contribution, two design experiments are presented in which reflective web searching is implemented in six elementary classrooms. Reflective web searching is viewed to comprise three steps: (1) develop ownership over search questions, (2) interpret and personalize web content, and (3) adapt

  17. Optimization of web pages for search engines

    OpenAIRE

    Harej, Anže

    2011-01-01

    The thesis describes the most important elements of a Web Page and outside factors that affect Search Engine Optimization. The basic structure of a Web page, structure and functionality of a modern Search Engine is described at the beginning. The first section deals with the start of Search Engine Optimization, including planning, analysis of web space and the selection of the most important keywords for which the site will be optimized. The next section Web Page Optimization describes...

  18. A study of Web search trends

    Directory of Open Access Journals (Sweden)

    Bernard J. Jansen

    2004-12-01

    This article provides an overview of research conducted from 1997 to 2003 that explored how people search the Web. The article reports selected findings from many studies conducted by the co-authors over that period using large-scale Web query transaction logs provided by commercial Web companies, including Excite, Alta Vista, Ask Jeeves, and AlltheWeb.com. The studies are also synthesized in the book "Web Search: Public Searching of the Web" by Amanda Spink and Bernard J. Jansen (Kluwer Academic Publishers). The researchers examined the topics of Web searches; how users search the Web using terms in queries during search sessions; and the diverse types of searches, including medical, sexual, e-commerce, and multimedia information. Key findings include changes in search topics since 1997, including a shift from entertainment to e-commerce queries. Further findings show little change in many aspects of Web searching from 1997 to 2003, including query and search-session length. The studies also reveal more complex Web search behaviors by a minority of users who conduct multitasking and successive searches.
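
    The kind of transaction-log statistics reported above (e.g. query length, frequent terms) can be computed with a few lines over a log. The log entries here are illustrative assumptions, not real Excite or Alta Vista data.

```python
# Sketch of simple transaction-log analysis: mean query length in terms,
# plus the most frequent term. The toy log below is an invented assumption.
from collections import Counter

log = [
    "cheap flights london",
    "mp3 downloads",
    "digital camera reviews",
    "cheap hotels paris",
]

lengths = [len(q.split()) for q in log]
mean_len = sum(lengths) / len(lengths)
term_counts = Counter(t for q in log for t in q.split())

print(round(mean_len, 2))          # 2.75
print(term_counts.most_common(1))  # [('cheap', 2)]
```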

  19. Collection Selection for Distributed Web Search

    NARCIS (Netherlands)

    Bockting, S.

    2009-01-01

    Current popular web search engines, such as Google, Live Search and Yahoo!, rely on crawling to build an index of the World Wide Web. Crawling is a continuous process to keep the index fresh and generates an enormous amount of data traffic. By far the largest part of the web remains unindexed, becau

  20. Nuclear expert web search and crawler algorithm

    Energy Technology Data Exchange (ETDEWEB)

    Reis, Thiago; Barroso, Antonio C.O.; Baptista, Benedito Filho D., E-mail: thiagoreis@usp.br, E-mail: barroso@ipen.br, E-mail: bdbfilho@ipen.br [Instituto de Pesquisas Energeticas e Nucleares (IPEN/CNEN-SP), Sao Paulo, SP (Brazil)

    2013-07-01

    In this paper we present preliminary research on web search and crawling algorithms applied specifically to nuclear-related web information. We designed a web-based, nuclear-oriented expert system guided by a web crawler algorithm and a neural network, able to search and retrieve nuclear-related hypertextual web information in an autonomous and massive fashion. Preliminary experimental results show a retrieval precision of 80% for web pages related to any nuclear theme and a retrieval precision of 72% for web pages related only to the nuclear power theme. (author)
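
    A focused crawler of the kind described above can be sketched with a best-first frontier ordered by a relevance score. Here a simple keyword count stands in for the paper's neural-network relevance model, and the tiny link graph is an invented assumption.

```python
# Sketch of a focused crawler: a priority queue visits the most
# nuclear-relevant pages first. Keyword scoring stands in for the
# neural network; PAGES is an invented link graph.
import heapq

PAGES = {  # url -> (page text, outgoing links)
    "seed": ("nuclear research portal", ["a", "b"]),
    "a": ("nuclear power reactor safety", ["c"]),
    "b": ("sports news and scores", []),
    "c": ("nuclear fuel cycle overview", []),
}
KEYWORDS = {"nuclear", "reactor", "fuel"}

def score(text):
    """Relevance proxy: number of topic keywords in the page text."""
    return len(KEYWORDS & set(text.split()))

def crawl(seed):
    frontier = [(-score(PAGES[seed][0]), seed)]  # max-heap via negation
    visited, relevant = set(), []
    while frontier:
        _, url = heapq.heappop(frontier)
        if url in visited:
            continue
        visited.add(url)
        text, links = PAGES[url]
        if score(text) > 0:
            relevant.append(url)
        for link in links:
            if link not in visited:
                heapq.heappush(frontier, (-score(PAGES[link][0]), link))
    return relevant

print(crawl("seed"))  # ['seed', 'a', 'c'] -- the irrelevant page 'b' is skipped
```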

  1. Process-oriented semantic web search

    CERN Document Server

    Tran, DT

    2011-01-01

    The book is composed of two main parts. The first part is a general study of Semantic Web Search. The second part specifically focuses on the use of semantics throughout the search process, compiling a big picture of Process-oriented Semantic Web Search from different pieces of work that target specific aspects of the process. In particular, this book provides a rigorous account of the concepts and technologies proposed for searching resources and semantic data on the Semantic Web. To collate the various approaches and to better understand what the notion of Semantic Web Search entails, this bo

  2. Tales from the Field: Search Strategies Applied in Web Searching

    Directory of Open Access Journals (Sweden)

    Soohyung Joo

    2010-08-01

    In their web search processes, users apply multiple types of search strategies, which consist of different search tactics. This paper identifies eight types of information search strategies, with associated cases, based on sequences of search tactics during the information search process. Thirty-one participants representing the general public were recruited for this study. Search logs and verbal protocols offered rich data for the identification of different types of search strategies. Based on the findings, the authors further discuss how to enhance web-based information retrieval (IR) systems to support each type of search strategy.

  3. A neural click model for web search

    NARCIS (Netherlands)

    Borisov, A.; Markov, I.; de Rijke, M.; Serdyukov, P.

    2016-01-01

    Understanding user browsing behavior in web search is key to improving web search effectiveness. Many click models have been proposed to explain or predict user clicks on search engine results. They are based on the probabilistic graphical model (PGM) framework, in which user behavior is represented

  4. Sexual information seeking on web search engines.

    Science.gov (United States)

    Spink, Amanda; Koricich, Andrew; Jansen, B J; Cole, Charles

    2004-02-01

    Sexual information seeking is an important element within human information behavior. Seeking sexually related information on the Internet takes many forms and channels, including chat room discussions, accessing Websites, or searching Web search engines for sexual materials. The study of sexual Web queries provides insight into sexually related information-seeking behavior, of value to Web users and providers alike. We qualitatively analyzed queries from logs of 1,025,910 Alta Vista and AlltheWeb.com Web user queries from 2001. We compared the differences in sexually related Web searching between Alta Vista and AlltheWeb.com users. Differences were found in session duration, query outcomes, and search term choices. Implications of the findings for sexual information seeking are discussed.

  5. A Survey on Semantic Web Search Engine

    Directory of Open Access Journals (Sweden)

    G.Sudeepthi

    2012-03-01

    With the tremendous growth in the volume of data and the number of web pages, traditional search engines are no longer adequate. The search engine is the most important tool for discovering information on the World Wide Web, and the semantic search engine grew out of the traditional search engine to overcome this problem. The Semantic Web is an extension of the current web in which information is given well-defined meaning. Semantic web technologies play a crucial role in enhancing traditional web search, as they work to create machine-readable data, but they will not replace traditional search engines. In this paper we present a brief survey of the promising features of some of the best semantic search engines developed so far, and we discuss the various approaches to semantic search. We summarize the techniques and advantages of some important semantic web search engines developed to date. Most prominently, we show how semantic search engines differ from traditional search, illustrating their results with a sample query as input.

  6. A Feedback-Based Web Search Engine

    Institute of Scientific and Technical Information of China (English)

    ZHANG Wei-feng; XU Bao-wen; ZHOU Xiao-yu

    2004-01-01

    Web search engines are very useful information service tools on the Internet. Current web search engines produce search results relating to the search terms and the actual information collected by them. Since the selection of search results cannot affect future ones, they may not cover most people's interests. In this paper, feedback information produced by users' access lists is represented by a rough set and can reconstruct the query string and influence the search results. The search engines can thus provide self-adaptability.

  7. Identifying Aspects for Web-Search Queries

    OpenAIRE

    Wu, Fei; Madhavan, Jayant; Halevy, Alon

    2014-01-01

    Many web-search queries serve as the beginning of an exploration of an unknown space of information, rather than looking for a specific web page. To answer such queries effectively, the search engine should attempt to organize the space of relevant information in a way that facilitates exploration. We describe the Aspector system that computes aspects for a given query. Each aspect is a set of search queries that together represent a distinct information need relevant to the original search...

  8. Date restricted queries in web search engines

    OpenAIRE

    Lewandowski, Dirk

    2004-01-01

    Search engines usually offer a date-restricted search on their advanced search pages. But determining the actual update date of a web page is not without problems. We conduct a study testing date-restricted queries on the search engines Google, Teoma and Yahoo!. We find that these searches fail to work properly in the examined engines. We discuss implications of this for further research and search engine development.

  9. Deep web search: an overview and roadmap

    NARCIS (Netherlands)

    Tjin-Kam-Jet, Kien; Trieschnigg, Rudolf Berend; Hiemstra, Djoerd

    2011-01-01

    We review the state-of-the-art in deep web search and propose a novel classification scheme to better compare deep web search systems. The current binary classification (surfacing versus virtual integration) hides a number of implicit decisions that must be made by a developer. We make these

  10. Research Proposal for Distributed Deep Web Search

    NARCIS (Netherlands)

    Tjin-Kam-Jet, Kien

    2010-01-01

    This proposal identifies two main problems related to deep web search, and proposes a step by step solution for each of them. The first problem is about searching deep web content by means of a simple free-text interface (with just one input field, instead of a complex interface with many input

  15. Using Advanced Search Operators on Web Search Engines.

    Science.gov (United States)

    Jansen, Bernard J.

    Studies show that the majority of Web searchers enter extremely simple queries, so a reasonable system design approach would be to build search engines to compensate for this user characteristic. One hundred representative queries were selected from the transaction log of a major Web search service. These 100 queries were then modified using the…

  17. A Novel Personalized Web Search Model

    Institute of Scientific and Technical Information of China (English)

    ZHU Zhengyu; XU Jingqiu; TIAN Yunyan; REN Xiang

    2007-01-01

    A novel personalized Web search model is proposed. The new system, as a middleware between a user and a Web search engine, is set up on the client machine. It can learn a user's preference implicitly and then generate the user profile automatically. When the user inputs query keywords, the system can automatically generate a few personalized expansion words by computing the term-term associations according to the current user profile, and then these words, together with the query keywords, are submitted to a popular search engine such as Yahoo or Google. These expansion words help to express accurately the user's search intention. The new Web search model can make a common search engine personalized, that is, the search engine can return different search results to different users who input the same keywords. The experimental results show the feasibility and applicability of the presented work.
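
    The query-expansion step described above can be sketched by scoring candidate terms by their co-occurrence association with the query keywords in the user's profile. The profile documents below are invented assumptions, and simple co-occurrence counting stands in for whatever association measure the system actually uses.

```python
# Sketch of profile-based query expansion: terms that co-occur with the
# query keywords in the user's profile documents become expansion words.
from collections import Counter
from itertools import combinations

profile_docs = [  # hypothetical documents from the user's browsing profile
    "python pandas dataframe tutorial",
    "python numpy array tutorial",
    "jaguar speed animal",
]

# Count term-term co-occurrence within each profile document.
assoc = Counter()
for doc in profile_docs:
    for a, b in combinations(set(doc.split()), 2):
        assoc[(a, b)] += 1
        assoc[(b, a)] += 1

def expansions(query, k=2):
    """Return the k terms most strongly associated with the query terms."""
    scores = Counter()
    for q in query:
        for (a, b), c in assoc.items():
            if a == q:
                scores[b] += c
    return [t for t, _ in scores.most_common(k)]

print(expansions(["python"])[0])  # tutorial
```

    For this profile, "tutorial" ranks first because it co-occurs with "python" in two documents, so the expanded query steers a general-purpose engine toward the programming sense of the keyword.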

  18. An intelligent method for geographic Web search

    Science.gov (United States)

    Mei, Kun; Yuan, Ying

    2008-10-01

    While the electronically available information on the World Wide Web is growing explosively, the difficulty of finding relevant information is also increasing for search engine users. In this paper we discuss how to constrain web queries geographically. A number of search queries are associated with geographical locations, either explicitly or implicitly. Accurately and effectively detecting the locations that search queries are truly about has huge potential impact on increasing search relevance, bringing better-targeted search results, and improving search user satisfaction. Our approach focuses both on the way geographic information is extracted from the web and, as far as we can tell, on the way it is integrated into query processing. This paper gives an overview of a spatially aware search engine for semantic querying of web documents. It also illustrates algorithms for extracting locations from web documents and query requests, using location ontologies to encode and reason about the formal semantics of geographic web search. Based on a real-world scenario of tourism guide search, the application of our approach shows that geographic information retrieval can be efficiently supported.

  19. A Survey on Web Search Results Personalization

    Directory of Open Access Journals (Sweden)

    Blessy Thomas

    2015-10-01

    The Web is a huge information repository covering almost every topic in which a human user could be interested. As the size and richness of information on the web increase, the diversity and complexity of the tasks users try to perform also increase. With the overwhelming volume of information on the web, the task of finding relevant information related to a specific query or topic is becoming increasingly difficult. Web search is now among the most common tasks on the internet, and users get a variety of related information for their queries. To provide more relevant and effective results to the user, personalization techniques are used. Personalized web search refers to search that is tailored specifically to a person's interests by incorporating information about the query provided. Two general types of approaches to personalizing search results are modifying the user's query and re-ranking search results. Several personalized web search techniques are based on web content, web link structure, browsing history, user profiles, and user queries. This paper presents a survey of various personalization techniques.
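
    The second approach mentioned above, re-ranking search results, can be sketched by re-ordering results according to overlap between their snippets and a user-interest profile. The result URLs, snippets, and profile terms are invented assumptions.

```python
# Sketch of profile-based re-ranking: results whose snippets overlap the
# user's interest profile float to the top. All data here is illustrative.

profile = {"gardening", "plants", "soil"}  # hypothetical user interests

results = [  # (url, snippet) pairs as a generic engine might return them
    ("http://example.org/java-island", "travel guide to the island of java"),
    ("http://example.org/java-lang", "java programming language tutorial"),
    ("http://example.org/java-moss", "growing java moss plants in aquarium soil"),
]

def rerank(results, profile):
    """Sort results by snippet overlap with the profile (stable for ties)."""
    def overlap(item):
        _, snippet = item
        return len(profile & set(snippet.split()))
    return sorted(results, key=overlap, reverse=True)

top_url, _ = rerank(results, profile)[0]
print(top_url)  # http://example.org/java-moss
```

    For the ambiguous query "java", the gardening-oriented profile promotes the aquarium-plant page, which is precisely the "different results for the same keywords" behavior the survey discusses.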

  20. Web Search Results Summarization Using Similarity Assessment

    Directory of Open Access Journals (Sweden)

    Sawant V.V.

    2014-06-01

    The internet has become part of our lives, and the WWW is its most important service because it allows presenting information such as documents and images. The WWW grows rapidly and caters to diversified levels and categories of users. Web search results are extracted for user-specified queries, but with millions of pieces of information pouring online, users have no time to surf the contents completely; moreover, the available information is often repeated or duplicated. This issue has created the necessity to restructure search results so that they can be summarized. The proposed approach comprises extraction of different features of web pages. Web page visual-similarity assessment has been employed to address problems in different fields, including phishing, web archiving, and web search engines. In this approach, the search results returned for a user query are first stored. The Earth Mover's Distance (EMD) is used to assess web page visual similarity: each web page is taken as a low-resolution image, a signature of that image is created from color and coordinate features, and the distance between web pages is calculated by applying the EMD method. The layout-similarity value is computed using a tag-comparison algorithm and a template-comparison algorithm. Textual similarity is computed using cosine similarity, and hyperlink analysis is performed to compute outward links. The final similarity value is calculated by fusion of the layout, text, hyperlink, and EMD values. Once the similarity matrix is found, clustering is employed with the help of connected components. Finally, groups of similar web pages, i.e., summarized results, are displayed to the user. Experiments were conducted to demonstrate the effectiveness of the four methods in generating summarized results on different web pages and user queries.
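
    The final clustering step described above can be sketched as thresholding the fused similarity matrix and grouping pages via connected components. The similarity values and threshold below are invented assumptions, not figures from the paper.

```python
# Sketch of connected-component clustering over a fused similarity matrix:
# pages whose pairwise similarity exceeds a threshold end up in one group.

pages = ["p0", "p1", "p2", "p3"]
sim = {  # symmetric fused similarity values (layout + text + links + EMD)
    ("p0", "p1"): 0.9,
    ("p1", "p2"): 0.8,
    ("p0", "p2"): 0.2,
    ("p2", "p3"): 0.1,
}
THRESHOLD = 0.5

def clusters(pages, sim, threshold):
    # Build an adjacency list from the above-threshold pairs.
    adj = {p: set() for p in pages}
    for (a, b), s in sim.items():
        if s >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    seen, groups = set(), []
    for p in pages:
        if p in seen:
            continue
        stack, group = [p], []
        while stack:  # depth-first traversal of one component
            q = stack.pop()
            if q in seen:
                continue
            seen.add(q)
            group.append(q)
            stack.extend(adj[q] - seen)
        groups.append(sorted(group))
    return groups

print(clusters(pages, sim, THRESHOLD))  # [['p0', 'p1', 'p2'], ['p3']]
```

    Each group would then be collapsed to a single representative entry in the summarized result list.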

  1. A grammar checker based on web searching

    Directory of Open Access Journals (Sweden)

    Joaquim Moré

    2006-05-01

    This paper presents an English grammar and style checker for non-native English speakers. The main characteristic of this checker is the use of an Internet search engine. As the number of web pages written in English is immense, the system hypothesises that a piece of text not found on the Web is probably badly written. The system also hypothesises that the Web will provide examples of how the content of the text segment can be expressed in a grammatically correct and idiomatic way. Thus, when the checker warns the user about the odd nature of a text segment, the Internet engine searches for contexts that can help the user decide whether or not to correct the segment. By means of the search engine, the checker also suggests other expressions that appear on the Web more often than the one the user actually wrote.
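
    The checker's core hypothesis can be sketched as follows. Here `hit_count` is a hypothetical stand-in for a real search-engine API call, and the hit counts are invented assumptions used only to illustrate the idea.

```python
# Sketch of hit-count-based checking: a segment with (almost) no exact-phrase
# Web hits is flagged, and the better-attested alternative is suggested.
# FAKE_HITS and hit_count are stand-ins for a real search engine query.

FAKE_HITS = {
    '"depends on the"': 1_000_000,
    '"depends of the"': 40,
}

def hit_count(phrase):
    """Stand-in for querying a search engine for an exact-phrase hit count."""
    return FAKE_HITS.get(phrase, 0)

def is_suspect(segment, threshold=100):
    """Flag a segment whose exact phrase is (nearly) absent from the Web."""
    return hit_count(f'"{segment}"') < threshold

def better_alternative(segment, candidates):
    """Suggest the candidate phrasing with the most Web hits."""
    return max(candidates, key=lambda c: hit_count(f'"{c}"'))

print(is_suspect("depends of the"))  # True
print(better_alternative("depends of the", ["depends of the", "depends on the"]))
# depends on the
```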

  2. The Use of Web Search Engines in Information Science Research.

    Science.gov (United States)

    Bar-Ilan, Judit

    2004-01-01

    Reviews the literature on the use of Web search engines in information science research, including: ways users interact with Web search engines; social aspects of searching; structure and dynamic nature of the Web; link analysis; other bibliometric applications; characterizing information on the Web; search engine evaluation and improvement; and…

  3. Assessing Cognitive Load on Web Search Tasks

    CERN Document Server

    Gwizdka, Jacek

    2010-01-01

    Assessing cognitive load on web search is useful for characterizing search system features and search tasks with respect to their demands on the searcher's mental effort. It is also helpful for examining how individual differences among searchers (e.g. cognitive abilities) affect the search process. We examined cognitive load from the perspective of primary and secondary task performance. A controlled web search study was conducted with 48 participants. The primary task performance components were found to be significantly related to both the objective and the subjective task difficulty. However, the relationship between objective and subjective task difficulty and the secondary task performance measures was weaker than expected. The results indicate that the dual-task approach needs to be used with caution.

  4. Adding a visualization feature to web search engines: it's time.

    Science.gov (United States)

    Wong, Pak Chung

    2008-01-01

    It's widely recognized that all Web search engines today are almost identical in presentation layout and behavior. In fact, the same presentation approach has been applied to depicting search engine results pages (SERPs) since the first Web search engine launched in 1993. In this Visualization Viewpoints article, I propose to add a visualization feature to Web search engines and suggest that the new addition can improve search engines' performance and capabilities, which in turn lead to better Web search technology.

  5. Keyword search in the Deep Web

    OpenAIRE

    Calì, Andrea; Martinenghi, D.; Torlone, R.

    2015-01-01

    The Deep Web is constituted by data accessible through Web pages, but not readily indexable by search engines, as they are returned in dynamic pages. In this paper we propose a framework for accessing Deep Web sources, represented as relational tables with so-called access limitations, with keyword-based queries. We formalize the notion of optimal answer and investigate methods for query processing. To our knowledge, this problem has never been studied in a systematic way.

  6. Context representation for web search results

    OpenAIRE

    Vegas, Jesus; Crestani, Fabio; De La Fuente, Pablo

    2009-01-01

    Context has long been considered very useful to help the user assess the actual relevance of a document. In Web searching, context can help assess the relevance of a Web page by showing how the page is related to other pages in the same Web site, for example. Such information is very difficult to convey and visualise in a user-friendly way. In this paper we present the design, implementation and evaluation of a graphical visualisation tool aimed at helping users to determine the relevance of ...

  7. The Anatomy of Mitos Web Search Engine

    CERN Document Server

    Papadakos, Panagiotis; Theoharis, Yannis; Armenatzoglou, Nikos; Kopidaki, Stella; Marketakis, Yannis; Daskalakis, Manos; Karamaroudis, Kostas; Linardakis, Giorgos; Makrydakis, Giannis; Papathanasiou, Vangelis; Sardis, Lefteris; Tsialiamanis, Petros; Troullinou, Georgia; Vandikas, Kostas; Velegrakis, Dimitris; Tzitzikas, Yannis

    2008-01-01

    Engineering a Web search engine offering effective and efficient information retrieval is a challenging task. This document presents our experiences from designing and developing a Web search engine offering a wide spectrum of functionalities and we report some interesting experimental results. A rather peculiar design choice of the engine is that its index is based on a DBMS, while some of the distinctive functionalities that are offered include advanced Greek language stemming, real time result clustering, and advanced link analysis techniques (also for spam page detection).

  8. Semantic Map Based Web Search Result Visualization

    OpenAIRE

    2007-01-01

    The problem of information overload has become more pressing with the emergence of increasingly popular Internet services. The main information retrieval mechanisms provided by the prevailing Internet Web software are based on either keyword search (e.g., Google and Yahoo) or hypertext browsing (e.g., Internet Explorer and Netscape). The research presented in this paper is aimed at providing an alternative concept-based categorization and search capability based on a combination of m...

  9. The BiSearch web server

    Directory of Open Access Journals (Sweden)

    Tusnády Gábor E

    2006-10-01

    Full Text Available Abstract Background A large number of PCR primer-design tools are available online. However, only very few of them can be used to design primers that amplify bisulfite-treated DNA templates, which are necessary to determine genomic DNA methylation profiles. Indeed, the number of studies on bisulfite-treated templates is increasing exponentially as determining DNA methylation becomes more important in the diagnosis of cancers. Bisulfite-treated DNA is difficult to amplify, since undesired PCR products are often amplified due to the increased sequence redundancy after the chemical conversion. In order to increase the efficiency of PCR primer design, we have developed the BiSearch web server, an online primer-design tool for both bisulfite-treated and native DNA templates. Results The web tool is composed of a primer-design and an electronic PCR (ePCR) algorithm. The completely reformulated ePCR module detects potential mispriming sites as well as undesired PCR products on both cDNA and native or bisulfite-treated genomic DNA libraries. Due to the new algorithm of the current version, the ePCR module became approximately a hundred times faster than the previous one and gave the best performance when compared to other web-based tools. This high-speed ePCR analysis made possible the new option of high-throughput primer screening. The BiSearch web server can be used by academic researchers at the http://bisearch.enzim.hu site. Conclusion The BiSearch web server is a useful primer-design tool for any DNA template and especially for bisulfite-treated genomes. The ePCR tools for fast detection of mispriming sites and alternative PCR products in cDNA libraries and native or bisulfite-treated genomes are the unique features of the new version of the BiSearch software.

  10. Distribution of Cognitive Load in Web Search

    CERN Document Server

    Gwizdka, Jacek

    2010-01-01

    The search task and the system both affect the demand on cognitive resources during information search. In some situations, the demands may become too high for a person. This article has a three-fold goal. First, it presents and critiques methods to measure cognitive load. Second, it explores the distribution of load across search task stages. Finally, it seeks to improve our understanding of factors affecting cognitive load levels in information search. To this end, a controlled Web search experiment with forty-eight participants was conducted. Interaction logs were used to segment search tasks semi-automatically into task stages. Cognitive load was assessed using a new variant of the dual-task method. Average cognitive load was found to vary by search task stages. It was significantly higher during query formulation and user description of a relevant document as compared to examining search results and viewing individual documents. Semantic information shown next to the search results lists in one of the st...

  11. Resource Selection for Federated Search on the Web

    OpenAIRE

    Nguyen, Dong Van; Demeester, Thomas; Trieschnigg, Dolf; Hiemstra, Djoerd

    2016-01-01

    A publicly available dataset for federated search reflecting a real web environment has long been absent, making it difficult for researchers to test the validity of their federated search algorithms for the web setting. We present several experiments and analyses on resource selection on the web using a recently released test collection containing the results from more than a hundred real search engines, ranging from large general web search engines such as Google, Bing and Yahoo to small do...

  12. Overview of the TREC 2013 Federated Web Search Track

    NARCIS (Netherlands)

    Demeester, Thomas; Trieschnigg, Dolf; Nguyen, Dong; Hiemstra, Djoerd

    2014-01-01

    The TREC Federated Web Search track is intended to promote research related to federated search in a realistic web setting, and hereto provides a large data collection gathered from a series of online search engines. This overview paper discusses the results of the first edition of the track, FedWeb 2013.

  13. Overview of the TREC 2014 Federated Web Search Track

    NARCIS (Netherlands)

    Demeester, Thomas; Trieschnigg, Dolf; Nguyen, Dong-Phuong; Zhou, Ke; Hiemstra, Djoerd

    2014-01-01

    The TREC Federated Web Search track facilitates research in topics related to federated web search, by providing a large realistic data collection sampled from a multitude of online search engines. The FedWeb 2013 challenges of Resource Selection and Results Merging are again included in FedWeb 2014, and we additionally introduced the task of vertical selection.

  14. Overview of the TREC 2013 Federated Web Search Track

    NARCIS (Netherlands)

    Demeester, Thomas; Trieschnigg, Rudolf Berend; Nguyen, Dong-Phuong; Hiemstra, Djoerd

    The TREC Federated Web Search track is intended to promote research related to federated search in a realistic web setting, and hereto provides a large data collection gathered from a series of online search engines. This overview paper discusses the results of the first edition of the track, FedWeb 2013.

  15. Weighting Relations Using Web Search Engine

    Science.gov (United States)

    Oka, Mizuki; Matsuo, Yutaka

    Measuring the weight of the relation between a pair of entities is necessary to use social networks for various purposes. Intuitively, some pairs of entities have stronger relations than others and should therefore be weighted higher. We propose a method, using a Web search engine, to compute the weight of the relation existing between a pair of entities. Our method receives a pair of entities and the various relations that exist between entities as input. It then outputs the weighted value for the pair of entities. The method explores how search engine results can be used as evidence for how strongly the two entities pertain to the relation.
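    The paper's exact scoring function is not reproduced in the abstract; a common baseline in this line of work is to normalize search-engine page counts with a co-occurrence measure such as the Jaccard coefficient. The sketch below illustrates that idea only; the `hit_counts` table stands in for real page counts from a search API, and all numbers are invented.

```python
# Hedged sketch: weighting an entity pair by search-engine co-occurrence.
# `hit_counts` stands in for real page counts returned by a search engine;
# the figures below are invented for illustration.

def jaccard_weight(pair, hit_counts):
    """Jaccard coefficient over page counts: |A and B| / (|A| + |B| - |A and B|)."""
    a, b = pair
    both = hit_counts[(a, b)]
    denom = hit_counts[a] + hit_counts[b] - both
    return both / denom if denom else 0.0

hit_counts = {
    "Alice": 1200,            # pages matching the query "Alice"
    "Bob": 800,               # pages matching the query "Bob"
    ("Alice", "Bob"): 150,    # pages matching the conjunctive query "Alice" "Bob"
}
w = jaccard_weight(("Alice", "Bob"), hit_counts)
```

A relation-specific variant would add the relation's keywords to the conjunctive query before counting hits.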

  16. Predicting consumer behavior with Web search.

    Science.gov (United States)

    Goel, Sharad; Hofman, Jake M; Lahaie, Sébastien; Pennock, David M; Watts, Duncan J

    2010-10-12

    Recent work has demonstrated that Web search volume can "predict the present," meaning that it can be used to accurately track outcomes such as unemployment levels, auto and home sales, and disease prevalence in near real time. Here we show that what consumers are searching for online can also predict their collective future behavior days or even weeks in advance. Specifically we use search query volume to forecast the opening weekend box-office revenue for feature films, first-month sales of video games, and the rank of songs on the Billboard Hot 100 chart, finding in all cases that search counts are highly predictive of future outcomes. We also find that search counts generally boost the performance of baseline models fit on other publicly available data, where the boost varies from modest to dramatic, depending on the application in question. Finally, we reexamine previous work on tracking flu trends and show that, perhaps surprisingly, the utility of search data relative to a simple autoregressive model is modest. We conclude that in the absence of other data sources, or where small improvements in predictive performance are material, search queries provide a useful guide to the near future.
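    The core claim, that pre-release query volume tracks future outcomes, can be illustrated with a simple correlation between search counts and opening revenue. This is a toy sketch with invented numbers; the actual study fit regression models on real query logs and box-office, sales, and chart data.

```python
# Toy illustration: correlate pre-release search volume with opening outcomes.
# All data points are invented for illustration.

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

search_volume = [120, 340, 90, 500, 210]    # hypothetical queries in the week before release
opening_revenue = [14, 35, 11, 48, 22]      # hypothetical opening-weekend revenue ($M)
r = pearson(search_volume, opening_revenue)
```

A high `r` on held-out titles, not just in-sample, is what would justify using search counts as a forecasting feature alongside an autoregressive baseline.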

  17. Overview of the TREC 2014 Federated Web Search Track

    OpenAIRE

    Demeester, Thomas; Trieschnigg, Rudolf Berend; Nguyen, Dong-Phuong; Zhou, Ke; Hiemstra, Djoerd

    2014-01-01

    The TREC Federated Web Search track facilitates research in topics related to federated web search, by providing a large realistic data collection sampled from a multitude of online search engines. The FedWeb 2013 challenges of Resource Selection and Results Merging are again included in FedWeb 2014, and we additionally introduced the task of vertical selection. Other new aspects are the required link between Resource Selection and Results Merging, and the importance of diversi...

  18. FedWeb Greatest Hits: Presenting the New Test Collection for Federated Web Search

    NARCIS (Netherlands)

    Demeester, Thomas; Trieschnigg, Rudolf Berend; Zhou, Ke; Nguyen, Dong-Phuong; Hiemstra, Djoerd

    2015-01-01

    This paper presents 'FedWeb Greatest Hits', a large new test collection for research in web information retrieval. As a combination and extension of the datasets used in the TREC Federated Web Search Track, this collection opens up new research possibilities on federated web search challenges, as we

  19. FedWeb greatest hits: presenting the new test collection for federated web search

    NARCIS (Netherlands)

    Demeester, Thomas; Trieschnigg, Dolf; Zhou, Ke; Nguyen, Dong-Phuong; Hiemstra, Djoerd

    2015-01-01

    This paper presents 'FedWeb Greatest Hits', a large new test collection for research in web information retrieval. As a combination and extension of the datasets used in the TREC Federated Web Search Track, this collection opens up new research possibilities on federated web search challenges, as we

  20. FedWeb Greatest Hits: Presenting the New Test Collection for Federated Web Search

    NARCIS (Netherlands)

    Demeester, Thomas; Trieschnigg, Rudolf Berend; Zhou, Ke; Nguyen, Dong-Phuong; Hiemstra, Djoerd

    This paper presents 'FedWeb Greatest Hits', a large new test collection for research in web information retrieval. As a combination and extension of the datasets used in the TREC Federated Web Search Track, this collection opens up new research possibilities on federated web search challenges, as

  1. Overview of the TREC 2013 Federated Web Search Track

    OpenAIRE

    Demeester, Thomas; Trieschnigg, Rudolf Berend; Nguyen, Dong-Phuong; Hiemstra, Djoerd

    2014-01-01

    The TREC Federated Web Search track is intended to promote research related to federated search in a realistic web setting, and hereto provides a large data collection gathered from a series of online search engines. This overview paper discusses the results of the first edition of the track, FedWeb 2013. The focus was on basic challenges in federated search: (1) resource selection, and (2) results merging. After an overview of the provided data collection and the relevance judgments for the ...

  2. Comparison of Three Web Search Algorithms

    Institute of Scientific and Technical Information of China (English)

    Ying Bao; Zi-hu Zhu

    2006-01-01

    In this paper we discuss three important kinds of Markov chains used in Web search algorithms: the maximal irreducible Markov chain, the minimal irreducible Markov chain and the middle irreducible Markov chain. We discuss the stationary distributions, the convergence rates and the Maclaurin series of the stationary distributions of the three kinds of Markov chains. Among other things, our results show that the maximal and minimal Markov chains have the same stationary distribution and that the stationary distribution of the middle Markov chain reflects the real Web structure more objectively. Our results also prove that the maximal and middle Markov chains have the same convergence rate and that the maximal Markov chain converges faster than the minimal Markov chain when the damping factor α > 1/√2.
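    The Markov chains analyzed above are of the PageRank family, whose stationary distribution is usually computed by power iteration. The sketch below shows that generic iteration with a damping factor; the tiny link graph is invented for illustration and does not correspond to the paper's maximal, minimal, or middle constructions.

```python
# Power iteration for a PageRank-style Markov chain with damping.
# The three-page link graph below is invented for illustration.

def pagerank(links, damping=0.85, iters=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1.0 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:  # dangling page: spread its mass uniformly
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = pagerank(graph)
```

The different chain constructions in the paper amount to different ways of patching the transition matrix (e.g. handling dangling pages), which is why their stationary distributions and convergence rates can differ.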

  3. Efficient Approach for Semantic Web Searching Using Markov Model

    Directory of Open Access Journals (Sweden)

    Pradeep Salve

    2012-09-01

    Full Text Available Semantic search examines web pages for the required information and filters out unnecessary pages using advanced algorithms. Web pages are weak at answering intelligent semantic queries from the user, because the confidence of their results depends on the information available in the pages themselves. To obtain trusted results, semantic web search engines must search for pages that maintain such information at some place, including domain knowledge. The layered model of the Semantic Web provides a solution to this problem by supporting semantic web search based on an HMM for optimizing search-engine tasks, focusing especially on how to construct a new model structure to improve the extraction of web pages. We classify the search results using several search engines and different search keywords, providing a significant improvement in search accuracy. The semantic web is segmented from the information elicited from various websites, based on their semi-structured character, in order to improve the accuracy and efficiency of the transition matrix. The approach also optimizes the observation probability distribution and the estimation accuracy of the state transition sequence by adopting a "voting strategy" and a modified Viterbi algorithm. In this paper, we present a hybrid system that includes both hidden Markov models and rich Markov models, showing the effectiveness of combining implicit search with rich Markov models in a recommender system.
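    The state-sequence estimation mentioned above rests on Viterbi decoding of an HMM. Below is the standard Viterbi algorithm, not the paper's modified variant, with a toy two-state model (relevance of pages given match/no-match observations) whose parameters are invented for illustration.

```python
# Standard Viterbi decoding sketch (not the paper's modified variant).
# The toy HMM parameters below are invented for illustration.

def viterbi(obs, states, start, trans, emit):
    """Return the most likely hidden-state sequence for the observations."""
    # V[t][s]: best probability of any path ending in state s at time t
    V = [{s: start[s] * emit[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p] * trans[p][s] * emit[s][obs[t]], p) for p in states
            )
            V[t][s] = prob
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        last = back[t][last]
        path.append(last)
    return path[::-1]

states = ("relevant", "irrelevant")
start = {"relevant": 0.5, "irrelevant": 0.5}
trans = {"relevant": {"relevant": 0.7, "irrelevant": 0.3},
         "irrelevant": {"relevant": 0.4, "irrelevant": 0.6}}
emit = {"relevant": {"match": 0.8, "nomatch": 0.2},
        "irrelevant": {"match": 0.3, "nomatch": 0.7}}
path = viterbi(("match", "match", "nomatch"), states, start, trans, emit)
```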

  4. Intelligent Semantic Web Search Engines: A Brief Survey

    CERN Document Server

    Madhu, G; Rajinikanth, Dr T V

    2011-01-01

    The World Wide Web (WWW) allows people to share information (data) from large database repositories globally. The amount of information has grown to billions of databases. To search this information we need specialized tools, known generically as search engines. Although many search engines are available today, retrieving meaningful information is difficult. To overcome this problem and let search engines retrieve meaningful information intelligently, semantic web technologies are playing a major role. In this paper we present a survey of the search engine generations and the role of search engines in the intelligent web and semantic search technologies.

  5. How Google Web Search copes with very similar documents

    NARCIS (Netherlands)

    Mettrop, W.; Nieuwenhuysen, P.; Smulders, H.

    2006-01-01

    A significant portion of the computer files that carry documents, multimedia, programs etc. on the Web are identical or very similar to other files on the Web. How do search engines cope with this? Do they perform some kind of “deduplication”? How should users take into account that web search resul

  6. Resource Selection for Federated Search on the Web

    NARCIS (Netherlands)

    Nguyen, Dong; Demeester, Thomas; Trieschnigg, Dolf; Hiemstra, Djoerd

    2016-01-01

    A publicly available dataset for federated search reflecting a real web environment has long been absent, making it difficult for researchers to test the validity of their federated search algorithms for the web setting. We present several experiments and analyses on resource selection on the web using a recently released test collection containing the results from more than a hundred real search engines.

  7. Resource Selection for Federated Search on the Web

    NARCIS (Netherlands)

    Nguyen, Dong-Phuong; Demeester, Thomas; Trieschnigg, Rudolf Berend; Hiemstra, Djoerd

    A publicly available dataset for federated search reflecting a real web environment has long been absent, making it difficult for researchers to test the validity of their federated search algorithms for the web setting. We present several experiments and analyses on resource selection on the web.

  8. Searching heterogeneous collections on the Web: behaviour of Excite users

    Directory of Open Access Journals (Sweden)

    Amanda Spink

    1998-01-01

    Full Text Available As Web search services become a major source of information for a growing number of people, we need to know more about how users search heterogeneous collections using Web search engines. This paper reports the results from a major study exploring users' information searching behaviour on the EXCITE Web search engine. Three hundred and fifty-seven (357) EXCITE users responded to an interactive survey, including their search topics, intended query terms, search frequency for information on their topic, and demographic data. Results show that users tend to employ simple search strategies, and conduct successive searches over time to find information related to a particular topic. Implications for the design of Web search services are discussed.

  9. Adding to the Students' Toolbox: Using Directories, Search Engines, and the Hidden Web in Search Processes.

    Science.gov (United States)

    Mardis, Marcia A.

    2002-01-01

    Discussion of searching for information on the Web focuses on resources that are not always found by traditional Web searches. Describes sources on the hidden Web, including full-text databases, clearinghouses, digital libraries, and learning objects; explains how search engines operate; and suggests that traditional print sources are still…

  10. Design and Implementation of a Simple Web Search Engine

    CERN Document Server

    Mirzal, Andri

    2011-01-01

    We present a simple web search engine for indexing and searching HTML documents using the Python programming language. Because Python is well known for its simple syntax and strong support for major operating systems, we hope it will be beneficial for learning information retrieval techniques, especially web search engine technology.

  11. A Cooperative Schema between Web Sever and Search Engine for Improving Freshness of Web Repository

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    Because the web is huge and web pages are updated frequently, a search engine has to refresh its indexed web pages periodically. This is extremely resource-consuming, because the search engine needs to crawl the web and download web pages to refresh its index. Based on present web-refreshing technologies, we present a cooperative schema between web server and search engine for maintaining the freshness of the web repository. The web server provides meta-data, defined through the XML standard, to describe web sites. Before updating a web page, the crawler visits the meta-data file. If the meta-data indicates that the page has not been modified, the crawler will not update it, so this schema saves bandwidth. A primitive model based on the schema is implemented, and the cost and efficiency of the schema are analyzed.
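    The crawler side of such a cooperative schema can be sketched as a check of server-provided per-page metadata before any download. The XML layout, tag names, and dates below are hypothetical, not the paper's actual format (the idea is similar in spirit to the `lastmod` field of the later Sitemaps protocol).

```python
# Sketch of the crawler side of a cooperative refresh schema: consult a small
# server-provided metadata file and re-download only pages that changed.
# The XML layout and tag names are hypothetical, not the paper's exact format.
import xml.etree.ElementTree as ET

META_XML = """<site>
  <page url="/index.html" lastmod="2006-03-01"/>
  <page url="/news.html"  lastmod="2006-04-15"/>
</site>"""

def pages_to_refresh(meta_xml, crawled):
    """Return URLs whose server-side lastmod differs from our stored copy."""
    root = ET.fromstring(meta_xml)
    stale = []
    for page in root.iter("page"):
        url, lastmod = page.get("url"), page.get("lastmod")
        if crawled.get(url) != lastmod:  # changed or never seen before
            stale.append(url)
    return stale

# Repository state left over from the previous crawl (dates are illustrative).
crawled = {"/index.html": "2006-03-01", "/news.html": "2006-03-01"}
stale = pages_to_refresh(META_XML, crawled)
```

Only the URLs in `stale` need to be fetched, which is where the bandwidth saving comes from.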

  12. A Deep Web Data Integration System for Job Search

    Institute of Scientific and Technical Information of China (English)

    LIU Wei; LI Xian; LING Yanyan; ZHANG Xiaoyu; MENG Xiaofeng

    2006-01-01

    With the rapid development of the Web, more and more Web databases are available for users to access. At the same time, job searchers often have difficulty first finding the right sources and then querying over them, so providing an integrated job search system over Web databases has become a Web application in high demand. Based on this consideration, we build a deep Web data integration system that supports unified access for users to multiple job Web sites, acting as a job meta-search engine. In this paper, the architecture of the system is given first, and then the key components of the system are introduced.

  13. An Introduction to Search Engines and Web Navigation

    CERN Document Server

    Levene, Mark

    2010-01-01

    This book is a second edition, updated and expanded to explain the technologies that help us find information on the web. Search engines and web navigation tools have become ubiquitous in our day-to-day use of the web as an information source, a tool for commercial transactions and a social computing tool. Moreover, through the mobile web we have access to the web's services when we are on the move. This book demystifies the tools that we use when interacting with the web, and gives the reader a detailed overview of where we are and where we are going in terms of search engine

  14. IMPROVING PERSONALIZED WEB SEARCH USING BOOKSHELF DATA STRUCTURE

    Directory of Open Access Journals (Sweden)

    S.K. Jayanthi

    2012-10-01

    Full Text Available Search engines are playing a vital role in retrieving relevant information for the web user. In this research work a user-profile-based web search is proposed, so that web users from different domains may receive different sets of results. The main challenge is to provide relevant results at the right level of reading difficulty. Estimating user expertise and re-ranking the results are the main aspects of this paper. The retrieved results are arranged in a Bookshelf Data Structure for easy access. Better presentation of search results hence increases the usability of web search engines significantly in visual mode.

  15. Needle custom search recall-oriented search on the web using semantic annotations

    NARCIS (Netherlands)

    Kaptein, R.; Koot, G.; Huis in 't Veld, M.A.A.; Broek, E.L. van den

    2014-01-01

    Web search engines are optimized for early precision, which makes it difficult to perform recall-oriented tasks using these search engines. In this article, we present our tool Needle Custom Search. This tool exploits semantic annotations of Web search results and, thereby, increases the efficiency o

  16. Needle custom search: recall-oriented search on the Web using semantic annotations

    NARCIS (Netherlands)

    Kaptein, Rianne; Koot, Gijs; Huis in 't Veld, Mirjam A.A.; Broek, van den Egon L.; Rijke, de Maarten; Kenter, Tom; Vries, de Arjen P.; Zhai, Chen Xiang; Jong, de Franciska; Radinsky, Kira; Hofmann, Katja

    2014-01-01

    Web search engines are optimized for early precision, which makes it difficult to perform recall-oriented tasks using these search engines. In this article, we present our tool Needle Custom Search. This tool exploits semantic annotations of Web search results and, thereby, increases the efficiency o

  17. Needle Custom Search: Recall-oriented search on the Web using semantic annotations

    NARCIS (Netherlands)

    Kaptein, Rianne; Koot, Gijs; Huis in 't Veld, Mirjam A.A.; van den Broek, Egon; de Rijke, Maarten; Kenter, Tom; de Vries, A.P.; Zhai, Chen Xiang; de Jong, Franciska M.G.; Radinsky, Kira; Hofmann, Katja

    Web search engines are optimized for early precision, which makes it difficult to perform recall-oriented tasks using these search engines. In this article, we present our tool Needle Custom Search. This tool exploits semantic annotations of Web search results and, thereby, increases the efficiency

  18. Needle Custom Search : Recall-oriented search on the web using semantic annotations

    NARCIS (Netherlands)

    Kaptein, Rianne; Koot, Gijs; Huis in 't Veld, Mirjam A.A.; van den Broek, Egon L.

    2014-01-01

    Web search engines are optimized for early precision, which makes it difficult to perform recall-oriented tasks using these search engines. In this article, we present our tool Needle Custom Search. This tool exploits semantic annotations of Web search results and, thereby, increases the efficiency

  19. Needle Custom Search : Recall-oriented search on the web using semantic annotations

    NARCIS (Netherlands)

    Kaptein, Rianne; Koot, Gijs; Huis in 't Veld, Mirjam A.A.; van den Broek, Egon L.

    2014-01-01

    Web search engines are optimized for early precision, which makes it difficult to perform recall-oriented tasks using these search engines. In this article, we present our tool Needle Custom Search. This tool exploits semantic annotations of Web search results and, thereby, increases the efficiency o

  20. Needle custom search recall-oriented search on the web using semantic annotations

    NARCIS (Netherlands)

    Kaptein, R.; Koot, G.; Huis in 't Veld, M.A.A.; Broek, E.L. van den

    2014-01-01

    Web search engines are optimized for early precision, which makes it difficult to perform recall-oriented tasks using these search engines. In this article, we present our tool Needle Custom Search. This tool exploits semantic annotations of Web search results and, thereby, increases the efficiency o

  1. AN OVERVIEW OF SEARCHING AND DISCOVERING WEB BASED INFORMATION RESOURCES

    Directory of Open Access Journals (Sweden)

    Cezar VASILESCU

    2010-01-01

    Full Text Available The Internet has become for most of us a daily instrument, used for professional or personal reasons. We hardly remember the times when a computer and a broadband connection were luxury items. More and more people rely on the complicated web network to find the information they need. This paper presents an overview of Internet search issues and search engines, and describes the parties and the basic mechanism embedded in a search for web-based information resources. It also presents ways to increase the efficiency of web searches, through a better understanding of what search engines ignore in website content.

  2. Web search: how the Web has changed information retrieval

    Directory of Open Access Journals (Sweden)

    Brooks Terrence A.

    2003-01-01

    Full Text Available Topical metadata are simultaneously hailed as building blocks of the semantic Web and derogated as spam. The significance of the metadata controversy depends on the technological appropriateness of adding them to Web pages. A survey of Web technology suggests that Web pages are both transient and volatile: poor hosts of topical metadata. A more supportive environment exists in the closed Web. The vast majority of Web pages, however, exist in the open Web, an environment that challenges the application of legacy information retrieval concepts and methods.

  3. Efficient Diversification of Web Search Results

    CERN Document Server

    Capannini, Gabriele; Perego, Raffaele; Silvestri, Fabrizio

    2011-01-01

    In this paper we analyze the efficiency of various search results diversification methods. While efficacy of diversification approaches has been deeply investigated in the past, response time and scalability issues have been rarely addressed. A unified framework for studying performance and feasibility of result diversification solutions is thus proposed. First we define a new methodology for detecting when, and how, query results need to be diversified. To this purpose, we rely on the concept of "query refinement" to estimate the probability of a query to be ambiguous. Then, relying on this novel ambiguity detection method, we deploy and compare on a standard test set, three different diversification methods: IASelect, xQuAD, and OptSelect. While the first two are recent state-of-the-art proposals, the latter is an original algorithm introduced in this paper. We evaluate both the efficiency and the effectiveness of our approach against its competitors by using the standard TREC Web diversification track test...
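    Of the three methods compared above, xQuAD is a well-documented greedy re-ranker: it repeatedly picks the document that best trades off relevance against coverage of query aspects not yet covered by earlier picks. The sketch below is a generic illustration of that greedy trade-off with invented relevance and aspect probabilities, not the authors' OptSelect implementation.

```python
# Generic sketch of greedy xQuAD-style diversification.
# All relevance and aspect probabilities below are invented for illustration.

def prod(xs):
    p = 1.0
    for x in xs:
        p *= x
    return p

def xquad(docs, rel, aspects, cover, lam=0.5, k=3):
    """Greedily pick k docs, trading off relevance vs. still-uncovered aspects.
    rel[d]      : P(d | q), relevance of doc d to the query
    aspects[a]  : P(a | q), importance of query aspect a
    cover[d][a] : P(d | a), how well doc d covers aspect a
    """
    selected = []
    candidates = list(docs)
    while candidates and len(selected) < k:
        def score(d):
            # Diversity term discounts aspects already covered by `selected`.
            div = sum(
                aspects[a] * cover[d].get(a, 0.0)
                * prod(1.0 - cover[s].get(a, 0.0) for s in selected)
                for a in aspects
            )
            return (1 - lam) * rel[d] + lam * div
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

docs = ["d1", "d2", "d3"]
rel = {"d1": 0.9, "d2": 0.8, "d3": 0.5}
aspects = {"a1": 0.6, "a2": 0.4}
cover = {"d1": {"a1": 0.9}, "d2": {"a1": 0.8}, "d3": {"a2": 0.9}}
ranking = xquad(docs, rel, aspects, cover, k=3)
```

Note how the less relevant d3 outranks d2 once d1 is selected: d1 already covers aspect a1, so covering a2 becomes worth more than repeating a1.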

  4. Text Retrieval Online: Historical Perspective on Web Search Engines.

    Science.gov (United States)

    Hahn, Trudi Bellardo

    1998-01-01

    Provides an overview of online systems and search engines, highlighting search (relationships between terms and interpretation of words), browse, and Web search engine capabilities, iterative searches, canned or stored queries, vocabulary browsing, delivery of full source documents, simple and advanced user interfaces, and global access. Notes…

  5. Getting Off the Beaten Track: Specialized Web Search Engines.

    Science.gov (United States)

    Sullivan, Danny

    1998-01-01

    Describes specialty or vertical Web search engines that may provide more relevant results for information retrieval. Highlights include regional services, including filtering by domain and custom crawling; language searching; family-safe listings, including the pros and cons of filtering; news searches; and subject-oriented searching. (LRW)

  6. Uncovering Web search strategies in South African higher education

    Directory of Open Access Journals (Sweden)

    Surika Civilcharran

    2016-04-01

    Full Text Available Background: In spite of the enormous amount of information available on the Web and the fact that search engines are continuously evolving to enhance the search experience, students are nevertheless faced with the difficulty of effectively retrieving information. It is, therefore, imperative for the interaction between students and search tools to be understood and search strategies to be identified, in order to promote successful information retrieval. Objectives: This study identifies the Web search strategies used by postgraduate students and forms part of a wider study into information retrieval strategies used by postgraduate students at the University of KwaZulu-Natal (UKZN), Pietermaritzburg campus, South Africa. Method: Largely underpinned by Thatcher's cognitive search strategies, the mixed-methods approach was utilised for this study, in which questionnaires were employed in Phase 1 and structured interviews in Phase 2. This article reports and reflects on the findings of Phase 2, which focus on identifying the Web search strategies employed by postgraduate students. The Phase 1 results were reported in Civilcharran, Hughes and Maharaj (2015). Results: Findings reveal the Web search strategies used for academic information retrieval. In spite of easy access to the invisible Web and the advent of meta-search engines, the use of Web search engines still remains the preferred search tool. The UKZN online library databases and especially the UKZN online library, Online Public Access Catalogue system, are being underutilised. Conclusion: Being ranked in the top three percent of the world's universities, UKZN is investing in search tools that are not being used to their full potential. This evidence suggests an urgent need for students to be trained in Web searching and to have a greater exposure to a variety of search tools. This article is intended to further contribute to the design of undergraduate training programmes in order to deal

  7. Social Search: A Taxonomy of, and a User-Centred Approach to, Social Web Search

    Science.gov (United States)

    McDonnell, Michael; Shiri, Ali

    2011-01-01

    Purpose: The purpose of this paper is to introduce the notion of social search as a new concept, drawing upon the patterns of web search behaviour. It aims to: define social search; present a taxonomy of social search; and propose a user-centred social search method. Design/methodology/approach: A mixed method approach was adopted to investigate…

  8. A Domain Specific Ontology Based Semantic Web Search Engine

    CERN Document Server

    Mukhopadhyay, Debajyoti; Mukherjee, Sreemoyee; Bhattacharya, Jhilik; Kim, Young-Chon

    2011-01-01

Since its emergence in the 1990s, the World Wide Web (WWW) has rapidly evolved into a huge mine of global information, and it is growing in size every day. The presence of a huge amount of resources on the Web thus poses a serious problem of accurate search. This is mainly because today's Web is a human-readable Web where information cannot be easily processed by machine. The highly sophisticated, efficient keyword-based search engines that have evolved to date have not been able to bridge this gap. Hence the concept of the Semantic Web, envisioned by Tim Berners-Lee as a Web of machine-interpretable information, expressed in a machine-processable form. Based on Semantic Web technologies, we present in this paper the design methodology and development of a semantic Web search engine which provides exact search results for a domain-specific search. This search engine is developed for an agricultural Web site which hosts agricultural information about the state of West Bengal.

  9. Searching the Online catalog and the World Wide Web

    Directory of Open Access Journals (Sweden)

    Shu-Hsien L. Chen

    2003-09-01

The article discusses the searching behaviors of school children using the online catalog and the World Wide Web. Although the amount of information and the search capabilities of the online catalog and the World Wide Web differ to a great extent, students share several common problems in using them. They have problems with spelling and typing, phrasing search terms, extracting key concepts, formulating search strategies, and evaluating search results. Their specific problems in searching the World Wide Web include rapid navigation of the Internet, overuse of the Back button as a browsing strategy, and evaluating only the first screen. Teachers and media specialists need to address these problems in the instruction of information literacy skills so that students can fully utilize the power of online searching and become efficient information searchers.

  10. World Wide Web Search Engines: AltaVista and Yahoo.

    Science.gov (United States)

    Machovec, George S., Ed.

    1996-01-01

    Examines the history, structure, and search capabilities of Internet search tools AltaVista and Yahoo. AltaVista provides relevance-ranked feedback on full-text searches. Yahoo indexes Web "citations" only but does organize information hierarchically into predefined categories. Yahoo has recently become a publicly held company and…

  11. Similarities and differences between Web search procedure and searching in the pre-web information retrieval systems

    Directory of Open Access Journals (Sweden)

    Yazdan Mansourian

    2004-08-01

This paper presents an introductory discussion of the commonalities and dissimilarities between the Web searching procedure and the search process in previous online information retrieval systems, including classic information retrieval systems and databases. The paper attempts to explain which factors make these two groups different, why investigating the search process in the Web environment is important, how much we know about this procedure, and what the main lines of research are for researchers in this area of study and practice. After presenting the major factors involved, the paper concludes that although the information seeking process on the Web is fairly similar to that of pre-Web systems in some ways, there are notable differences between them as well. These differences may provide Web searchers and Web researchers with both opportunities and challenges.

  12. Web Service Architecture for a Meta Search Engine

    Directory of Open Access Journals (Sweden)

    K.Srinivas

    2011-10-01

With the rapid advancements in Information Technology, information retrieval on the Internet is gaining importance day by day. Nowadays there are millions of Web sites and billions of homepages available on the Internet. Search engines are the essential tools for retrieving the required information from the Web. But existing search engines have many problems, such as limited scope and imbalance in accessing sites. The effectiveness of a search engine therefore plays a vital role. Meta search engines, such as Dogpile and MetaCrawler, are systems that can provide effective information by accessing multiple existing search engines, but most of them cannot operate successfully in a heterogeneous and fully dynamic Web environment. In this paper we propose a Web Service Architecture for a Meta Search Engine to cater to the needs of a heterogeneous and dynamic Web environment. The objective of our proposal is to exploit most of the features offered by Web Services through the implementation of a Web Service Meta Search Engine.

  13. [The search for medical information on the World Wide Web].

    Science.gov (United States)

    Cappeliez, O; Ranschaert, E; Peetrons, P; Struyven, J

    1999-12-01

The internet has experienced tremendous growth over the past few years and currently offers many resources in the field of medicine. However, many physicians remain unaware of how to gain access to this powerful tool. This article briefly describes the World Wide Web and its potential applications for physicians. The basics of Web search engines and medical directories, as well as the use of advanced search with Boolean operators, are explained.

  14. Evaluation of web search for the information practitioner

    OpenAIRE

    2007-01-01

    Purpose – The aim of the paper is to put forward a structured mechanism for web search evaluation. The paper seeks to point to useful scientific research and show how information practitioners can use these methods in evaluation of search on the web for their users. Design/methodology/approach – The paper puts forward an approach which utilizes traditional laboratory‐based evaluation measures such as average precision/precision at N documents, augmented with diagnostic measures such...

  15. Designing Search: Effective Search Interfaces for Academic Library Web Sites

    Science.gov (United States)

    Teague-Rector, Susan; Ghaphery, Jimmy

    2008-01-01

    Academic libraries customize, support, and provide access to myriad information systems, each with complex graphical user interfaces. The number of possible information entry points on an academic library Web site is both daunting to the end-user and consistently challenging to library Web site designers. Faced with the challenges inherent in…

  16. Efficient Clustering of Web Search Results Using Enhanced Lingo Algorithm

    Directory of Open Access Journals (Sweden)

    M. Manikantan

    2015-02-01

Web query optimization is the focus of recent research and development efforts. To fetch the required information, users rely on search engines and sometimes on Web site interfaces. One approach is search engine optimization, which Web site developers use to popularize their sites through search engine results. Clustering is a main task of the exploratory data mining process and a common technique for grouping Web search results into categories based on specific Web content. The clustering search engine Lingo uses only snippets to cluster documents. Though this method takes less time to cluster the documents, it cannot produce clusters of good quality. This study focuses on clustering all documents by first applying semantic similarity between words and then applying a modified Lingo algorithm, producing good-quality clusters in less time.
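The kind of snippet grouping the abstract describes can be illustrated with a minimal, hypothetical sketch. This is not the Lingo algorithm itself (which derives cluster labels from a singular value decomposition of the snippet term-document matrix); it simply groups snippets whose token sets exceed a Jaccard-similarity threshold, with made-up snippets as input.

```python
# Minimal snippet-clustering sketch (NOT the Lingo algorithm): greedily group
# snippets whose token sets are sufficiently similar under Jaccard similarity.

def tokens(snippet):
    """Lowercase word set of a snippet, ignoring very short words."""
    return {w for w in snippet.lower().split() if len(w) > 3}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_snippets(snippets, threshold=0.25):
    """Single pass: each snippet joins the first cluster whose seed it
    resembles, otherwise it starts a new cluster."""
    clusters = []  # list of (seed_token_set, [snippets])
    for s in snippets:
        t = tokens(s)
        for seed, members in clusters:
            if jaccard(t, seed) >= threshold:
                members.append(s)
                break
        else:
            clusters.append((t, [s]))
    return [members for _, members in clusters]

results = [
    "Jaguar cars announce a new electric model",
    "Electric model cars from Jaguar announced",
    "Jaguar habitat shrinking in South America",
]
groups = cluster_snippets(results)  # two groups: cars vs. wildlife
```

A real system would weight terms (e.g. TF-IDF) and derive readable labels for each group, which is where Lingo's matrix decomposition comes in.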

  17. Two Selfless Contributions to Web Search Evaluation

    Science.gov (United States)

    2014-11-01

…participate in the Web Track and the Federated Web Track. Our experiments are run by MIREX (MapReduce Information Retrieval Experiments), a library of MapReduce programs to extract data and sequentially scan document representations. Built on Hadoop, sequential scanning becomes a viable approach…

  18. Recall Oriented Search on the Web using Semantic Annotations

    NARCIS (Netherlands)

    Kaptein, A.M.; Broek, E.L. van den; Koot, G.; Huis in't Veld, M.A.A.

    2013-01-01

Web search engines are optimized for early precision, which makes it difficult to perform recall oriented tasks with them. In this article, we propose several ways to leverage semantic annotations and, thereby, increase the efficiency of recall oriented search tasks, with a focus on forensic investigation.

  19. What Snippets Say About Pages in Federated Web Search

    NARCIS (Netherlands)

    Demeester, Thomas; Nguyen, Dong-Phuong; Trieschnigg, Dolf; Develder, Chris; Hiemstra, Djoerd; Hou, Yuexian; Nie, Jian-Yun; Sun, Le; Wang, Bo; Zhang, Peng

    2012-01-01

What is the likelihood that a Web page is considered relevant to a query, given the relevance assessment of the corresponding snippet? Using a new federated IR test collection that contains search results from over a hundred search engines on the internet, we are able to investigate such research questions.

  20. World Wide Web Metaphors for Search Mission Data

    Science.gov (United States)

    Norris, Jeffrey S.; Wallick, Michael N.; Joswig, Joseph C.; Powell, Mark W.; Torres, Recaredo J.; Mittman, David S.; Abramyan, Lucy; Crockett, Thomas M.; Shams, Khawaja S.; Fox, Jason M.; hide

    2010-01-01

A software program that searches and browses mission data emulates a Web browser, containing standard metaphors for Web browsing. By taking advantage of back-end URLs, users may save and share search states. Also, since a Web interface is familiar to users, training time is reduced. Familiar back and forward buttons move through a local search history. A refresh/reload button regenerates a query, and loads in any new data. URLs can be constructed to save search results. Adding context to the current search is also handled through a familiar Web metaphor. The query is constructed by clicking on hyperlinks that represent new components to the search query. The selection of a link appears to the user as a page change; the choice of links changes to represent the updated search and the results are filtered by the new criteria. Selecting a navigation link changes the current query and also the URL that is associated with it. The back button can be used to return to the previous search state. This software is part of the MSLICE release, which was written in Java. It will run on any current Windows, Macintosh, or Linux system.

  1. Recall oriented search on the web using semantic annotations

    NARCIS (Netherlands)

    Kaptein, Rianne; van den Broek, Egon; Koot, Gijs; Huis in 't Veld, Mirjam A.A.; Bennett, P.N.; Gabrilovich, E.; Kamps, J.; Karlgren, J.

    2013-01-01

Web search engines are optimized for early precision, which makes it difficult to perform recall oriented tasks with them. In this article, we propose several ways to leverage semantic annotations and, thereby, increase the efficiency of recall oriented search tasks, with a focus on forensic investigation.

  2. Minimalist Instruction for Learning to Search the World Wide Web

    NARCIS (Netherlands)

    Lazonder, A.W.

    2001-01-01

This study examined the efficacy of minimalist instruction to develop self-regulatory skills involved in Web searching. Two versions of minimalist self-regulatory skill instruction were compared to a control group that was merely taught procedural skills to operate the search engine. Acquired skills …

  3. Ontology-Based Information Behaviour to Improve Web Search

    Directory of Open Access Journals (Sweden)

    Silvia Calegari

    2010-10-01

Web search engines provide a huge number of answers in response to a user query, many of which are not relevant, whereas some of the most relevant ones may not be found. Several approaches have been proposed in the literature to help a user find the information relevant to his/her real needs on the Web. To achieve this goal, individual Information Behaviour can be analyzed to keep track of the user's interests. Keeping information is a type of Information Behaviour, and in several works researchers have referred to it as the study of what people do during a search on the Web. Generally, the user's actions (e.g., how the user moves from one Web page to another, or his/her download of a document, etc.) are recorded in Web logs. This paper reports on research activities which aim to exploit the information extracted from Web logs (or query logs) in personalized user ontologies, with the objective of supporting the user in the process of discovering Web information relevant to his/her information needs. Personalized ontologies are used to improve the quality of Web search by applying two main techniques: query reformulation and re-ranking of query evaluation results. In this paper we analyze various methodologies presented in the literature that aim to use personalized ontologies, defined on the basis of observed Information Behaviour, to help the user find relevant information.
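The first of the two techniques named in the abstract, ontology-driven query reformulation, can be sketched minimally as follows. The toy ontology and the interest labels below are hypothetical illustrations, not the log-mined personalized ontologies the paper surveys.

```python
# Sketch of ontology-driven query reformulation: expand an ambiguous query
# with related terms for the sense matching the user's recorded interest.
# The ontology and interest labels are made up for illustration.

ONTOLOGY = {
    "jaguar": {
        "cars":    ["automobile", "XJ"],
        "animals": ["wildlife", "felid"],
    },
}

def reformulate(query, user_interest):
    """Append the related terms of each query word, for the chosen sense."""
    terms = query.split()
    for word in query.split():
        senses = ONTOLOGY.get(word.lower(), {})
        terms += senses.get(user_interest, [])
    return " ".join(terms)

expanded = reformulate("jaguar review", "cars")  # "jaguar review automobile XJ"
```

The second technique, re-ranking, would then score each result against the same user profile rather than rewriting the query.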

  4. The intelligent web search, smart algorithms, and big data

    CERN Document Server

    Shroff, Gautam

    2013-01-01

As we use the Web for social networking, shopping, and news, we leave a personal trail. These days, linger over a Web page selling lamps, and they will turn up in the advertising margins as you move around the Internet, reminding you, tempting you to make that purchase. Search engines such as Google can now look deep into the data on the Web to pull out instances of the words you are looking for. And there are pages that collect and assess information to give you a snapshot of changing political opinion. These are just basic examples of the growth of "Web intelligence", as increasingly sophisticated…

  5. Web Image Retrieval Search Engine based on Semantically Shared Annotation

    Directory of Open Access Journals (Sweden)

    Alaa Riad

    2012-03-01

This paper presents a new majority voting technique that combines the two basic modalities of Web images, textual and visual features, in a re-annotation and search based framework. The proposed framework considers each Web page as a voter that votes on the relatedness of a keyword to the Web image. The proposed approach is not a pure combination of image low-level features and textual features; it also takes into consideration the semantic meaning of each keyword, which is expected to enhance the retrieval accuracy. The approach not only enhances the retrieval accuracy of Web images, but is also able to annotate unlabeled images.
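The voting step can be sketched in miniature. This assumes a simple majority rule (a keyword is accepted when more than half of the pages embedding the image vote for it); the page data and the threshold are illustrative, not taken from the paper.

```python
from collections import Counter

# Sketch of majority voting for image annotation: each Web page that embeds
# the image "votes" for the keywords it associates with that image.
# Keywords backed by more than half the voters become annotations.

def vote_keywords(pages):
    """pages: one keyword list per embedding page; returns accepted keywords."""
    votes = Counter(kw for page in pages for kw in set(page))
    quorum = len(pages) / 2
    return {kw for kw, n in votes.items() if n > quorum}

pages = [
    ["sunset", "beach", "holiday"],
    ["sunset", "beach"],
    ["sunset", "hotel"],
]
labels = vote_keywords(pages)  # {'sunset', 'beach'}
```

The paper additionally weighs votes by the semantic relatedness of keywords and by visual similarity, which a plain count like this ignores.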

  6. GoWeb: a semantic search engine for the life science web.

    Science.gov (United States)

    Dietze, Heiko; Schroeder, Michael

    2009-10-01

    Current search engines are keyword-based. Semantic technologies promise a next generation of semantic search engines, which will be able to answer questions. Current approaches either apply natural language processing to unstructured text or they assume the existence of structured statements over which they can reason. Here, we introduce a third approach, GoWeb, which combines classical keyword-based Web search with text-mining and ontologies to navigate large results sets and facilitate question answering. We evaluate GoWeb on three benchmarks of questions on genes and functions, on symptoms and diseases, and on proteins and diseases. The first benchmark is based on the BioCreAtivE 1 Task 2 and links 457 gene names with 1352 functions. GoWeb finds 58% of the functional GeneOntology annotations. The second benchmark is based on 26 case reports and links symptoms with diseases. GoWeb achieves 77% success rate improving an existing approach by nearly 20%. The third benchmark is based on 28 questions in the TREC genomics challenge and links proteins to diseases. GoWeb achieves a success rate of 79%. GoWeb's combination of classical Web search with text-mining and ontologies is a first step towards answering questions in the biomedical domain. GoWeb is online at: http://www.gopubmed.org/goweb.

  7. Improving Web searches: case study of quit-smoking Web sites for teenagers.

    Science.gov (United States)

    Koo, Malcolm; Skinner, Harvey

    2003-11-14

The Web has become an important and influential source of health information. With the vast number of Web sites on the Internet, users often resort to popular search sites when searching for information. However, little is known about the characteristics of Web sites returned by simple Web searches for information about smoking cessation for teenagers. To determine the characteristics of Web sites retrieved by search engines about smoking cessation for teenagers, and how information quality correlates with the search ranking, the top 30 sites returned by 4 popular search sites in response to the search terms "teen quit smoking" were examined. The information relevance and quality characteristics of these sites were evaluated by 2 raters. Objective site characteristics were obtained using a page-analysis Web site. Only 14 of the 30 Web sites are of direct relevance to smoking cessation for teenagers. The readability of about two-thirds of the 14 sites is below an eighth-grade school level, and these sites ranked significantly higher (Kendall rank correlation, tau = -0.39, P = .05) in search-site results than sites with readability at or above that grade level. Sites that ranked higher were significantly associated with the presence of an e-mail address for contact (tau = -0.46, P = .01), annotated hyperlinks to external sites (tau = -0.39, P = .04), and the presence of a meta description tag (tau = -0.48, P = .002). The median link density (number of external sites that link to the site) of the Web pages was 6 and the maximum was 735. A higher link density was significantly associated with a higher rank (tau = -0.58, P = .02). Using simple search terms on popular search sites to look for information on smoking cessation for teenagers resulted in less than half of the sites being of direct relevance. To improve search efficiency, users could supplement results obtained from simple Web searches with human-maintained Web directories and learn to refine their searches with …
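The rank correlations this study reports can be reproduced in miniature. Below is a sketch of Kendall's tau (the tau-a variant, assuming no tied observations) applied to hypothetical rank/feature pairs; the numbers are invented and are not the study's data.

```python
# Sketch of the rank-correlation analysis: Kendall's tau-a between search
# rank and a site feature. The data below is hypothetical.

def kendall_tau(x, y):
    """Kendall's tau-a for paired observations without ties."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

ranks       = [1, 2, 3, 4, 5]       # position in search results (1 = top)
link_counts = [700, 90, 40, 12, 3]  # hypothetical inbound-link density
tau = kendall_tau(ranks, link_counts)  # negative: denser links, better rank
```

A negative tau here, as in the study, means the feature grows as the numeric rank shrinks, i.e. is associated with appearing nearer the top of the results.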

  8. Enhancing Web Search with Semantic Identification of User Preferences

    Directory of Open Access Journals (Sweden)

    Naglaa Fathy

    2011-11-01

Personalized Web search is able to satisfy individuals' information needs by modeling long-term and short-term user interests based on user actions, browsed documents or past queries, and incorporating these in the search process. In this paper, we propose a personalized search approach which models the user's search preferences in an ontological user profile and semantically compares this model against the user's current query context to re-rank search results. Our user profile is based on the predefined Open Directory Project (ODP) ontology, so that after a user's search, relevant Web pages are classified into topics in the ontology using semantic and cosine similarity measures. Moreover, interest scores are assigned to topics based on the user's ongoing behavior. Our experiments show that re-ranking based on the semantic evidence of the updated user profile efficiently satisfies user information needs, with the most relevant results brought to the top of the returned results.
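Profile-based re-ranking of the kind described here can be sketched minimally: score each result by the cosine similarity between its topic vector and the profile's interest scores, then sort. The topic space, weights and result names below are hypothetical stand-ins, not the ODP classification the paper uses.

```python
import math

# Minimal sketch of re-ranking by cosine similarity between an ontological
# user profile (interest score per topic) and each result's topic vector.
# Topics, weights, and hostnames are made up for illustration.

def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {feature: weight}."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

profile = {"programming": 0.9, "databases": 0.4}   # interest scores per topic

results = [
    ("python-snakes.example", {"biology": 1.0}),
    ("python-lang.example",   {"programming": 1.0}),
]
reranked = sorted(results, key=lambda r: cosine(profile, r[1]), reverse=True)
```

For the query "python", a programming-oriented profile pushes the language page above the zoology page, which is the behavior the abstract describes.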

  9. Uncovering Web search tactics in South African higher education

    Directory of Open Access Journals (Sweden)

    Surika Civilcharran

    2015-02-01

Background: The potential of the World Wide Web (‘the Web’) as a tool for information retrieval in higher education is beyond question. Harnessing this potential, however, remains a challenge, particularly in the context of developing countries, where students are drawn from diverse socio-economic, educational and technological backgrounds. Objectives: The purpose of this study is to identify the Web search tactics used by postgraduate students in order to address the weaknesses of undergraduate students with regard to their Web searching tactics. This article forms part of a wider study into postgraduate students’ information retrieval strategies at the University of KwaZulu-Natal, Pietermaritzburg campus, South Africa. Method: The study utilised the mixed methods approach, employing both questionnaires (Phase 1) and structured interviews (Phase 2), and was largely underpinned by Bates’s model of information search tactics. This article reports and reflects on the findings of Phase 1, which focused on identifying the Web search tactics employed by postgraduate students. Results: Findings indicated a preference for lower-level Web search tactics, despite respondents largely self-reporting as intermediate or expert users. Moreover, the majority of respondents gained their knowledge of Web searching through experience, and only a quarter of respondents had been given formal training on Web searching. Conclusion: In addition to contributing to theory, it is envisaged that this article will contribute to practice by informing the design of undergraduate training interventions to proactively address the information retrieval challenges faced by novice users. Subsequent papers will report on Phase 2 of the study.

  10. The effect of query complexity on Web searching results

    Directory of Open Access Journals (Sweden)

    B.J. Jansen

    2000-01-01

This paper presents findings from a study of the effects of query structure on retrieval by Web search services. Fifteen queries were selected from the transaction log of a major Web search service in simple query form, with no advanced operators (e.g., Boolean operators, phrase operators, etc.), and submitted to 5 major search engines: AltaVista, Excite, FAST Search, Infoseek, and Northern Light. The results from these queries became the baseline data. The original 15 queries were then modified using the various search operators supported by each of the 5 search engines, for a total of 210 queries. Each of these 210 queries was also submitted to the applicable search service, and the results obtained were compared to the baseline results. A total of 2,768 search results were returned by the set of all queries. In general, increasing the complexity of the queries had little effect on the results, with a greater than 70% overlap in results, on average. Implications for the design of Web search services and directions for future research are discussed.
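The ">70% overlap" finding rests on a simple measure: the share of baseline results that reappear when the query is rewritten with operators. A sketch, with made-up result URLs standing in for real result lists:

```python
# Sketch of the overlap measure behind the study's comparison: the fraction
# of baseline results also returned by the operator-augmented query.
# The URLs below are invented for illustration.

def overlap(baseline, modified):
    """Fraction of baseline results also present in the modified results."""
    if not baseline:
        return 0.0
    return len(set(baseline) & set(modified)) / len(baseline)

baseline = ["a.example", "b.example", "c.example", "d.example", "e.example"]
with_ops = ["a.example", "b.example", "c.example", "d.example", "x.example"]
score = overlap(baseline, with_ops)  # 0.8 -> operators changed one result
```

Averaging this score over all query/engine pairs yields the kind of aggregate overlap figure the abstract reports.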

  11. Review of Metadata Elements within the Web Pages Resulting from Searching in General Search Engines

    Directory of Open Access Journals (Sweden)

    Sima Shafi’ie Alavijeh

    2009-12-01

The present investigation aimed to study the presence of Dublin Core metadata elements and HTML meta tags in Web pages. Ninety Web pages were chosen by searching general search engines (Google, Yahoo and MSN). The extent of metadata elements (Dublin Core and HTML meta tags) present in these pages, as well as the existence of a significant correlation between the presence of meta elements and the type of search engine, was investigated. Findings indicated a very low presence of both Dublin Core metadata elements and HTML meta tags in the pages retrieved, which in turn illustrates the very low usage of metadata elements in Web pages. Furthermore, findings indicated no significant correlation between the type of search engine used and the presence of metadata elements. From the standpoint of including metadata in the retrieval of Web sources, search engines do not significantly differ from one another.

  12. Improving Web Search for Difficult Queries

    Science.gov (United States)

    Wang, Xuanhui

    2009-01-01

    Search engines have now become essential tools in all aspects of our life. Although a variety of information needs can be served very successfully, there are still a lot of queries that search engines can not answer very effectively and these queries always make users feel frustrated. Since it is quite often that users encounter such "difficult…

  13. Generating personalized web search using semantic context.

    Science.gov (United States)

    Xu, Zheng; Chen, Hai-Yan; Yu, Jie

    2015-01-01

The "one size fits all" criticism of search engines is that when queries are submitted, the same results are returned to different users. To solve this problem, personalized search has been proposed, since it can provide different search results based upon the preferences of users. However, existing methods concentrate more on long-term and independent user profiles, and thus reduce the effectiveness of personalized search. In this paper, the method captures the user context to provide accurate preferences of users for effective personalized search. First, the short-term query context is generated to identify related concepts of the query. Second, the user context is generated based on the click-through data of users. Finally, a forgetting factor is introduced to merge the independent user contexts in a user session, which maintains the evolution of user preferences. Experimental results fully confirm that our approach can successfully represent user context according to individual user information needs.
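The forgetting-factor idea can be sketched as an exponential decay applied when a new session context is merged into the running user context, so that recent interests dominate older ones. The concept names, weights and decay constant below are hypothetical; the paper's actual model is more elaborate.

```python
# Sketch of a forgetting factor for merging user contexts: existing concept
# weights are decayed before the new session's evidence is added, so recent
# interests outweigh stale ones. Concepts and weights are made up.

def merge_context(user_ctx, session_ctx, forgetting=0.8):
    """Decay existing concept weights, then add the new session's weights."""
    merged = {c: w * forgetting for c, w in user_ctx.items()}
    for c, w in session_ctx.items():
        merged[c] = merged.get(c, 0.0) + w
    return merged

ctx = {"laptops": 1.0}
ctx = merge_context(ctx, {"cameras": 1.0})  # laptops: 0.80, cameras: 1.0
ctx = merge_context(ctx, {"cameras": 1.0})  # laptops: 0.64, cameras: 1.8
```

After two camera-focused sessions, "cameras" dominates while the older "laptops" interest fades geometrically rather than being forgotten outright.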

  14. Equipped Search Results Using Machine Learning from Web Databases

    Directory of Open Access Journals (Sweden)

    Ahmed Mudassar Ali

    2015-05-01

The aim of this study is to form clusters of search results based on similarity and to assign a meaningful label to each cluster. Database-driven Web pages play a vital role in multiple domains like online shopping, e-education systems and cloud computing. Such databases are accessible through HTML forms and user interfaces, and they return result pages that come from the underlying databases according to the nature of the user query. These databases are termed Web Databases (WDB). Web databases are frequently employed to search for products online in the retail industry; they can be private to a retailer or used publicly by a number of retailers. Whenever the user queries these databases using keywords, the user will often be misled by the search results returned, because no relevance exists between the keyword and the Search Results (SRs). A typical Web page returned from a WDB has multiple Search Result Records (SRRs). An easier way is to group similar SRRs into one cluster so that the user can focus on his demand. The key concept of this paper is XML technologies. In this study, we propose a novel system called CSR (Clustering Search Results) which extracts data from the XML database, clusters it based on similarity, and finally assigns a meaningful label to each cluster. Thus, the output for the keyword entered will be clusters containing related data items.

  15. Do two heads search better than one? Effects of student collaboration on web search behavior and search outcomes.

    NARCIS (Netherlands)

    Lazonder, Adrianus W.

    2005-01-01

This study compared pairs of students with single students in Web search tasks. The underlying hypothesis was that peer-to-peer collaboration encourages students to articulate their thoughts, which in turn has a facilitative effect on the regulation of the search process as well as search outcomes.

  16. Semantic Web Based Efficient Search Using Ontology and Mathematical Model

    Directory of Open Access Journals (Sweden)

    K.Palaniammal

    2014-01-01

The Semantic Web is the forthcoming technology in the world of search engines. It focuses mainly on search that is meaningful, rather than the syntactic search prevailing now. This proposed work concerns semantic search in the educational domain. In this paper, we propose semantic Web based efficient search using an ontology and a mathematical model that takes into account misleading and unmatched service information, lack of relevant domain knowledge, and wrong service queries. To solve these issues, the framework is designed to make three major contributions: an ontology knowledge base, Natural Language Processing (NLP) techniques and a search model. The ontology knowledge base stores domain-specific service ontologies and service description entity (SDE) metadata. The search model, which includes the mathematical model, retrieves SDE metadata efficiently for education learners. The NLP techniques provide spell-checking and synonym-based search. The results retrieved are stored in an ontology, which in turn prevents data redundancy. The results are more accurate to search, and sensitive to spelling and synonymous context. This approach reduces the user’s time and complexity in finding the correct results for his/her search text, and our model provides more accurate results. A series of experiments are conducted in order to evaluate the mechanism and the employed mathematical model.

  17. Pesquisas na web: estratégias de busca Searching on the web: search strategies p. 53-66

    Directory of Open Access Journals (Sweden)

    Elias Estevão Goulart

    2007-01-01

The World Wide Web has been widely used for searching and selecting information, making it one of the main tools supporting academic and professional activities. This work presents a study of information search strategies on the World Wide Web, analysing and comparing the results of an exploratory survey with a similar study carried out at Tel Aviv University. Nine possible forms of search are presented, along with how they were used in the compared studies. The most effective strategies are identified, and better training of users in the techniques presented is suggested. Keywords: search strategies; Internet; World Wide Web

  18. A CLIR Interface to a Web search engine.

    Science.gov (United States)

    Daumke, Philipp; Schulz, Stefan; Markó, Kornél

    2005-01-01

    Medical document retrieval presents a unique combination of challenges for the design and implementation of retrieval engines. We introduce a method to meet these challenges by implementing a multilingual retrieval interface for biomedical content in the World Wide Web. To this end we developed an automated method for interlingual query construction by which a standard Web search engine is enabled to process non-English queries from the biomedical domain in order to retrieve English documents.

  19. Generating Personalized Web Search Using Semantic Context

    Directory of Open Access Journals (Sweden)

    Zheng Xu

    2015-01-01

    Full Text Available The “one size fits all” criticism of search engines is that when queries are submitted, the same results are returned to different users. To solve this problem, personalized search has been proposed, since it can provide different search results based upon the preferences of users. However, existing methods concentrate more on long-term and independent user profiles, which reduces the effectiveness of personalized search. In this paper, the proposed method captures the user context to provide accurate user preferences for effective personalized search. First, the short-term query context is generated to identify concepts related to the query. Second, the user context is generated based on users' click-through data. Finally, a forgetting factor is introduced to merge the independent user contexts in a user session, which maintains the evolution of user preferences. Experimental results confirm that our approach can successfully represent user context according to individual user information needs.
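The forgetting-factor merge described in the abstract can be sketched as follows; the geometric decay rate and the concept names are assumptions for illustration:

```python
# Minimal sketch (assumed details) of merging per-query user context with a
# forgetting factor: older concept weights decay geometrically, so recent
# clicks dominate the session profile.

def merge_context(session_profile, query_context, forgetting=0.8):
    """Decay existing concept weights, then add the new query context."""
    merged = {c: w * forgetting for c, w in session_profile.items()}
    for concept, weight in query_context.items():
        merged[concept] = merged.get(concept, 0.0) + weight
    return merged

profile = {}
profile = merge_context(profile, {"jaguar:animal": 1.0})
profile = merge_context(profile, {"jaguar:car": 1.0})
# the older "animal" sense has decayed relative to the fresh "car" sense
```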

  20. Collaborative Web Search Who, What, Where, When, and Why

    CERN Document Server

    Morris, Meredith Ringel

    2009-01-01

    Today, Web search is treated as a solitary experience. Web browsers and search engines are typically designed to support a single user, working alone. However, collaboration on information-seeking tasks is actually commonplace. Students work together to complete homework assignments, friends seek information about joint entertainment opportunities, family members jointly plan vacation travel, and colleagues jointly conduct research for their projects. As improved networking technologies and the rise of social media simplify the process of remote collaboration, and large, novel display form-fac

  1. Museum Web Search Behaviour of Special Interest Visitors

    DEFF Research Database (Denmark)

    Skov, Mette; Ingwersen, Peter

    2014-01-01

    There is a current trend to make museum collections widely accessible by digitising cultural heritage collections for the Internet. The present study takes a user perspective and explores the characteristics of online museum visitors' web search behaviour. A combination of quantitative and qualitative...

  2. Snippet-based relevance predictions for federated web search

    OpenAIRE

    Demeester, Thomas; Nguyen, Dong-Phuong; Trieschnigg, Rudolf Berend; Develder, Chris; Hiemstra, Djoerd

    2013-01-01

    How well can the relevance of a page be predicted, purely based on snippets? This would be highly useful in a Federated Web Search setting where caching large amounts of result snippets is more feasible than caching entire pages. The experiments reported in this paper make use of result snippets and pages from a diverse set of actual Web search engines. A linear classifier is trained to predict the snippet-based user estimate of page relevance, but also, to predict the actual page relevance, ...

  3. ArraySearch: A Web-Based Genomic Search Engine.

    Science.gov (United States)

    Wilson, Tyler J; Ge, Steven X

    2012-01-01

    Recent advances in microarray technologies have resulted in a flood of genomics data. This large body of accumulated data could be used as a knowledge base to help researchers interpret new experimental data. ArraySearch finds statistical correlations between newly observed gene expression profiles and the huge source of well-characterized expression signatures deposited in the public domain. A search query of a list of genes will return experiments on which the genes are significantly up- or downregulated collectively. Searches can also be conducted using gene expression signatures from new experiments. This resource will empower biological researchers with a statistical method to explore expression data from their own research by comparing it with expression signatures from a large public archive.
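Signature matching of this kind can be illustrated by ranking archived experiments by Pearson correlation with a query profile; the expression values and experiment names below are fabricated:

```python
# Illustrative sketch of signature matching as described above: rank archived
# experiments by Pearson correlation with a query expression profile.
# The profiles and experiment names are invented for illustration.

import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

ARCHIVE = {
    "exp_heat_shock": [2.1, 1.9, -0.5, -1.2],
    "exp_cold_shock": [-2.0, -1.8, 0.6, 1.1],
}

def best_matches(query_profile):
    scores = {name: pearson(query_profile, prof) for name, prof in ARCHIVE.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ranked = best_matches([2.0, 2.2, -0.4, -1.0])
```

ArraySearch itself works over a much larger public archive and adds significance statistics; the correlation-and-rank core is what the sketch shows.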

  4. The ethics of physicians' web searches for patients' information.

    Science.gov (United States)

    Genes, Nicholas; Appel, Jacob

    2015-01-01

    When physicians search the web for personal information about their patients, others have argued that this undermines patients' trust, and the physician-patient relationship in general. We add that this practice also places other relationships at risk, and could jeopardize a physician's career. Yet there are also reports of web searches that have unambiguously helped in the care of patients, suggesting circumstances in which a routine search of the web could be beneficial. We advance the notion that, just as nonverbal cues and unsolicited information can be useful in clinical decision making, so too can online information from patients. As electronic records grow more voluminous and span more types of data, searching these resources will become a clinical skill, to be used judiciously and with care--just as evaluating the literature is, today. But to proscribe web searches of patients' information altogether is as nonsensical as disregarding findings from physical exams--instead, what's needed are guidelines for when to look and how to evaluate what's uncovered, online.

  5. Intelligent Information Systems for Web Product Search

    NARCIS (Netherlands)

    D. Vandic (Damir)

    2017-01-01

    markdownabstractOver the last few years, we have experienced an increase in online shopping. Consequently, there is a need for efficient and effective product search engines. The rapid growth of e-commerce, however, has also introduced some challenges. Studies show that users can get overwhelmed by

  7. Effectively Searching Maps in Web Documents

    CERN Document Server

    Tan, Qingzhao; Giles, C Lee

    2009-01-01

    Maps are an important source of information in archaeology and other sciences. Users want to search for historical maps to determine recorded history of the political geography of regions at different eras, to find out where exactly archaeological artifacts were discovered, etc. Currently, they have to use a generic search engine and add the term map along with other keywords to search for maps. This crude method will generate a significant number of false positives that the user will need to cull through to get the desired results. To reduce their manual effort, we propose an automatic map identification, indexing, and retrieval system that enables users to search and retrieve maps appearing in a large corpus of digital documents using simple keyword queries. We identify features that can help in distinguishing maps from other figures in digital documents and show how a Support-Vector-Machine-based classifier can be used to identify maps. We propose map-level-metadata e.g., captions, references to the maps i...

  8. Raising Reliability of Web Search Tool Research through Replication and Chaos Theory

    OpenAIRE

    Nicholson, Scott

    1999-01-01

    Because the World Wide Web is a dynamic collection of information, the Web search tools (or "search engines") that index the Web are dynamic. Traditional information retrieval evaluation techniques may not provide reliable results when applied to the Web search tools. This study is the result of ten replications of the classic 1996 Ding and Marchionini Web search tool research. It explores the effects that replication can have on transforming unreliable results from one iteration into replica...

  9. Database selection and result merging in P2P web search

    NARCIS (Netherlands)

    Chernov, S.; Serdyukov, P.; Bender, M.; Michel, S.; Weikum, G.; Zimmer, C.

    2005-01-01

    Intelligent Web search engines are extremely popular now. Currently, only the commercial centralized search engines like Google can process terabytes of Web data. Alternative search engines fulfilling collaborative Web search on a voluntary basis are usually based on a blooming Peer-to-Peer (P2P) te

  10. A World Wide Web Region-Based Image Search Engine

    DEFF Research Database (Denmark)

    Kompatsiaris, Ioannis; Triantafyllou, Evangelia; Strintzis, Michael G.

    2001-01-01

    information. These features along with additional information such as the URL location and the date of index procedure are stored in a database. The user can access and search this indexed content through the Web with an advanced and user friendly interface. The output of the system is a set of links...

  11. Analysis of Scifinder Scholar and Web of Science Citation Searches.

    Science.gov (United States)

    Whitley, Katherine M.

    2002-01-01

    With "Chemical Abstracts" and "Science Citation Index" both now available for citation searching, this study compares the duplication and uniqueness of citing references for works of chemistry researchers for the years 1999-2001. The two indexes cover very similar source material. This analysis of SciFinder Scholar and Web of…

  12. Snippet-based relevance predictions for federated web search

    NARCIS (Netherlands)

    Demeester, Thomas; Nguyen, Dong; Trieschnigg, Dolf; Develder, Chris; Hiemstra, Djoerd

    2013-01-01

    How well can the relevance of a page be predicted, purely based on snippets? This would be highly useful in a Federated Web Search setting where caching large amounts of result snippets is more feasible than caching entire pages. The experiments reported in this paper make use of result snippets and

  14. Web-Based Undergraduate Chemistry Problem-Solving: The Interplay of Task Performance, Domain Knowledge and Web-Searching Strategies

    Science.gov (United States)

    She, Hsiao-Ching; Cheng, Meng-Tzu; Li, Ta-Wei; Wang, Chia-Yu; Chiu, Hsin-Tien; Lee, Pei-Zon; Chou, Wen-Chi; Chuang, Ming-Hua

    2012-01-01

    This study investigates the effect of Web-based Chemistry Problem-Solving, with the attributes of Web-searching and problem-solving scaffolds, on undergraduate students' problem-solving task performance. In addition, the nature and extent of Web-searching strategies students used and its correlation with task performance and domain knowledge also…

  15. Computing Semantic Similarity Measure Between Words Using Web Search Engine

    Directory of Open Access Journals (Sweden)

    Pushpa C N

    2013-05-01

    Full Text Available Semantic similarity measures between words play an important role in information retrieval, natural language processing, and various tasks on the web. In this paper, we propose a Modified Pattern Extraction Algorithm to compute the supervised semantic similarity measure between words by combining both the page count method and the web snippets method. Four association measures are used to find semantic similarity between words in the page count method using web search engines. We use Sequential Minimal Optimization (SMO) support vector machines (SVM) to find the optimal combination of page-count-based similarity scores and top-ranking patterns from the web snippets method. The SVM is trained to classify synonymous word-pairs and non-synonymous word-pairs. The proposed Modified Pattern Extraction Algorithm achieves a correlation value of 89.8 percent.
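A minimal sketch of page-count association measures of the kind the abstract refers to; the exact four measures used are not given here, so WebJaccard and a PMI-style score are shown as representative examples, with made-up hit counts:

```python
# Sketch of page-count association measures (assumed form): given hit counts
# H(P), H(Q), and H(P AND Q) from a web search engine, compute WebJaccard and
# a WebPMI-style score. N, the threshold, and the counts are invented.

import math

N = 10**10          # assumed number of indexed pages
THRESHOLD = 5       # ignore unreliably small co-occurrence counts

def web_jaccard(h_p, h_q, h_pq):
    if h_pq < THRESHOLD:
        return 0.0
    return h_pq / (h_p + h_q - h_pq)

def web_pmi(h_p, h_q, h_pq):
    if h_pq < THRESHOLD:
        return 0.0
    return math.log2((h_pq / N) / ((h_p / N) * (h_q / N)))

# e.g. "car" vs "automobile": heavy overlap gives a high Jaccard score
sim = web_jaccard(h_p=5_000_000, h_q=3_000_000, h_pq=2_000_000)
```

In the paper these per-pair scores are then fed, together with snippet patterns, into the SVM combiner; the sketch only covers the page-count side.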

  16. Nowcasting Mobile Games Ranking Using Web Search Query Data

    Directory of Open Access Journals (Sweden)

    Yoones A. Sekhavat

    2016-01-01

    Full Text Available In recent years, the Internet has become embedded in the purchasing decisions of consumers. The purpose of this paper is to study whether the Internet behavior of users correlates with their actual behavior in the computer games market. Rather than proposing the most accurate model for computer game sales, we aim to investigate to what extent web search query data can be exploited to nowcast (a contraction of “now” and “forecasting”, referring to techniques used to make short-term forecasts, i.e., predict the present status) the ranking of mobile games in the world. Google search query data is used for this purpose, since this data can provide a real-time view of the topics of interest. Various statistical techniques are used to show the effectiveness of using web search query data to nowcast mobile games rankings.
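A toy illustration of nowcasting from query volume, assuming a simple one-variable least-squares fit; the numbers are fabricated, and the paper itself evaluates several statistical techniques:

```python
# Sketch, with fabricated numbers, of nowcasting a ranking from query volume:
# fit a one-variable least-squares line rank ~ a * volume + b on past weeks,
# then "nowcast" the current week from its already-observable query volume.

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# past weeks: search-query volume vs. observed chart rank (lower = better)
volumes = [10, 20, 30, 40]
ranks = [40, 30, 20, 10]
a, b = fit_line(volumes, ranks)
nowcast_rank = a * 35 + b   # estimate this week's rank from this week's volume
```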

  17. Using social annotation and web log to enhance search engine

    CERN Document Server

    Nguyen, Vu Thanh

    2009-01-01

    Search services have developed rapidly on the social Internet and can help web users find their documents easily, so finding the best search method remains an aspiration. This paper introduces a hybrid of the LPageRank algorithm and the Social Sim Rank algorithm. LPageRank is a method that uses link structure to rank the priority of pages; it considers neither the content of the page nor the content of the query. Therefore, we want to use the benefit of social annotations to create a latent semantic association between queries and annotations. In this model, we use the SocialPageRank and LPageRank algorithms to enhance the accuracy of the search system. To experiment with and evaluate the proposed model, we have applied it to the Music Machine Website with its web logs.

  18. Analysis of the Temporal Behaviour of Search Engine Crawlers at Web Sites

    Directory of Open Access Journals (Sweden)

    Jeeva Jose

    2013-06-01

    Full Text Available Web log mining is the extraction of web logs to analyze user behaviour at web sites. In addition to user information, web logs provide immense information about search engine traffic and behaviour. Search engine crawlers are highly automated programs that periodically visit a web site to collect information. The behaviour of search engines can be used to analyze server load, the quality of search engines, the dynamics of search engine crawlers, the ethics of search engines, etc. The time spent by various crawlers is significant in identifying the server load, as a major proportion of the server load is constituted by search engine crawlers. A temporal analysis of the search engine crawlers was done to identify their behaviour. It was found that there is a significant difference in the total time spent by various crawlers. The presence of search engine crawlers at web sites was also examined on an hourly basis to identify the dynamics of search engine crawlers at web sites.
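The kind of log analysis described can be sketched by grouping requests per crawler user-agent and hour of day; the Apache combined-format log lines below are invented examples:

```python
# Illustrative sketch of the log analysis described above: count requests by
# crawler user-agent and by hour of day. The entries themselves are invented;
# real analyses would also accumulate time spent per crawler session.

import re
from collections import Counter

LOG_RE = re.compile(r'\[\d+/\w+/\d+:(\d+):\d+:\d+ [^\]]+\].*"([^"]*)"$')

LOG = [
    '1.2.3.4 - - [10/Jun/2013:06:15:02 +0000] "GET / HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '1.2.3.4 - - [10/Jun/2013:06:40:11 +0000] "GET /a HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '5.6.7.8 - - [10/Jun/2013:13:05:09 +0000] "GET /b HTTP/1.1" 200 512 "-" "bingbot/2.0"',
]

def crawler_hits_per_hour(lines):
    """Map (crawler name, hour of day) -> request count."""
    hits = Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if m:
            hour, agent = m.groups()
            crawler = agent.split("/")[0]
            hits[(crawler, int(hour))] += 1
    return hits

hits = crawler_hits_per_hour(LOG)
```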

  19. Folksonomies, the Web and Search Engines

    Directory of Open Access Journals (Sweden)

    Louise Spiteri

    2008-09-01

    Full Text Available The aim of this special issue of Webology is to explore developments in the design of folksonomies, knowledge organization systems, and search engines to reflect end user preferences for describing items of interest. Particular emphasis is placed on folksonomies, an area of study that has grown exponentially since the term was first coined by Thomas Vander Wal in 2004: "Folksonomy is the result of personal free tagging of information and objects (anything with a URL) for one's own retrieval. The tagging is done in a social environment (usually shared and open to others). Folksonomy is created from the act of tagging by the person consuming the information" (Vander Wal, 2007). Since 2004, social software applications and their use of tagging have continued to increase in popularity; in its site dedicated to such applications, Wikipedia (2008) lists no fewer than 11 extant media sharing sites and 26 social bookmarking sites. This list does not take into account the approximately 20 media cataloguing sites, not to mention the innumerable blogging sites that employ tagging.

  20. Relevant Pages in semantic Web Search Engines using Ontology

    Directory of Open Access Journals (Sweden)

    Jemimah Simon

    2012-03-01

    Full Text Available In general, search engines are the most popular means of searching for any kind of information on the Internet. Generally, keywords are given to the search engine, and the Web database returns the documents containing the specified keywords. In many situations, irrelevant results are returned for a user query, since different keywords are used in different forms in various documents. The development of the next-generation Web, the Semantic Web, will change this situation. This paper proposes a prototype of a relation-based search engine which ranks pages according to the user query and annotated results. A page subgraph is computed for each annotated page in the result set by generating all possible combinations of the relations in the subgraph. A relevance score is computed for each annotated page using a probability measure. A relation-based ranking model is used which displays the pages in the final result set according to their relevance score. This ranking is provided by considering keyword-concept associations. Thus, the final result set contains pages in the order of their constrained relevance scores.

  1. Start Your Search Engines. Part One: Taming Google--and Other Tips to Master Web Searches

    Science.gov (United States)

    Adam, Anna; Mowers, Helen

    2008-01-01

    There are a lot of useful tools on the Web, all those social applications, and the like. Still most people go online for one thing--to perform a basic search. For most fact-finding missions, the Web is there. But--as media specialists well know--the sheer wealth of online information can hamper efforts to focus on a few reliable references.…

  2. CWI and TU Delft at TREC 2013: Contextual Suggestion, Federated Web Search, KBA, and Web Tracks

    NARCIS (Netherlands)

    Bellogín Kouki, A.; Gebremeskel, G.G.; He, J.; Lin, J.J.P.; Said, A.; Samar, T.; Vries, A.P. de; Vuurens, J.B.P.

    2014-01-01

    This paper provides an overview of the work done at the Centrum Wiskunde & Informatica (CWI) and Delft University of Technology (TU Delft) for different tracks of TREC 2013. We participated in the Contextual Suggestion Track, the Federated Web Search Track, the Knowledge Base Acceleration (KBA) Trac

  3. Web of science: a unique method of cited reference searching.

    Science.gov (United States)

    Sevinc, Alper

    2004-07-01

    The number of times an article is acknowledged as a reference in another article reflects its scientific impact. Citation analysis is one of the parameters for assessing the quality of research published in scientific, technology and social science journals. Web of Science enables users to search current and retrospective multidisciplinary information. Parameters and practical applications evaluating journal and article citation characteristics available through the Science Citation Index are summarized.

  4. Private Information Disclosure from Web Searches. (The case of Google Web History)

    CERN Document Server

    Castelluccia, Claude; Perito, Daniele

    2010-01-01

    As the amount of personal information stored at remote service providers increases, so does the danger of data theft. When connections to remote services are made in the clear and authenticated sessions are kept using HTTP cookies, data theft becomes extremely easy to achieve. In this paper, we study the architecture of the world's largest service provider, i.e., Google. First, with the exception of a few services that can only be accessed over HTTPS (e.g., Gmail), we find that many Google services are still vulnerable to simple session hijacking. Next, we present the Historiographer, a novel attack that reconstructs the web search history of Google users, i.e., Google's Web History, even though such a service is supposedly protected from session hijacking by a stricter access control policy. The Historiographer uses a reconstruction technique inferring search history from the personalized suggestions fed by the Google search engine. We validate our technique through experiments conducted over real network tr...

  5. First 20 Precision among World Wide Web Search Services (Search Engines).

    Science.gov (United States)

    Leighton, H. Vernon; Srivastava, Jaideep

    1999-01-01

    Compares five World Wide Web search engines for precision on the first 20 results returned for 15 queries, adding weight for ranking effectiveness. Discusses methods to lessen evaluator bias, evaluation criteria, definition of relevance, experimental design, the structure of queries, and future work. (Author/LRW)

  6. Web Feet Guide to Search Engines: Finding It on the Net.

    Science.gov (United States)

    Web Feet, 2001

    2001-01-01

    This guide to search engines for the World Wide Web discusses selecting the right search engine; interpreting search results; major search engines; online tutorials and guides; search engines for kids; specialized search tools for various subjects; and other specialized engines and gateways. (LRW)

  7. Mining social media and web searches for disease detection

    Directory of Open Access Journals (Sweden)

    Y. Tony Yang

    2013-05-01

    Full Text Available Web-based social media is increasingly being used across different settings in the health care industry. The increased frequency in the use of the Internet via computer or mobile devices provides an opportunity for social media to be the medium through which people can be provided with valuable health information quickly and directly. While traditional methods of detection relied predominantly on hierarchical or bureaucratic lines of communication, these often failed to yield timely and accurate epidemiological intelligence. New web-based platforms promise increased opportunities for a more timely and accurate spreading of information and analysis. This article aims to provide an overview and discussion of the availability of timely and accurate information. It is especially useful for the rapid identification of an outbreak of an infectious disease that is necessary to promptly and effectively develop public health responses. These web-based platforms include search queries, data mining of web and social media, processing and analysis of blogs containing epidemic keywords, text mining, and geographical information system data analyses. These new sources of analysis and information are intended to complement traditional sources of epidemic intelligence. Despite the attractiveness of these new approaches, further study is needed to determine the accuracy of blogger statements, as increases in public participation may not necessarily mean the information provided is more accurate.

  8. Mining social media and web searches for disease detection.

    Science.gov (United States)

    Yang, Y Tony; Horneffer, Michael; DiLisio, Nicole

    2013-04-28

    Web-based social media is increasingly being used across different settings in the health care industry. The increased frequency in the use of the Internet via computer or mobile devices provides an opportunity for social media to be the medium through which people can be provided with valuable health information quickly and directly. While traditional methods of detection relied predominantly on hierarchical or bureaucratic lines of communication, these often failed to yield timely and accurate epidemiological intelligence. New web-based platforms promise increased opportunities for a more timely and accurate spreading of information and analysis. This article aims to provide an overview and discussion of the availability of timely and accurate information. It is especially useful for the rapid identification of an outbreak of an infectious disease that is necessary to promptly and effectively develop public health responses. These web-based platforms include search queries, data mining of web and social media, processing and analysis of blogs containing epidemic keywords, text mining, and geographical information system data analyses. These new sources of analysis and information are intended to complement traditional sources of epidemic intelligence. Despite the attractiveness of these new approaches, further study is needed to determine the accuracy of blogger statements, as increases in public participation may not necessarily mean the information provided is more accurate.
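A minimal sketch of the blog/keyword-mining step these platforms rely on; the keywords, posts, and the spike rule are invented assumptions:

```python
# Toy sketch of epidemic keyword mining: count epidemic keywords in post
# texts and flag a window whose count rises well above a baseline rate.
# The keyword set, posts, and spike factor are invented for illustration.

KEYWORDS = {"fever", "flu", "cough"}

def keyword_hits(posts):
    """Total occurrences of epidemic keywords across all posts."""
    return sum(1 for post in posts
               for w in post.lower().split()
               if w.strip(".,!?") in KEYWORDS)

def spike(current_posts, baseline_rate, factor=3.0):
    """Raise an alert when the current count exceeds factor * baseline."""
    return keyword_hits(current_posts) > factor * baseline_rate

alert = spike(["I have a fever and a bad cough", "flu season again"],
              baseline_rate=0.5)
```

Real systems add normalization for posting volume and geography; the counting-against-baseline shape is the core idea.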

  9. Journey of Web Search Engines: Milestones, Challenges & Innovations

    Directory of Open Access Journals (Sweden)

    Mamta Kathuria

    2016-12-01

    Full Text Available The past few decades have witnessed an information big bang in the form of the World Wide Web, leading to a gigantic repository of heterogeneous data. A humble journey that started with network connections between a few computers in the ARPANET project has reached a level wherein almost all the computers and other communication devices of the world have joined together to form a huge global information network that makes available most of the information related to every possible heterogeneous domain. Not only is the managing and indexing of this repository a big concern, but providing a quick answer to the user's query is also of critical importance. Amazingly, rather miraculously, the task is being done quite efficiently by current web search engines. This miracle has been possible due to a series of mathematical and technological innovations continuously being carried out in the area of search techniques. This paper takes an overview of search engine evolution from the primitive to the present.

  10. REPTREE CLASSIFIER FOR IDENTIFYING LINK SPAM IN WEB SEARCH ENGINES

    Directory of Open Access Journals (Sweden)

    S.K. Jayanthi

    2013-01-01

    Full Text Available Search engines are used for retrieving information from the web. Most of the time, importance is laid on the top 10 results, which may sometimes shrink to the top 5 because of time constraints and reliance on the search engines. Users believe that the top 10 or 5 of the total results are more relevant. Here comes the problem of spamdexing, a method of degrading search result quality. Falsified metrics, such as inserting an enormous number of keywords or links in a website, may take that website to the top 10 or 5 positions. This paper proposes a classifier based on Reptree (a regression tree representative). As an initial step, link-based features such as neighbours, PageRank, truncated PageRank, TrustRank, and assortativity-related attributes are inferred. Based on these features, a tree is constructed. The tree uses the feature inference to differentiate spam sites from legitimate sites. The WEBSPAM-UK-2007 dataset is taken as a base. It is preprocessed and converted into five datasets: FEATA, FEATB, FEATC, FEATD, and FEATE. Only link-based features are taken for the experiments; this paper focuses on link spam alone. Finally, a representative tree is created which will more precisely classify the web spam entries. Results are given. Regression tree classification seems to perform well, as shown through the experiments.
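A toy stand-in for tree-style link-spam classification over features like those listed above; the thresholds and feature names are invented, and the paper trains a Reptree rather than hand-setting rules:

```python
# Toy sketch of tree-style link-spam classification: a two-level decision
# procedure over link features resembling those listed above (in-degree and a
# TrustRank-like score). Thresholds and labels are invented assumptions.

def classify(features):
    """features: dict with 'inlinks' count and 'trust' score in [0, 1]."""
    if features["trust"] < 0.2:
        # low trust plus a suspiciously large in-degree suggests link farming
        return "spam" if features["inlinks"] > 1000 else "suspect"
    return "legitimate"

labels = [classify({"inlinks": 5000, "trust": 0.05}),
          classify({"inlinks": 40, "trust": 0.9})]
```

A learned regression tree induces such thresholds from labeled data instead of fixing them by hand, which is what the WEBSPAM-UK-2007 experiments do.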

  11. A Sorting Method of Meta-search Based on User Web Page Interactive Model

    Institute of Scientific and Technical Information of China (English)

    Zongli Jiang; Tengyu Zhang

    2012-01-01

    Nowadays, there is a problem in most meta-search engines: many of the web pages retrieved have nothing to do with users' expectations. We introduce a new user/web-page interaction model under the framework of meta-search, which analyzes users' actions to infer their interests, stores them, and updates this information with users' feedback. Meanwhile, this model analyzes user records stored on the web and attaches labels to web pages with statistics of user interest. We calculate the similarity between the user and a web page with the information from the model and add this similarity to the scores of the web pages. The experimental results reveal that this method can improve the relevance of information retrieval.
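The re-ranking idea can be sketched by adding a user/page similarity term to each page's base score before sorting the merged meta-search results; the weighting factor and label vectors are assumptions:

```python
# Minimal sketch (invented weights) of similarity-boosted re-ranking: compute
# cosine similarity between user-interest labels and page labels, then add it,
# scaled by alpha, to the engine's base score before sorting.

def cosine(u, v):
    dot = sum(u.get(k, 0) * v.get(k, 0) for k in set(u) | set(v))
    nu = sum(x * x for x in u.values()) ** 0.5
    nv = sum(x * x for x in v.values()) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def rerank(results, user_interests, alpha=0.5):
    """results: list of (url, base_score, label_weights) tuples."""
    scored = [(url, base + alpha * cosine(user_interests, labels))
              for url, base, labels in results]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)

user = {"python": 1.0, "web": 0.5}
results = [
    ("a.example", 0.60, {"python": 1.0}),
    ("b.example", 0.62, {"cooking": 1.0}),
]
ranked = rerank(results, user)  # a.example overtakes b.example
```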

  12. The invisible Web uncovering information sources search engines can't see

    CERN Document Server

    Sherman, Chris

    2001-01-01

    Enormous expanses of the Internet are unreachable with standard web search engines. This book provides the key to finding these hidden resources by identifying how to uncover and use invisible web resources. Mapping the invisible Web, when and how to use it, assessing the validity of the information, and the future of Web searching are topics covered in detail. Only 16 percent of Net-based information can be located using a general search engine. The other 84 percent is what is referred to as the invisible Web-made up of information stored in databases. Unlike pages on the visible Web, informa

  13. Novel web service selection model based on discrete group search.

    Science.gov (United States)

    Zhai, Jie; Shao, Zhiqing; Guo, Yi; Zhang, Haiteng

    2014-01-01

    In our earlier work, we present a novel formal method for the semiautomatic verification of specifications and for describing web service composition components by using abstract concepts. After verification, the instantiations of components were selected to satisfy the complex service performance constraints. However, selecting an optimal instantiation, which comprises different candidate services for each generic service, from a large number of instantiations is difficult. Therefore, we present a new evolutionary approach on the basis of the discrete group search service (D-GSS) model. With regard to obtaining the optimal multiconstraint instantiation of the complex component, the D-GSS model has competitive performance compared with other service selection models in terms of accuracy, efficiency, and ability to solve high-dimensional service composition component problems. We propose the cost function and the discrete group search optimizer (D-GSO) algorithm and study the convergence of the D-GSS model through verification and test cases.

  14. VisiNav: Visual Web Data Search and Navigation

    Science.gov (United States)

    Harth, Andreas

    Semantic Web technologies facilitate data integration over a large number of sources with decentralised and loose coordination, ideally leading to interlinked datasets which describe objects, their attributes and links to other objects. Such information spaces are amenable to queries that go beyond traditional keyword search over documents. To this end, we present a formal query model comprising six atomic operations over object-structured datasets: keyword search, object navigation, facet selection, path traversal, projection, and sorting. Using these atomic operations, users can incrementally assemble complex queries that yield a set of objects or trees of objects as result. Results can then be either directly displayed or exported to application programs or online services. We report on user experiments carried out during the design phase of the system, and present performance results for a range of queries over 18.5m statements aggregated from 70k sources.
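Two of the six atomic operations named above (keyword search and facet selection) can be sketched over a toy object-structured dataset; the schema and object values are assumed:

```python
# Sketch of two of VisiNav's atomic operations over a toy object-structured
# dataset: keyword search over attribute values, then facet selection.
# The schema ("id", "type", "name") and the objects are invented.

OBJECTS = [
    {"id": "o1", "type": "Person", "name": "Ada Lovelace"},
    {"id": "o2", "type": "Person", "name": "Alan Turing"},
    {"id": "o3", "type": "Paper", "name": "On Computable Numbers"},
]

def keyword_search(objects, term):
    """Objects whose attribute values contain the term (case-insensitive)."""
    term = term.lower()
    return [o for o in objects if any(term in str(v).lower() for v in o.values())]

def facet_select(objects, attribute, value):
    """Narrow a result set to objects with a given attribute value."""
    return [o for o in objects if o.get(attribute) == value]

# incremental composition, as in the query model: search, then narrow by facet
hits = facet_select(keyword_search(OBJECTS, "a"), "type", "Person")
```

The remaining operations (object navigation, path traversal, projection, sorting) compose in the same incremental way, each consuming the previous operation's result set.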

  15. Search of the Deep and Dark Web via DARPA Memex

    Science.gov (United States)

    Mattmann, C. A.

    2015-12-01

    Search has progressed through several stages due to the increasing size of the Web. Search engines first focused on text and its rate of occurrence; then on the notion of link analysis and citation; then on interactivity and guided search; and now on the use of social media - who we interact with, what we comment on, and who we follow (and who follows us). The next stage, referred to as "deep search," requires solutions that can bring together text, images, video, importance, interactivity, and social media to solve this challenging problem. The Apache Nutch project provides an open framework for large-scale, targeted, vertical search with capabilities to support all past and potential future search engine foci. Nutch is a flexible infrastructure allowing open access to ranking, to URL selection and filtering approaches, and to the link graph generated from search, and Nutch has spawned entire sub-communities including Apache Hadoop and Apache Tika. It addresses many current needs with the capability to support new technologies such as image and video. On the DARPA Memex project, we are creating specific extensions to Nutch that will directly improve its overall technological superiority for search and that will directly allow us to address complex search problems including human trafficking. We are integrating state-of-the-art algorithms developed by Kitware for IARPA Aladdin, combined with work by Harvard, to provide image and video understanding support, allowing automatic detection of people and things and massive deployment via Nutch. We are expanding Apache Tika for scene understanding, object/person detection, and classification in images/video. We are delivering an interactive and visual interface for initiating Nutch crawls. The interface uses Python technologies to expose Nutch data and to provide a domain-specific language for crawls. With the Bokeh visualization library, we are delivering simple interactive crawl visualization and

  16. Curating the Web: Building a Google Custom Search Engine for the Arts

    Science.gov (United States)

    Hennesy, Cody; Bowman, John

    2008-01-01

    Google's first foray onto the web made search simple and results relevant. With its Co-op platform, Google has taken another step toward dramatically increasing the relevancy of search results, further adapting the World Wide Web to local needs. Google Custom Search Engine, a tool on the Co-op platform, puts one in control of his or her own search…

  17. Discovering How Students Search a Library Web Site: A Usability Case Study.

    Science.gov (United States)

    Augustine, Susan; Greene, Courtney

    2002-01-01

    Discusses results of a usability study at the University of Illinois Chicago that investigated whether Internet search engines have influenced the way students search library Web sites. Results show students use the Web site's internal search engine rather than navigating through the pages; have difficulty interpreting library terminology; and…

  18. Spiders and Worms and Crawlers, Oh My: Searching on the World Wide Web.

    Science.gov (United States)

    Eagan, Ann; Bender, Laura

    Searching on the world wide web can be confusing. A myriad of search engines exist, often with little or no documentation, and many of these search engines work differently from the standard search engines people are accustomed to using. Intended for librarians, this paper defines search engines, directories, spiders, and robots, and covers basics…

  19. Federated Search and the Library Web Site: A Study of Association of Research Libraries Member Web Sites

    Science.gov (United States)

    Williams, Sarah C.

    2010-01-01

    The purpose of this study was to investigate how federated search engines are incorporated into the Web sites of libraries in the Association of Research Libraries. In 2009, information was gathered for each library in the Association of Research Libraries with a federated search engine. This included the name of the federated search service and…

  1. Assessment and Comparison of Search capabilities of Web-based Meta-Search Engines: A Checklist Approach

    Directory of Open Access Journals (Sweden)

    Alireza Isfandiyari Moghadam

    2010-03-01

    Full Text Available   The present investigation concerns the evaluation, comparison and analysis of search options existing within web-based meta-search engines. 64 meta-search engines were identified; 19 meta-search engines that were free, accessible and compatible with the objectives of the present study were selected. An author-constructed checklist was used for data collection. Findings indicated that all meta-search engines studied used the AND operator, phrase search, setting of the number of results displayed, previous search query storage and help tutorials. Nevertheless, none of them offered search options for hypertext searching or displayed the size of the pages searched. 94.7% supported features such as truncation, keyword-in-title and URL search, and text summary display. The checklist used in the study could serve as a model for investigating search options in search engines, digital libraries and other internet search tools.

  2. Web search queries can predict stock market volumes.

    Science.gov (United States)

    Bordino, Ilaria; Battiston, Stefano; Caldarelli, Guido; Cristelli, Matthieu; Ukkonen, Antti; Weber, Ingmar

    2012-01-01

    We live in a computerized and networked society where many of our actions leave a digital trace and affect other people's actions. This has led to the emergence of a new data-driven research field: mathematical methods of computer science, statistical physics and sociometry provide insights into a wide range of disciplines ranging from social science to human mobility. A recent important discovery is that search engine traffic (i.e., the number of requests submitted by users to search engines on the www) can be used to track and, in some cases, to anticipate the dynamics of social phenomena. Successful examples include unemployment levels, car and home sales, and the spreading of epidemics. A few recent works have applied this approach to stock prices and market sentiment. However, it remains unclear whether trends in financial markets can be anticipated by the collective wisdom of on-line users on the web. Here we show that daily trading volumes of stocks traded in NASDAQ-100 are correlated with daily volumes of queries related to the same stocks. In particular, query volumes anticipate in many cases peaks of trading by one day or more. Our analysis is carried out on a unique dataset of queries, submitted to an important web search engine, which enables us to investigate also the user behavior. We show that the query volume dynamics emerge from the collective but seemingly uncoordinated activity of many users. These findings contribute to the debate on the identification of early warnings of financial systemic risk, based on the activity of users of the www.
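
    The abstract's central measurement - correlating day-t query volume with day-(t+1) trading volume - can be illustrated with a toy computation. The daily series below are invented for illustration (the paper uses NASDAQ-100 stocks and real query logs), and the helper names are our own:

    ```python
    from statistics import mean

    def pearson(xs, ys):
        """Pearson correlation coefficient between two equal-length series."""
        mx, my = mean(xs), mean(ys)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sum((x - mx) ** 2 for x in xs) ** 0.5
        sy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (sx * sy)

    def lead_correlation(queries, volumes, lag=1):
        """Correlate day-t query volume with day-(t + lag) trading volume."""
        return pearson(queries[:-lag], volumes[lag:])

    # Synthetic daily series in which query interest peaks one day before
    # the trading-volume peak, the shape of the effect the paper reports.
    queries = [10, 12, 30, 14, 11, 13, 28, 12]
    volumes = [100, 105, 118, 290, 110, 112, 120, 265]

    same_day = pearson(queries, volumes)
    one_day_lead = lead_correlation(queries, volumes, lag=1)
    ```

    On these synthetic series the one-day-lead correlation is far higher than the same-day correlation, which is the anticipation pattern the abstract describes.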

  3. Web search queries can predict stock market volumes.

    Directory of Open Access Journals (Sweden)

    Ilaria Bordino

    Full Text Available We live in a computerized and networked society where many of our actions leave a digital trace and affect other people's actions. This has led to the emergence of a new data-driven research field: mathematical methods of computer science, statistical physics and sociometry provide insights into a wide range of disciplines ranging from social science to human mobility. A recent important discovery is that search engine traffic (i.e., the number of requests submitted by users to search engines on the www) can be used to track and, in some cases, to anticipate the dynamics of social phenomena. Successful examples include unemployment levels, car and home sales, and the spreading of epidemics. A few recent works have applied this approach to stock prices and market sentiment. However, it remains unclear whether trends in financial markets can be anticipated by the collective wisdom of on-line users on the web. Here we show that daily trading volumes of stocks traded in NASDAQ-100 are correlated with daily volumes of queries related to the same stocks. In particular, query volumes anticipate in many cases peaks of trading by one day or more. Our analysis is carried out on a unique dataset of queries, submitted to an important web search engine, which enables us to investigate also the user behavior. We show that the query volume dynamics emerge from the collective but seemingly uncoordinated activity of many users. These findings contribute to the debate on the identification of early warnings of financial systemic risk, based on the activity of users of the www.

  4. Web Search Queries Can Predict Stock Market Volumes

    Science.gov (United States)

    Bordino, Ilaria; Battiston, Stefano; Caldarelli, Guido; Cristelli, Matthieu; Ukkonen, Antti; Weber, Ingmar

    2012-01-01

    We live in a computerized and networked society where many of our actions leave a digital trace and affect other people’s actions. This has led to the emergence of a new data-driven research field: mathematical methods of computer science, statistical physics and sociometry provide insights into a wide range of disciplines ranging from social science to human mobility. A recent important discovery is that search engine traffic (i.e., the number of requests submitted by users to search engines on the www) can be used to track and, in some cases, to anticipate the dynamics of social phenomena. Successful examples include unemployment levels, car and home sales, and the spreading of epidemics. A few recent works have applied this approach to stock prices and market sentiment. However, it remains unclear whether trends in financial markets can be anticipated by the collective wisdom of on-line users on the web. Here we show that daily trading volumes of stocks traded in NASDAQ-100 are correlated with daily volumes of queries related to the same stocks. In particular, query volumes anticipate in many cases peaks of trading by one day or more. Our analysis is carried out on a unique dataset of queries, submitted to an important web search engine, which enables us to investigate also the user behavior. We show that the query volume dynamics emerge from the collective but seemingly uncoordinated activity of many users. These findings contribute to the debate on the identification of early warnings of financial systemic risk, based on the activity of users of the www. PMID:22829871

  5. Web Spam, Social Propaganda and the Evolution of Search Engine Rankings

    Science.gov (United States)

    Metaxas, Panagiotis Takis

    Search Engines have greatly influenced the way we experience the web. Since the early days of the web, users have been relying on them to get informed and make decisions. When the web was relatively small, web directories were built and maintained using human experts to screen and categorize pages according to their characteristics. By the mid-1990s, however, it was apparent that the human expert model of categorizing web pages did not scale. The first search engines appeared and they have been evolving ever since, taking over the role that web directories used to play.

  6. Analysis of Haptics Evolution from Web Search Engines’ Data

    Directory of Open Access Journals (Sweden)

    Agnès Guerraz

    2009-08-01

    Full Text Available This article proposes using search engine results data, such as the number of results containing relevant terms, to measure the evolution of Haptics, the field devoted to the science and technology of the sense of touch. Haptics is a complex discipline which lies at the intersection of several specialized fields such as robotics, computer science, psychology, and mathematics. It can also appear as a new and emergent discipline due to the fact that many promising haptic interfaces, which allow innovative multimodal applications in many fields, have become mature only recently. The study presented in this article uses data collected at different periods of time (in December 1999, January 2004, January 2005, November 2006 and April 2007) on Web search engines from requests on three different terms: haptique, haptik and haptics, taken respectively from the French, German, and English languages. The evolution of Haptics is seemingly reflected in the online frequency of these specific terms over time. This evolution has been measured by considering the Internet community through search engines such as Google or Yahoo!

  7. Developing as new search engine and browser for libraries to search and organize the World Wide Web library resources

    OpenAIRE

    Sreenivasulu, V.

    2000-01-01

    Internet Granthalaya urges worldwide advocates and targets the task of creating a new search engine and dedicated browser. Internet Granthalaya may be the ultimate search engine exclusively dedicated for every library to use to search and organize the World Wide Web library resources

  9. Web search behavior of university students: a case study at University of the Punjab

    Directory of Open Access Journals (Sweden)

    Khalid Mahmood

    2009-06-01

    Full Text Available The World Wide Web is now known to be the richest source of information, and its growth rate is exponential. This paper explores different aspects of the web search behavior of university students, in terms of users' background and experience with the web, purpose of use, searching skills, query formulation, frequency of use, favorite search engine, etc. All these factors contribute to the way in which students search the web. Data were collected from students of the Faculty of Economics and Management Sciences, University of the Punjab, Lahore, through a questionnaire. Key findings include the use of the web for academic tasks, a preference for Google, reformulation of queries, use of basic and advanced search features, browsing of the first ten hits, and the problem of slow speed.

  10. A Case Study of Search Engine on World Wide Web for Chemical Fiber Engineering

    Institute of Scientific and Technical Information of China (English)

    张利; 邵世煌; 曾献辉; 尹美华

    2001-01-01

    A search engine is an effective approach to promoting the service quality of the World Wide Web. Based on an analysis of search engines at home and abroad, the developing principle of search engines is given according to the requirements of Web information for chemical fiber engineering. The implementation method for the communication and dynamic refreshment of information on the home page of the search engine is elaborated using the programming technology of Active Server Pages 3.0 (ASP 3.0). The query of chemical fiber information and automatic linking of chemical fiber Web sites can be easily realized by the developed search engine in an Internet environment according to users' requirements.

  11. The Ontological Perspectives of the Semantic Web and the Metadata Harvesting Protocol: Applications of Metadata for Improving Web Search.

    Science.gov (United States)

    Fast, Karl V.; Campbell, D. Grant

    2001-01-01

    Compares the implied ontological frameworks of the Open Archives Initiative Protocol for Metadata Harvesting and the World Wide Web Consortium's Semantic Web. Discusses current search engine technology, semantic markup, indexing principles of special libraries and online databases, and componentization and the distinction between data and…

  12. Semantic similarity measure in biomedical domain leverage web search engine.

    Science.gov (United States)

    Chen, Chi-Huang; Hsieh, Sheau-Ling; Weng, Yung-Ching; Chang, Wen-Yung; Lai, Feipei

    2010-01-01

    Semantic similarity measures play an essential role in Information Retrieval and Natural Language Processing. In this paper we propose a page-count-based semantic similarity measure and apply it in biomedical domains. Previous research in semantic-web-related applications has deployed various semantic similarity measures. Despite the usefulness of the measurements in those applications, measuring semantic similarity between two terms remains a challenging task. The proposed method exploits page counts returned by a Web search engine. We define various similarity scores for two given terms P and Q, using the page counts for querying P, Q and P AND Q. Moreover, we propose a novel approach to compute semantic similarity using lexico-syntactic patterns with page counts. These different similarity scores are integrated using support vector machines, to leverage the robustness of semantic similarity measures. Experimental results on two datasets achieve correlation coefficients of 0.798 on the dataset provided by A. Hliaoutakis, 0.705 on the dataset provided by T. Pedersen with physician scores and 0.496 on the dataset provided by T. Pedersen et al. with expert scores.
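
    As a sketch, one page-count score of the kind this abstract describes can be written as a Jaccard-style measure over the counts for P, Q, and "P AND Q". The specific formula and the page counts below are illustrative assumptions; the paper integrates several such scores, plus lexico-syntactic patterns, with support vector machines:

    ```python
    def web_jaccard(count_p, count_q, count_pq):
        """Jaccard-style similarity from page counts for P, Q, and 'P AND Q'."""
        denom = count_p + count_q - count_pq
        return 0.0 if denom == 0 else count_pq / denom

    # Hypothetical page counts a search engine might return for the
    # biomedical terms "myocardial infarction" (P) and "heart attack" (Q).
    sim = web_jaccard(count_p=4_200_000, count_q=9_800_000, count_pq=3_100_000)
    ```

    The score lies in [0, 1]; identical co-occurrence counts give 1.0, and disjoint terms give 0.0.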

  13. INTELLIGENT SEARCH ENGINE-BASED UNIVERSAL DESCRIPTION, DISCOVERY AND INTEGRATION FOR WEB SERVICE DISCOVERY

    Directory of Open Access Journals (Sweden)

    Tamilarasi Karuppiah

    2014-01-01

    Full Text Available The Web Services standard has been broadly acknowledged by industry and academic research along with the progress of web technology and e-business. An increasing number of web applications have been bundled as web services that can be published, positioned and invoked across the web. The importance of the issues regarding their publication and discovery grows as web services multiply and become more advanced and mutually dependent. With the intention of discovering web services effectively and within a minimum time period, this study proposes a UDDI with an intelligent search engine. For publishing and discovering web services, the web services are first published in the UDDI registry and the published web services are then indexed. To improve the efficiency of web service discovery, the indexed web services are saved in an index database. A search query is compared with the index database to discover web services, and the discovered web services are returned to the service consumer. The way the web services are accessed is stored in a log file, which is then utilized to provide personalized web services to the user. Web service discovery is enhanced significantly by the efficient searching capability provided by the proposed system, which is capable of returning the most appropriate web services. Universal Description, Discovery and Integration (UDDI).
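
    The publish-index-discover loop described above can be sketched with a toy inverted index over service descriptions. The service names and descriptions are invented for illustration; a real UDDI registry stores far richer metadata:

    ```python
    from collections import defaultdict

    # Toy registry of published service descriptions (hypothetical names).
    registry = {
        "WeatherSvc": "returns weather forecast for a city",
        "StockSvc": "real time stock quote lookup",
        "GeoSvc": "city geolocation and mapping service",
    }

    # Index step: map every description term to the services containing it.
    index = defaultdict(set)
    for name, desc in registry.items():
        for term in desc.lower().split():
            index[term].add(name)

    def discover(query):
        """Return services whose descriptions contain every query term."""
        terms = query.lower().split()
        hits = [index.get(t, set()) for t in terms]
        return set.intersection(*hits) if hits else set()
    ```

    Looking terms up in the prebuilt index, rather than scanning every registry entry per query, is what makes discovery fast as the registry grows.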

  14. A Taxonomic Search Engine: Federating taxonomic databases using web services

    Directory of Open Access Journals (Sweden)

    Page Roderic DM

    2005-03-01

    Full Text Available Abstract Background The taxonomic name of an organism is a key link between different databases that store information on that organism. However, in the absence of a single, comprehensive database of organism names, individual databases lack an easy means of checking the correctness of a name. Furthermore, the same organism may have more than one name, and the same name may apply to more than one organism. Results The Taxonomic Search Engine (TSE) is a web application written in PHP that queries multiple taxonomic databases (ITIS, Index Fungorum, IPNI, NCBI, and uBIO) and summarises the results in a consistent format. It supports "drill-down" queries to retrieve a specific record. The TSE can optionally suggest alternative spellings the user can try. It also acts as a Life Science Identifier (LSID) authority for the source taxonomic databases, providing globally unique identifiers (and associated metadata) for each name. Conclusion The Taxonomic Search Engine is available at http://darwin.zoology.gla.ac.uk/~rpage/portal/ and provides a simple demonstration of the potential of the federated approach to providing access to taxonomic names.
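
    The federated pattern the abstract describes - query every source database, then summarise the hits in one consistent format - can be sketched as follows. The stub functions and their hard-coded records are illustrative stand-ins for real web-service calls to ITIS, NCBI, and the other sources:

    ```python
    # Stand-ins for the source databases; in the real system each lookup
    # is a web-service call to ITIS, Index Fungorum, IPNI, NCBI, or uBIO.
    def query_itis(name):
        return [{"name": "Homo sapiens", "id": "180092"}] if name == "Homo sapiens" else []

    def query_ncbi(name):
        return [{"name": "Homo sapiens", "id": "9606"}] if name == "Homo sapiens" else []

    SOURCES = {"ITIS": query_itis, "NCBI": query_ncbi}

    def federated_search(name):
        """Query every source and summarise the hits in one consistent format."""
        results = []
        for source, query in SOURCES.items():
            for hit in query(name):
                results.append({"source": source,
                                "name": hit["name"],
                                "record_id": hit["id"]})
        return results
    ```

    Each source keeps its own identifier scheme; the federation layer only normalises the result shape, which is why the same name can legitimately return several records.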

  15. Improving Web Page Retrieval using Search Context from Clicked Domain Names

    NARCIS (Netherlands)

    Li, Rongmei

    2009-01-01

    Search context is a crucial factor that helps to understand a user’s information need in ad-hoc Web page retrieval. A query log of a search engine contains rich information on issued queries and their corresponding clicked Web pages. The clicked data implies its relevance to the query and can be used…

  17. Darwin on the Web: The Evolution of Search Tools.

    Science.gov (United States)

    Vidmar, Dale J.

    1999-01-01

    Discusses various search strategies and tools that can be used for searching on the Internet, including search engines and search directories; Boolean searching; metasearching; relevancy ranking; automatic phrase detection; backlinks; natural-language searching; clustering and cataloging information; image searching; customization and portals;…

  18. How Users Search the Mobile Web: A Model for Understanding the Impact of Motivation and Context on Search Behaviors

    Directory of Open Access Journals (Sweden)

    Dan Wu

    2016-03-01

    Full Text Available Purpose: This study explores how search motivation and context influence mobile Web search behaviors. Design/methodology/approach: We studied 30 experienced mobile Web users via questionnaires, semi-structured interviews, and an online diary tool that participants used to record their daily search activities. SQLite Developer was used to extract data from the users' phone logs for correlation analysis in Statistical Product and Service Solutions (SPSS). Findings: One quarter of mobile search sessions were driven by two or more search motivations. It was especially difficult to distinguish curiosity from time-killing in users' reporting. Multi-dimensional contexts and motivations influenced mobile search behaviors, and among the context dimensions, gender, place, activities engaged in while searching, task importance, portal, and interpersonal relations (whether accompanied or alone when searching) correlated with each other. Research limitations: The sample was composed entirely of college students, so our findings may not generalize to other populations. More participants and a longer experimental duration would improve the accuracy and objectivity of the research. Practical implications: Motivation analysis and search context recognition can help mobile service providers design applications and services for particular mobile contexts and usages. Originality/value: Most current research focuses on specific contexts, such as studies on place or other contextual influences on mobile search, and lacks a systematic analysis of mobile search context. Based on analysis of the impact of mobile search motivations and search context on search behaviors, we built a multi-dimensional model of mobile search behaviors.

  19. Collaborative Framework with User Personalization for Efficient web Search : A D3 Mining approach

    Directory of Open Access Journals (Sweden)

    V.Vijayadeepa

    2011-04-01

    Full Text Available User personalization is becoming a more important task for web search engines. We develop a unified model to provide user personalization for efficient web search. We collect implicit feedback from users by tracking their behavior on the web page, based on their actions on the page. We track actions such as save, copy, bookmark, time spent and logging into the database, which are used to build the unified model. Our model serves as a collaborative framework with which related users can mine information collaboratively in a small amount of time. Based on the feedback from users we categorize the users and search queries. We build the unified model from this categorized information, and use it to provide personalized results to the user during web search. Our methodology minimizes search time and provides a greater amount of relevant information.
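
    A minimal sketch of the implicit-feedback idea: score each page view from the tracked actions and dwell time, then reorder results by accumulated score. The action weights, the dwell-time cap, and the URLs are invented for illustration; the abstract does not publish an exact weighting scheme:

    ```python
    # Hypothetical weights for the tracked actions (save, copy, bookmark, login).
    ACTION_WEIGHTS = {"save": 3.0, "copy": 2.0, "bookmark": 3.0, "login": 1.0}

    def page_score(actions, seconds_on_page):
        """Implicit-feedback relevance score for one page view."""
        score = sum(ACTION_WEIGHTS.get(a, 0.0) for a in actions)
        score += min(seconds_on_page, 300) / 60.0   # cap dwell-time credit at 5 min
        return score

    def rerank(results, feedback):
        """Reorder search results by accumulated implicit feedback."""
        return sorted(results, key=lambda url: feedback.get(url, 0.0), reverse=True)

    # Illustrative feedback for two result URLs.
    feedback = {
        "docs.example.org/a": page_score(["save", "bookmark"], 240),
        "docs.example.org/b": page_score(["copy"], 30),
    }
    order = rerank(["docs.example.org/b", "docs.example.org/a"], feedback)
    ```

    In a collaborative setting, feedback from related (similarly categorized) users would be pooled into one shared score table before reranking.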

  20. DBLC_SPAMCLUST: SPAMDEXING DETECTION BY CLUSTERING CLIQUE-ATTACKS IN WEB SEARCH ENGIN

    OpenAIRE

    Dr.S.K.JAYANTHI,; Ms.S.Sasikala

    2011-01-01

    Search engines are playing a more and more important role in discovering information on the web nowadays. Spam web pages, however, employ various tricks to bamboozle search engines, thereby achieving undeserved ranks. In this paper an algorithm, DBLCSPAMCLUST, is proposed for spam detection based on content and link attribute details, which is an extension of DBSpamClust [1]. As shown through experiments, such a method can filter out web spam effectively.

  1. DBLC_SPAMCLUST: SPAMDEXING DETECTION BY CLUSTERING CLIQUE-ATTACKS IN WEB SEARCH ENGIN

    Directory of Open Access Journals (Sweden)

    Dr.S.K.JAYANTHI,

    2011-06-01

    Full Text Available Search engines are playing a more and more important role in discovering information on the web nowadays. Spam web pages, however, employ various tricks to bamboozle search engines, thereby achieving undeserved ranks. In this paper an algorithm, DBLCSPAMCLUST, is proposed for spam detection based on content and link attribute details, which is an extension of DBSpamClust [1]. As shown through experiments, such a method can filter out web spam effectively.

  2. A UDDI Search Engine for SVG Federated Medical Imaging Web Services

    Directory of Open Access Journals (Sweden)

    Sabah Mohammed

    2006-01-01

    Full Text Available With more and more medical web services appearing on the web, a web service discovery mechanism becomes essential. UDDI is an online registry standard to facilitate the discovery of business partners and services. However, most medical imaging applications exist within their own protected domains and were never designed to participate and interoperate with other applications across the web. Private UDDI registries in federated organizations should be able to share service descriptions as well as to access them if they are authorized. The new initiatives on Federated Web Services Identity Management can resolve a range of both technical and political barriers to enable wide-scale participation and interoperation of separate domains in a singular, robust user experience. However, there is no widely accepted standard for federated web services, and most of the available vendor frameworks concentrate only on the security side of federation, leaving the issue of searching and discovering web services largely primitive. Federated web services security and web services searching are uniquely intertwined, mutually reliant on each other, and poised to finally solve a long-running problem in both IT and systems security. Traditional keyword search is insufficient for web services, as the very small text fragments in web services are unsuitable for keyword search and the underlying structure and semantics of the web service are not exploited. Engineering solutions that address the security and accessibility concerns of web services, however, is a challenging task. This article introduces an extension to the traditional UDDI that enables sophisticated types of searching, based on a lightweight federated security infrastructure for web services.

  3. White Hat Search Engine Optimization (SEO: Structured Web Data for Libraries

    Directory of Open Access Journals (Sweden)

    Dan Scott

    2015-06-01

    Full Text Available “White hat” search engine optimization refers to the practice of publishing web pages that are useful to humans, while enabling search engines and web applications to better understand the structure and content of your website. This article teaches you to add structured data to your website so that search engines can more easily connect patrons to your library locations, hours, and contact information. A web page for a branch of the Greater Sudbury Public Library retrieved in January 2015 is used as the basis for examples that progressively enhance the page with structured data. Finally, some of the advantages structured data enables beyond search engine optimization are explored.
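
    A sketch of the kind of structured data the article describes, using schema.org's Library type serialized as JSON-LD. The branch name, address, hours, and phone number below are invented for illustration, not the Greater Sudbury values:

    ```python
    import json

    # A minimal schema.org "Library" description with invented values.
    library = {
        "@context": "https://schema.org",
        "@type": "Library",
        "name": "Example Branch Library",
        "address": {
            "@type": "PostalAddress",
            "streetAddress": "123 Main St",
            "addressLocality": "Exampleville",
        },
        "telephone": "+1-555-0100",
        "openingHours": "Mo-Fr 09:00-17:00",
    }

    # Serialize for embedding in the page.
    json_ld = json.dumps(library, indent=2)
    ```

    The resulting JSON-LD is typically embedded in the page inside a `<script type="application/ld+json">` element, where search engines can read it without affecting what human visitors see.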

  4. Overview of the Web Search Engine%Web搜索引擎综述

    Institute of Scientific and Technical Information of China (English)

    张卫丰; 徐宝文; 周晓宇; 许蕾; 李东

    2001-01-01

    With the explosive increase of network information, people find it more and more difficult to locate information. The emergence of the Web search engine overcomes this problem to some degree. This paper describes the history and current state of the search engine. Some guidelines about search engines are analysed and the related checking methods are also given. On this basis, we introduce trends in the development of search engines.

  5. Search Engines and Resource Discovery on the Web: Is Dublin Core an Impact Factor?

    Directory of Open Access Journals (Sweden)

    Mehdi Safari

    2005-08-01

    Full Text Available This study evaluates the effectiveness of the Dublin Core metadata elements on the retrieval of web pages in a suite of six search engines: AlltheWeb, AltaVista, Google, Excite, Lycos, and WebCrawler. The effectiveness of four elements that concentrate on resource discovery, namely title, creator, subject and contributor, was experimentally evaluated. Searches were made using keywords extracted from web pages of the Iranian International Journal of Science, before and after metadata implementation. In each search, the ranking of the first specific reference to the exact web page was recorded. The comparison of results and statistical analysis did not reveal a significant difference between control and experimental groups in the retrieval ranks of the web pages.

  6. A study of medical and health queries to web search engines.

    Science.gov (United States)

    Spink, Amanda; Yang, Yin; Jansen, Jim; Nykanen, Pirrko; Lorence, Daniel P; Ozmutlu, Seda; Ozmutlu, H Cenk

    2004-03-01

    This paper reports findings from an analysis of medical or health queries to different web search engines. We report results: (i) comparing samples of 10,000 web queries taken randomly from 1.2 million query logs from the AlltheWeb.com and Excite.com commercial web search engines in 2001 for medical or health queries, (ii) comparing the 2001 findings from Excite and AlltheWeb.com users with results from a previous analysis of medical and health related queries from the Excite Web search engine for 1997 and 1999, and (iii) medical or health advice-seeking queries beginning with the word 'should'. Findings suggest: (i) a small percentage of web queries are medical or health related, (ii) the top five categories of medical or health queries were: general health, weight issues, reproductive health and puberty, pregnancy/obstetrics, and human relationships, and (iii) over time, medical and health queries may have declined as a proportion of all web queries, as the use of specialized medical/health websites and e-commerce-related queries has increased. Findings provide insights into medical and health-related web querying and suggest some implications for the use of general web search engines when seeking medical/health information.

  7. Self-Education through Web-Searching - An Exploratory Study

    Directory of Open Access Journals (Sweden)

    Răzvan-Alexandru Călin

    2015-10-01

    Full Text Available The 21st century is marked by extensive and easy access to information through the virtual environment. Do we find in today's Romanian school the presence of a formative space that, on the one hand, facilitates maximal exploitation of these opportunities and, on the other hand, acts as a "sensor" for the new risks characteristic of the information era? Is the "digital generation" (Mark Prensky) of the beginning of the century in Romania ready from these perspectives? The present paper outlines the results of a comparative exploratory study regarding the ordinary methods used by youngsters - from 5th and 6th grades, as well as 11th and 12th grades, from six different schools, high schools and colleges in Dolj county - to find information about different topics and homework. The results offer premises for hypotheses regarding this phenomenon at the national level. The conclusions indicate web searching as the main method of obtaining information. They emphasize the absence of initial specific educational training in this domain and allow the delineation of a suggestive image regarding possible future methods of action.

  8. Ensemble Learned Vaccination Uptake Prediction using Web Search Queries

    OpenAIRE

    Hansen, Niels Dalum; Lioma, Christina; Mølbak, Kåre

    2016-01-01

    We present a method that uses ensemble learning to combine clinical and web-mined time-series data in order to predict future vaccination uptake. The clinical data is official vaccination registries, and the web data is query frequencies collected from Google Trends. Experiments with official vaccine records show that our method predicts vaccination uptake effectively (4.7 Root Mean Squared Error). Whereas performance is best when combining clinical and web data, using solely web data yields...

  9. An analysis of search-based user interaction on the Semantic Web

    NARCIS (Netherlands)

    Hildebrand, M.; Ossenbruggen, J.R. van; Hardman, L.

    2007-01-01

    Many Semantic Web applications provide access to their resources through text-based search queries, using explicit semantics to improve the search results. This paper provides an analysis of the current state of the art in semantic search, based on 35 existing systems. We identify different types of

  10. The Effectiveness of Web Search Engines to Index New Sites from Different Countries

    Science.gov (United States)

    Pirkola, Ari

    2009-01-01

    Introduction: Investigates how effectively Web search engines index new sites from different countries. The primary interest is whether new sites are indexed equally or whether search engines are biased towards certain countries. If major search engines show biased coverage it can be considered a significant economic and political problem because…

  11. Constructing Virtual Documents for Keyword Based Concept Search in Web Ontology

    Directory of Open Access Journals (Sweden)

    Sapna Paliwal

    2013-04-01

    Full Text Available Web ontologies are structural frameworks for organizing information on the Semantic Web and provide shared concepts. An ontology formally represents knowledge about a particular entity as a set of concepts within a particular domain on the Semantic Web. Web ontologies help to describe concepts within a domain and also enable semantic interoperability between two different applications, for example through Falcons concept search, which facilitates concept searching and ontology reuse. Constructing virtual documents supports keyword-based search in ontologies. The proposed method examines how a search engine can help users find ontologies in less time, so that their needs are satisfied. It combines supporting technologies with a new technique for constructing virtual documents of concepts for keyword-based search; concepts and ontologies are ranked based on a population scheme, and structured snippets are generated according to the query. User feedback and usability evaluation are also reported.

  12. A navigation flow map method of representing students' searching behaviors and strategies on the web, with relation to searching outcomes.

    Science.gov (United States)

    Lin, Chia-Ching; Tsai, Chin-Chung

    2007-10-01

    To acquire a better understanding of the online search strategies that students employ to use the Internet, this study investigated six university students' approaches to Web-based information searches. A new method, called navigation flow map (NFM), is presented that graphically displays the fluid and multilayered relationships between Web navigation and information retrieval that students use while navigating the Web. To document the application of NFM, the Web search strategies of six university students were analyzed as they used the Internet to perform two different tasks: scientific-based and social studies-based information searches. Through protocol analyses using the NFM method, the students' searching strategies were categorized into two types: Match or Exploration. The findings revealed that participants with an Exploration approach had more complicated and richer task-specific ways of searching information than those with a Match approach; and further, through between-task comparisons, we found that participants appeared to use different searching strategies to process natural science information compared to social studies information. Finally, the participants in the Exploration group also exhibited better task performance on the criterion measures than those in the Match group.

  13. Utility of Web search query data in testing theoretical assumptions about mephedrone.

    Science.gov (United States)

    Kapitány-Fövény, Máté; Demetrovics, Zsolt

    2017-05-01

    With growing access to the Internet, people who use drugs and traffickers started to obtain information about novel psychoactive substances (NPS) via online platforms. This paper aims to analyze whether decreasing Web interest in formerly banned substances (cocaine, heroin, and MDMA) and the legislative status of mephedrone predict Web interest in this NPS. Google Trends was used to measure changes of Web interest in cocaine, heroin, MDMA, and mephedrone. Google search results for mephedrone within the same time frame were analyzed and categorized. Web interest in the classic drugs was found to be more persistent. Regarding geographical distribution, the location of Web searches for heroin and cocaine was less centralized. The illicit status of mephedrone was a negative predictor of its Web search query rates. The connection between mephedrone-related Web search rates and the legislative status of this substance was significantly mediated by ecstasy-related Web search queries, the number of documentaries, and forum/blog entries about mephedrone. The results might support the hypothesis that mephedrone's popularity was highly correlated with its legal status and that it functioned as a potential substitute for MDMA. Google Trends was found to be a useful tool for testing theoretical assumptions about NPS. Copyright © 2017 John Wiley & Sons, Ltd.
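
    The kind of trend analysis this record describes, correlating the search-interest series of two substances, can be sketched with a plain Pearson correlation. This is an illustration only, not the authors' code: the monthly values below are hypothetical, not actual Google Trends exports.

```python
# Sketch (not the paper's code): correlating two search-interest series of
# the kind exported from Google Trends. All values below are hypothetical.

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical monthly relative search volumes (0-100 scale, as in Trends).
mephedrone = [5, 12, 30, 55, 80, 60, 40, 20, 10, 8]
mdma       = [10, 15, 28, 50, 70, 55, 38, 25, 14, 12]

r = pearson(mephedrone, mdma)
```

    A high coefficient on real data would be consistent with the paper's claim that mephedrone interest tracked MDMA interest.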

  14. Architecture of A Scalable Dynamic Parallel WebCrawler with High Speed Downloadable Capability for a Web Search Engine

    CERN Document Server

    Mukhopadhyay, Debajyoti; Ghosh, Soumya; Kar, Saheli; Kim, Young-Chon

    2011-01-01

    Today the World Wide Web (WWW) has become a huge ocean of information, and it is growing in size every day. Downloading even a fraction of this mammoth data is like sailing through a huge ocean, and it is a challenging task indeed. In order to download a large portion of data from the WWW, it has become absolutely essential to make the crawling process parallel. In this paper we offer the architecture of a dynamic parallel Web crawler, christened "WEB-SAILOR," which presents a scalable approach based on the Client-Server model to speed up the download process on behalf of a Web Search Engine in a distributed Domain-set specific environment. WEB-SAILOR removes the possibility of overlap of downloaded documents by multiple crawlers without even incurring the cost of communication overhead among several parallel "client" crawling processes.
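
    One common way a coordinating server can prevent the overlap the abstract mentions is to partition the URL space deterministically among crawler clients, for example by hashing the host. The sketch below illustrates that general idea only; it is not WEB-SAILOR's actual assignment protocol.

```python
# Sketch of hash-based URL partitioning among parallel crawler clients.
# Same host always maps to the same client, so no two clients can
# download the same document. Illustrative only (not WEB-SAILOR's scheme).

import hashlib
from urllib.parse import urlparse

def assign_crawler(url: str, n_crawlers: int) -> int:
    """Deterministically map a URL's host to one of n crawler clients."""
    host = urlparse(url).netloc
    digest = hashlib.md5(host.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_crawlers

urls = [
    "http://example.org/a", "http://example.org/b",
    "http://example.com/x", "http://example.net/y",
]
assignments = {u: assign_crawler(u, 4) for u in urls}
```

    Hashing by host (rather than by full URL) has the side benefit that per-host politeness limits stay local to one client.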

  15. The “I’m Feeling Lucky Syndrome”: Teacher-Candidates’ Knowledge of Web Searching Strategies

    Directory of Open Access Journals (Sweden)

    Corinne Laverty

    2008-06-01

    Full Text Available The need for web literacy has become increasingly important with the exponential growth of learning materials on the web that are freely accessible to educators. Teachers need the skills to locate these tools and also the ability to teach their students web search strategies and evaluation of websites so they can effectively explore the web by themselves. This study examined the web searching strategies of 253 teachers-in-training using both a survey (247 participants) and live screen capture with think-aloud audio recording (6 participants). The results present a picture of the strategic, syntactic, and evaluative search abilities of these students that librarians and faculty can use to plan how instruction can target information skill deficits in university student populations.

  16. Ensemble learned vaccination uptake prediction using web search queries

    DEFF Research Database (Denmark)

    Hansen, Niels Dalum; Lioma, Christina; Mølbak, Kåre

    2016-01-01

    We present a method that uses ensemble learning to combine clinical and web-mined time-series data in order to predict future vaccination uptake. The clinical data is official vaccination registries, and the web data is query frequencies collected from Google Trends. Experiments with official vaccine records show that our method predicts vaccination uptake effectively (4.7 Root Mean Squared Error). Whereas performance is best when combining clinical and web data, using solely web data yields comparable performance. To our knowledge, this is the first study to predict vaccination uptake...
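
    The general idea behind the record above, combining a clinical-data predictor with a web-query predictor and scoring with RMSE, can be sketched with the simplest possible ensemble: averaging. The paper's actual ensemble learner is more elaborate, and the forecast values below are invented for illustration.

```python
# Sketch of ensembling by averaging two forecasts and scoring with RMSE.
# All numbers are hypothetical; this is not the paper's model or data.

def rmse(pred, actual):
    return (sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(pred)) ** 0.5

# Hypothetical vaccination-uptake forecasts (percent) from two sources.
clinical_pred = [91.0, 92.5, 90.0, 93.0]
web_pred      = [89.0, 94.5, 92.0, 91.0]
actual        = [90.0, 93.0, 91.0, 92.0]

ensemble = [(c + w) / 2 for c, w in zip(clinical_pred, web_pred)]
score = rmse(ensemble, actual)
```

    On these toy numbers the averaged forecast scores better than either source alone, mirroring the paper's finding that the combination performs best.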

  17. ONTOLOGY BASED MEANINGFUL SEARCH USING SEMANTIC WEB AND NATURAL LANGUAGE PROCESSING TECHNIQUES

    Directory of Open Access Journals (Sweden)

    K. Palaniammal

    2013-10-01

    Full Text Available The Semantic Web extends the current World Wide Web by adding facilities for the machine-understood description of meaning. The ontology-based search model is used to enhance the efficiency and accuracy of information retrieval. Ontology is the core technology for the Semantic Web and the mechanism for representing formal and shared domain descriptions. In this paper, we propose ontology-based meaningful search using Semantic Web and Natural Language Processing (NLP) techniques in the educational domain. First we build the educational ontology, then we present the semantic search system. The search model consists of three parts: embedded spell checking, finding synonyms using the WordNet API, and querying the ontology using the SPARQL language. The results are sensitive both to spelling correction and to synonymous context. This approach provides more accurate results and complete details for the selected field in a single page.
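
    The first two stages of the pipeline described (spell correction, then synonym expansion) can be sketched with the standard library alone. The vocabulary and synonym table below are toy stand-ins for the paper's educational ontology and the WordNet API, and the SPARQL stage is omitted; this is purely illustrative.

```python
# Sketch of spell correction + synonym expansion for a query.
# VOCABULARY and SYNONYMS are toy assumptions standing in for the
# educational ontology and WordNet used in the paper.

import difflib

VOCABULARY = ["semantic", "search", "ontology", "education", "retrieval"]
SYNONYMS = {"search": ["query", "lookup"], "education": ["learning"]}

def expand_query(raw_terms):
    expanded = []
    for term in raw_terms:
        # Spell check: snap each term to the closest known vocabulary word.
        match = difflib.get_close_matches(term, VOCABULARY, n=1, cutoff=0.6)
        corrected = match[0] if match else term
        expanded.append(corrected)
        # Synonym expansion from the toy table.
        expanded.extend(SYNONYMS.get(corrected, []))
    return expanded

terms = expand_query(["semantik", "serch"])
```

    The expanded term list would then be turned into a SPARQL query against the ontology in the full system.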

  18. Uncovering the Hidden Web, Part I: Finding What the Search Engines Don't. ERIC Digest.

    Science.gov (United States)

    Mardis, Marcia

    Currently, the World Wide Web contains an estimated 7.4 million sites (OCLC, 2001). Yet even the most experienced searcher, using the most robust search engines, can access only about 16% of these pages (Dahn, 2001). The other 84% of the publicly available information on the Web is referred to as the "hidden,""invisible," or…

  19. Study on Deep Web Search Interface

    Institute of Scientific and Technical Information of China (English)

    钱程; 阳小兰

    2012-01-01

    Deep Web search interfaces are the interfaces to Web databases and are essential for the integration of Deep Web databases. In this paper, search interfaces are defined according to the structural characteristics of Web forms. For the non-submission query method, some rules for identifying Deep Web search interfaces are given. A submission query method is proposed that finds pages containing a search interface based on the features of link properties. Pages are classified with the C4.5 decision tree algorithm, and the Deep Web search interface system is implemented in Java.

  20. State-of-the-Art Review on Relevance of Genetic Algorithm to Internet Web Search

    Directory of Open Access Journals (Sweden)

    Kehinde Agbele

    2012-01-01

    Full Text Available People use search engines to find the information they desire, with the aim that their information needs will be met. Information retrieval (IR) is a field concerned primarily with searching and retrieving information in documents, and also with searching the search engine, online databases, and the Internet. Genetic algorithms (GAs) are robust, efficient, optimized methods for a wide area of search problems, motivated by Darwin's principles of natural selection and survival of the fittest. This paper describes the components of information retrieval systems (IRS). It looks at how GAs can be applied in the field of IR, and specifically the relevance of genetic algorithms to Internet web search. Finally, the proposals surveyed show that GAs are applied to diverse problem fields of Internet web search.
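
    The survey's subject, applying a GA to a search problem, can be illustrated with a toy example: individuals are bit masks selecting candidate query-expansion terms, and fitness rewards selecting relevant ones. The term lists and fitness function are invented for illustration and are not from the survey.

```python
# Toy GA for query-term selection: selection, one-point crossover,
# point mutation, and elitist survival. Illustrative only.

import random

random.seed(42)

TERMS    = ["web", "search", "index", "cat", "pizza", "rank", "crawl", "shoe"]
RELEVANT = {"web", "search", "index", "rank", "crawl"}  # assumed ground truth

def fitness(mask):
    # Reward selecting relevant terms, penalize irrelevant ones.
    return sum(1 if t in RELEVANT else -1 for t, bit in zip(TERMS, mask) if bit)

def evolve(pop_size=20, generations=30):
    pop = [[random.randint(0, 1) for _ in TERMS] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))          # best-so-far this generation
        parents = pop[: pop_size // 2]           # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(TERMS))  # one-point crossover
            child = a[:cut] + b[cut:]
            i = random.randrange(len(TERMS))       # point mutation
            child[i] ^= 1
            children.append(child)
        pop = parents + children                   # elitism: parents survive
    return pop[0], history

best, history = evolve()
```

    Because the best individual always survives, the best fitness per generation never decreases, which is the usual argument for elitist GAs.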

  1. Search Engines and Search Technologies for Web-based Text Data

    Institute of Scientific and Technical Information of China (English)

    李勇

    2001-01-01

    This paper describes the functions, characteristics and operating principles of search engines based on Web text, and the searching and data mining technologies for Web-based text information. Methods of computer-aided text clustering and abstracting are also given. Finally, it gives some guidelines for the assessment of searching quality.

  2. Bringing The Web Down to Size: Advanced Search Techniques.

    Science.gov (United States)

    Huber, Joe; Miley, Donna

    1997-01-01

    Examines advanced Internet search techniques, focusing on six search engines. Includes a chart comparison of nine search features: "include two words,""exclude one of two words,""exclude mature audience content,""two adjacent words,""exact match,""contains first and neither of two following…

  3. Weighted Page Content Rank for Ordering Web Search Result

    Directory of Open Access Journals (Sweden)

    POOJA SHARMA

    2010-12-01

    Full Text Available With the explosive growth of information sources available on the World Wide Web, it has become increasingly necessary for users to utilize automated tools in order to find, extract, filter and evaluate the desired information and resources. Web structure mining and content mining play an effective role in this approach. There are two ranking algorithms, PageRank and Weighted PageRank. PageRank is a commonly used algorithm in Web structure mining. Weighted PageRank also takes into account the importance of the inlinks and outlinks of the pages, but the rank score is not distributed equally among all links. In this paper we propose a new algorithm, Weighted Page Content Rank (WPCR), based on web content mining and structure mining, which determines the relevancy of pages to a given query better than the existing PageRank and Weighted PageRank algorithms.
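
    For reference, the standard PageRank that WPCR and Weighted PageRank build on can be sketched as a short power iteration. The paper's WPCR additionally weights links by page content, which is omitted here; this is the textbook algorithm, not the proposed one.

```python
# Minimal power-iteration PageRank over an adjacency dict.
# Standard algorithm for reference; not the paper's WPCR.

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping page -> list of outgoing link targets."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for p, outs in links.items():
            if not outs:
                continue
            share = damping * rank[p] / len(outs)  # rank split over outlinks
            for q in outs:
                new[q] += share
        rank = new
    return rank

graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
```

    In this three-page graph, C receives links from both A and B and so ends up with the highest score.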

  4. Index Compression and Efficient Query Processing in Large Web Search Engines

    Science.gov (United States)

    Ding, Shuai

    2013-01-01

    The inverted index is the main data structure used by all the major search engines. Search engines build an inverted index on their collection to speed up query processing. As the size of the web grows, the length of the inverted list structures, which can easily grow to hundreds of MBs or even GBs for common terms (roughly linear in the size of…
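
    The kind of posting-list compression such engines rely on can be illustrated with gap encoding plus variable-byte (VByte) coding: sorted docIDs are stored as differences, and each gap is packed into 7-bit chunks with the high bit marking the final byte. This is a generic textbook scheme, not any particular engine's on-disk format.

```python
# Sketch of gap + variable-byte compression of a posting list.
# High bit set marks the last byte of each encoded gap.

def vbyte_encode(doc_ids):
    """Encode a sorted list of docIDs as variable-byte coded gaps."""
    out = bytearray()
    prev = 0
    for n in doc_ids:
        gap = n - prev
        prev = n
        chunks = []                  # 7-bit chunks, least significant first
        while True:
            chunks.append(gap & 0x7F)
            gap >>= 7
            if not gap:
                break
        for c in reversed(chunks[1:]):
            out.append(c)            # continuation bytes, high bit clear
        out.append(chunks[0] | 0x80)  # final byte, high bit set
    return bytes(out)

def vbyte_decode(data):
    doc_ids, cur, prev = [], 0, 0
    for byte in data:
        if byte & 0x80:              # final byte of a gap
            prev += (cur << 7) | (byte & 0x7F)
            doc_ids.append(prev)
            cur = 0
        else:
            cur = (cur << 7) | byte
    return doc_ids

ids = [3, 7, 300, 100000]
encoded = vbyte_encode(ids)
```

    Small gaps cost one byte each, which is why keeping lists sorted and gap-encoded shrinks the index so much for common terms.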

  5. Case and Relation (CARE) Based Page Rank Algorithm for Semantic Web Search Engines

    Directory of Open Access Journals (Sweden)

    N. Preethi

    2012-05-01

    Full Text Available Web information retrieval deals with techniques for finding relevant web pages for any given query from a collection of documents. Search engines have become the most helpful tool for obtaining useful information from the Internet. The next-generation Web architecture, represented by the Semantic Web, provides a layered architecture that allows data to be reused across applications. The proposed architecture uses a hybrid methodology named the Case and Relation (CARE) based Page Rank algorithm, which uses past problem-solving experience maintained in the case base to form best-matching relations and then uses them to generate graphs and spanning forests to assign a relevance score to the pages.

  6. From people to entities new semantic search paradigms for the web

    CERN Document Server

    Demartini, G

    2014-01-01

    The exponential growth of digital information available in companies and on the Web creates the need for search tools that can respond to the most sophisticated information needs. Many user tasks would be simplified if search engines supported typed search and returned entities instead of just Web documents. For example, an executive who tries to solve a problem needs to find people in the company who are knowledgeable about a certain topic. In the first part of the book, we propose a model for expert finding based on the well-consolidated vector space model for Information Retrieval and inv

  7. Search Engine Optimization for Flash Best Practices for Using Flash on the Web

    CERN Document Server

    Perkins, Todd

    2009-01-01

    Search Engine Optimization for Flash dispels the myth that Flash-based websites won't show up in a web search by demonstrating exactly what you can do to make your site fully searchable -- no matter how much Flash it contains. You'll learn best practices for using HTML, CSS and JavaScript, as well as SWFObject, for building sites with Flash that will stand tall in search rankings.

  8. A Picture is Worth a Thousand Keywords: Exploring Mobile Image-Based Web Searching

    Directory of Open Access Journals (Sweden)

    Konrad Tollmar

    2008-01-01

    Full Text Available Using images of objects as queries is a new approach to search for information on the Web. Image-based information retrieval goes beyond only matching images, as information in other modalities also can be extracted from data collections using an image search. We have developed a new system that uses images to search for web-based information. This paper has a particular focus on exploring users' experience of general mobile image-based web searches to find what issues and phenomena it contains. This was achieved in a multipart study by creating and letting respondents test prototypes of mobile image-based search systems and collect data using interviews, observations, video observations, and questionnaires. We observed that searching for information based only on visual similarity and without any assistance is sometimes difficult, especially on mobile devices with limited interaction bandwidth. Most of our subjects preferred a search tool that guides the users through the search result based on contextual information, compared to presenting the search result as a plain ranked list.

  9. A Portrait of the Audience for Instruction in Web Searching: Results of a Survey Conducted at Two Canadian Universities.

    Science.gov (United States)

    Tillotson, Joy

    2003-01-01

    Describes a survey that was conducted involving participants in the library instruction program at two Canadian universities in order to describe the characteristics of students receiving instruction in Web searching. Examines criteria for evaluating Web sites, search strategies, use of search engines, and frequency of use. Questionnaire is…

  10. TrackMeNot: Enhancing the privacy of Web Search

    CERN Document Server

    Toubiana, Vincent; Nissenbaum, Helen

    2011-01-01

    Most search engines can potentially infer the preferences and interests of a user based on her history of search queries. While search engines can use these inferences for a variety of tasks, including targeted advertisements, such tasks pose a serious threat to user privacy. In 2006, after AOL disclosed the search queries of 650,000 users, TrackMeNot was released as a simple browser extension that sought to hide user search preferences in a cloud of queries. The first versions of TrackMeNot, though used extensively in the past three years, were fairly simplistic in design and did not provide any strong privacy guarantees. In this paper, we present the new design and implementation of TrackMeNot, which addresses many of the limitations of the first release. TrackMeNot addresses two basic problems. First, using a model for characterizing search queries, TrackMeNot provides a mechanism for obfuscating the search preferences of a user from a search engine. Second, TrackMeNot prevents the leakage of informatio...
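
    The core obfuscation idea, hiding each real query in a cloud of plausible decoys, can be sketched in a few lines. This is a drastic simplification: TrackMeNot's actual query model is far richer, and the seed terms below are invented.

```python
# Simplified sketch of query obfuscation: mix the real query at a random
# position among decoys drawn from a seed list. Not TrackMeNot's model.

import random

random.seed(7)

SEED_TERMS = ["weather radar", "stock prices", "movie times", "recipe pasta",
              "football scores", "flight status", "news headlines"]

def obfuscated_stream(real_query, decoys_per_query=4):
    """Return the real query mixed at a random position among decoys."""
    queries = random.sample(SEED_TERMS, decoys_per_query)
    queries.insert(random.randrange(decoys_per_query + 1), real_query)
    return queries

stream = obfuscated_stream("rare disease symptoms")
```

    An observer of the stream sees five queries and cannot tell from position alone which one reflects the user's real interest; the hard part, which the paper addresses, is making decoys statistically indistinguishable from real queries.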

  11. Building maps to search the web: the method Sewcom

    Directory of Open Access Journals (Sweden)

    Corrado Petrucco

    2002-01-01

    Full Text Available Seeking information on the Internet is becoming a necessity at school, at work and in every social sphere. Unfortunately, the difficulties inherent in the use of search engines, and the unconscious use of inefficient cognitive approaches, limit their effectiveness. In this respect, a method called SEWCOM is presented that lets users create conceptual maps through interaction with search engines.

  12. Key word placing in Web page body text to increase visibility to search engines

    Directory of Open Access Journals (Sweden)

    W. T. Kritzinger

    2007-11-01

    Full Text Available The growth of the World Wide Web has spawned a wide variety of new information sources, which has also left users with the daunting task of determining which sources are valid. Many users rely on the Web as an information source because of the low cost of information retrieval. It is also claimed that the Web has evolved into a powerful business tool. Examples include highly popular business services such as Amazon.com and Kalahari.net. It is estimated that around 80% of users utilize search engines to locate information on the Internet. This, by implication, places emphasis on the underlying importance of Web pages being listed in search engine indices. Empirical evidence that the placement of key words in certain areas of the body text influences a Web site's visibility to search engines could not be found in the literature. The results of two experiments indicated that key words should be concentrated towards the top, and diluted towards the bottom, of a Web page to increase visibility. However, care should be taken in terms of key word density, to prevent search engine algorithms from raising the spam alarm.
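
    The measurement implied by this finding can be sketched as a position-weighted keyword score in which occurrences near the top of the body text count more. The linear decay used here is an assumption for illustration, not a documented ranking formula.

```python
# Sketch: score keyword placement so early occurrences weigh more.
# The linear weighting is an illustrative assumption only.

def placement_score(body_text, keyword):
    words = body_text.lower().split()
    n = len(words)
    # Weight decays linearly from 1.0 (first word) towards 0 (last word).
    return sum((n - i) / n for i, w in enumerate(words) if w == keyword.lower())

top_heavy    = "python guide python tips introduction to writing clean code today"
bottom_heavy = "introduction to writing clean code today python guide python tips"

a = placement_score(top_heavy, "python")
b = placement_score(bottom_heavy, "python")
```

    Both sample texts contain the keyword twice; only its position differs, so the score isolates the placement effect the experiments measured.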

  13. Construction of Powerful Online Search Expert System Based on Semantic Web

    Directory of Open Access Journals (Sweden)

    Yasser A. Nada

    2013-01-01

    Full Text Available In this paper we intend to build an expert system based on the Semantic Web for online search using XML, to help users find the desired software and read about its features and specifications. The expert system saves users the time and effort of web searching or buying software from available libraries. Building an online search expert system is ideal for capturing support knowledge to produce interactive online systems that provide searching details and situation-specific advice, exactly like a session with an expert. Any person can access this interactive system from a web browser and get answers to questions, in addition to precise advice provided by an expert. The system can provide some troubleshooting diagnoses, find the right products, … etc. The proposed system combines aspects of three research topics: Semantic Web, Expert Systems and XML. The Semantic Web ontology is considered as a set of directed graphs where each node represents an item and the edges denote a term which is related to another term. Organizations can now make their most valuable expert knowledge available through powerful interactive Web-enabled knowledge automation expert systems. Online sessions emulate a conversation with a human expert, asking focused questions and producing customized recommendations and advice. Hence, the main strength of the proposed expert system is that the skills of any domain expert are made available to everyone.

  14. Searching the World Wide Web: How To Find the Material You Want on the Multimedia Pages of the Internet.

    Science.gov (United States)

    Turner, Mark

    1997-01-01

    Highlights some popular search engines and presents guidelines on making queries, narrowing a search, using quotation marks, and how and when to use advanced searches. Discusses special search tools for World Wide Web and CD-ROM products and homework assistance software. Lists the network locations of five popular search engines. (AEF)

  15. Issues and Challenges of User Intent Discovery (UID during Web Search

    Directory of Open Access Journals (Sweden)

    Wael K. Hanna

    2015-06-01

    Full Text Available Searching for information requires a small set of words, known as a query, despite the gap that exists between a user's information need and the way in which that need is represented. An information retrieval system should be able to analyze a given query and present the web resources that best meet the user's needs. In order to improve the quality of web search results while increasing user satisfaction, this paper presents current work on identifying sources of user intent, understanding user behavior, and discovering users' intentions during web search. This paper also discusses social network analysis and web query analysis. The objective of this paper is to present the challenges and new research trends in understanding user behavior and discovering user intent, in order to improve the quality of search engine results and to search the web quickly and thoroughly.

  16. Adding a Visualization Feature to Web Search Engines: It’s Time

    Energy Technology Data Exchange (ETDEWEB)

    Wong, Pak C.

    2008-11-11

    Since the first world wide web (WWW) search engine quietly entered our lives in 1994, the “information need” behind web searching has rapidly grown into a multi-billion dollar business that dominates the internet landscape, drives e-commerce traffic, propels global economy, and affects the lives of the whole human race. Today’s search engines are faster, smarter, and more powerful than those released just a few years ago. With the vast investment pouring into research and development by leading web technology providers and the intense emotion behind corporate slogans such as “win the web” or “take back the web,” I can’t help but ask why are we still using the very same “text-only” interface that was used 13 years ago to browse our search engine results pages (SERPs)? Why has the SERP interface technology lagged so far behind in the web evolution when the corresponding search technology has advanced so rapidly? In this article I explore some current SERP interface issues, suggest a simple but practical visual-based interface design approach, and argue why a visual approach can be a strong candidate for tomorrow’s SERP interface.

  17. Search Interface Design Using Faceted Indexing for Web Resources.

    Science.gov (United States)

    Devadason, Francis; Intaraksa, Neelawat; Patamawongjariya, Pornprapa; Desai, Kavita

    2001-01-01

    Describes an experimental system designed to organize and provide access to Web documents using a faceted pre-coordinate indexing system based on the Deep Structure Indexing System (DSIS) derived from POPSI (Postulate based Permuted Subject Indexing) of Bhattacharyya, and the facet analysis and chain indexing system of Ranganathan. (AEF)

  18. An assessment of the visibility of MeSH-indexed medical web catalogs through search engines.

    Science.gov (United States)

    Zweigenbaum, P; Darmoni, S J; Grabar, N; Douyère, M; Benichou, J

    2002-01-01

    Manually indexed Internet health catalogs such as CliniWeb or CISMeF provide resources for retrieving high-quality health information. Users of these quality-controlled subject gateways are most often referred to them by general search engines such as Google, AltaVista, etc. This raises several questions, among which the following: what is the relative visibility of medical Internet catalogs through search engines? This study addresses this issue by measuring and comparing the visibility of six major, MeSH-indexed health catalogs through four different search engines (AltaVista, Google, Lycos, Northern Light) in two languages (English and French). Over half a million queries were sent to the search engines; for most of these search engines, according to our measures at the time the queries were sent, the most visible catalog for English MeSH terms was CliniWeb and the most visible one for French MeSH terms was CISMeF.
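
    The visibility measure this study describes can be sketched as the fraction of queries for which a catalog's domain appears in an engine's result list. The result lists below are mocked stand-ins for real engine responses, used only to show the computation.

```python
# Sketch of the visibility measure: fraction of queries whose result list
# contains a given catalog's domain. Result lists here are mocked.

def visibility(catalog_domain, results_per_query):
    """results_per_query: list of result-URL lists, one per query sent."""
    hits = sum(
        any(catalog_domain in url for url in results)
        for results in results_per_query
    )
    return hits / len(results_per_query)

# Mocked top results for three hypothetical MeSH-term queries.
mock_results = [
    ["http://catalog.example/cismef/a", "http://example.com/x"],
    ["http://example.org/y"],
    ["http://catalog.example/cismef/b"],
]
vis = visibility("catalog.example/cismef", mock_results)
```

    Scaled up to the half-million queries the study sent, this single number per catalog/engine/language triple is what allows the catalogs to be compared.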

  19. Context Disambiguation Based Semantic Web Search for Effective Information Retrieval

    National Research Council Canada - National Science Library

    M. Barathi; S. Valli

    2011-01-01

    .... To overcome this problem, some search engines suggest terms that are semantically related to the submitted queries, so that users can choose from the suggestions based on their information needs. Approach...

  20. Keynote Talk: Mining the Web 2.0 for Improved Image Search

    Science.gov (United States)

    Baeza-Yates, Ricardo

    There are several semantic sources that can be found in the Web that are either explicit, e.g. Wikipedia, or implicit, e.g. derived from Web usage data. Most of them are related to user generated content (UGC) or what is called today the Web 2.0. In this talk we show how to use these sources of evidence in Flickr, such as tags, visual annotations or clicks, which represent the wisdom of crowds behind UGC, to improve image search. These results are the work of the multimedia retrieval team at Yahoo! Research Barcelona and they are already being used in Yahoo! image search. This work is part of a larger effort to produce a virtuous data feedback circuit based on the right combination of many different technologies to leverage the Web itself.

  1. Crawling PubMed with web agents for literature search and alerting services

    Directory of Open Access Journals (Sweden)

    Carlos CARVALHAL

    2013-05-01

    Full Text Available In this paper we present ASAP (Automated Search with Agents in PubMed), a web-based service aiming to manage and automate scientific literature search in the PubMed database. The system allows the creation and management of web agents, parameterized thematically and functionally, that crawl the PubMed database autonomously and periodically, aiming to search and retrieve relevant results according to the requirements provided by the user. The results, containing the list of publications retrieved, are emailed to the agent owner on a weekly basis during the activity period defined for the web agent. The ASAP service is devoted to helping researchers, especially in the fields of biomedicine and bioinformatics, to increase their productivity, and can be accessed at: http://esa.ipb.pt/~agentes.

  2. Search, Read and Write: An Inquiry into Web Accessibility for People with Dyslexia.

    Science.gov (United States)

    Berget, Gerd; Herstad, Jo; Sandnes, Frode Eika

    2016-01-01

    Universal design in the context of digitalisation has become an integrated part of international conventions and national legislation. A goal is to make the Web accessible for people of different genders, ages, backgrounds, cultures and physical, sensory and cognitive abilities. Political demands for universally designed solutions have raised questions about how this is achieved in practice. Developers, designers and legislators have looked towards the Web Content Accessibility Guidelines (WCAG) for answers. WCAG 2.0 has become the de facto standard for universal design on the Web. Some of the guidelines are directed at the general population, while others are targeted at more specific user groups, such as the visually impaired or hearing impaired. Issues related to cognitive impairments such as dyslexia receive less attention, although dyslexia is prevalent in at least 5-10% of the population. Navigation and search are two common ways of using the Web. However, while navigation has received a fair amount of attention, search systems are not explicitly included, although search has become an important part of people's daily routines. This paper discusses WCAG in the context of dyslexia for the Web in general and search user interfaces specifically. Although certain guidelines address topics that affect dyslexia, WCAG does not seem to fully accommodate users with dyslexia.

  3. Search Techniques for the Web of Things: A Taxonomy and Survey

    Directory of Open Access Journals (Sweden)

    Yuchao Zhou

    2016-04-01

    Full Text Available The Web of Things aims to make physical world objects and their data accessible through standard Web technologies to enable intelligent applications and sophisticated data analytics. Due to the amount and heterogeneity of the data, it is challenging to perform data analysis directly, especially when the data is captured from a large number of distributed sources. However, the size and scope of the data can be reduced and narrowed down with search techniques, so that only the most relevant and useful data items are selected according to the application requirements. Search is fundamental to the Web of Things, yet challenging by nature in this context, e.g., mobility of the objects, opportunistic presence and sensing, continuous data streams with changing spatial and temporal properties, and efficient indexing of historical and real-time data. The research community has developed numerous techniques and methods to tackle these problems, as reported by a large body of literature in the last few years. A comprehensive investigation of the current and past studies is necessary to gain a clear view of the research landscape and to identify promising future directions. This survey reviews the state-of-the-art search methods for the Web of Things, which are classified according to three different viewpoints: basic principles, data/knowledge representation, and contents being searched. Experiences and lessons learned from the existing work and some EU research projects related to the Web of Things are discussed, and an outlook on future research is presented.

  4. A Feature-Weighted Instance-Based Learner for Deep Web Search Interface Identification

    Directory of Open Access Journals (Sweden)

    Hong Wang

    2013-02-01

    Full Text Available Determining whether a site has a search interface is a crucial priority for further research of deep web databases. This study first reviews the current approaches employed in search interface identification for deep web databases. Then, a novel identification scheme using hybrid features and a feature-weighted instance-based learner is put forward. Experimental results show that the proposed scheme is satisfactory in terms of classification accuracy and that our feature-weighted instance-based learner gives better results than classical algorithms such as C4.5, random forest and KNN.
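
The paper does not spell out its learner here, but the underlying idea (a nearest-neighbour classifier whose distance metric weights each feature) can be sketched minimally as follows. The feature vectors, labels, weights and the value of k below are invented for illustration and are not taken from the study.

```python
import math

def weighted_distance(a, b, weights):
    """Euclidean distance with a per-feature weight (assumed weighting scheme)."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def knn_classify(query, examples, weights, k=3):
    """Classify `query` by majority vote among its k nearest labeled examples."""
    neighbours = sorted(examples,
                        key=lambda ex: weighted_distance(query, ex[0], weights))[:k]
    labels = [label for _, label in neighbours]
    return max(set(labels), key=labels.count)

# Toy training set: (feature vector of a page, search-interface label)
examples = [
    ((1.0, 0.9), "search"), ((0.9, 1.0), "search"),
    ((0.1, 0.2), "no-search"), ((0.0, 0.1), "no-search"),
]
weights = (2.0, 1.0)  # hypothetical learned feature weights
print(knn_classify((0.8, 0.8), examples, weights))  # → search
```

In the paper's setting the weights would be learned from the hybrid features of candidate pages; here they are fixed by hand.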

  5. Using Exclusive Web Crawlers to Store Better Results in Search Engines' Database

    Directory of Open Access Journals (Sweden)

    Ali Tourani

    2013-05-01

    Full Text Available Crawler-based search engines are the most widely used search engines among web and Internet users, and involve web crawling, storing in a database, ranking, indexing and displaying to the user. It is noteworthy that, because of increasing changes in web sites, search engines suffer high time and transfer costs, which are consumed to investigate the existence of each page in the database while crawling, updating the database and even investigating its existence in any crawling operations. "Exclusive Web Crawler" proposes guidelines for crawling features, links, media and other elements and for storing crawling results in a certain table in its database on the web. By doing this, search engines store each site's table in their databases and implement their ranking results on them. Thus, the accuracy of the data in every table (and its being up to date) is ensured and no 404 result is shown in search results since, in fact, this crawler crawls data entered by the webmaster and the database stores whatever he wants to display.

  6. Using the open Web as an information resource and scholarly Web search engines as retrieval tools for academic and research purposes

    Directory of Open Access Journals (Sweden)

    Filistea Naude

    2010-08-01

    Full Text Available This study provided insight into the significance of the open Web as an information resource and Web search engines as research tools amongst academics. The academic staff establishment of the University of South Africa (Unisa was invited to participate in a questionnaire survey and included 1188 staff members from five colleges. This study culminated in a PhD dissertation in 2008. One hundred and eighty-seven respondents participated in the survey, which gave a response rate of 15.7%. The results of this study show that academics have indeed accepted the open Web as a useful information resource and Web search engines as retrieval tools when seeking information for academic and research work. The majority of respondents used the open Web and Web search engines on a daily or weekly basis to source academic and research information. The main obstacles presented by using the open Web and Web search engines included lack of time to search and browse the Web, information overload, poor network speed and the slow downloading speed of webpages.

  7. Categorization of web pages - Performance enhancement to search engine

    Digital Repository Service at National Institute of Oceanography (India)

    Lakshminarayana, S.

    are the major areas of research in IR and strive to improve the effectiveness of interactive IR, and can be used as a performance evaluation tool. The classification studies at early stages relied more on strong human interaction than machine learning. The term... and the location of the link. In the absence of such works, the spider/worm either moves to the next page available at the least time or by network selection. This classification serves in judging the traversal of the web spider/worm and in its minimization. Such processes...

  8. Web-Scale Search-Based Data Extraction and Integration

    Science.gov (United States)

    2011-10-17

    Figure 107: An apartment feature from OLX. The expected neighborhood is...

  9. Towards a Simple and Efficient Web Search Framework

    Science.gov (United States)

    2014-11-01

    any useful information about the various aspects of a topic. For example, for the query "raspberry pi", it covers topics such as "what is raspberry pi"... topics generated by the LDA topic model for the query "raspberry pi". One simple explanation is that web texts are too noisy and unfocused for the LDA process... "making a raspberry pi". However, the topics generated based on the 10 top-ranked documents do not make much sense to us in terms of their keywords

  10. Query transformations and their role in Web searching by the members of the general public

    Directory of Open Access Journals (Sweden)

    Martin Whittle

    2006-01-01

    Full Text Available Introduction. This paper reports preliminary research in a primarily experimental study of how the general public search for information on the Web. The focus is on the query transformation patterns that characterise searching. Method. In this work, we have used transaction logs from the Excite search engine to develop methods for analysing query transformations that should aid the analysis of our ongoing experimental work. Our methods involve the use of similarity techniques to link queries with the most similar previous query in a train. The resulting query transformations are represented as a list of codes representing a whole search. Analysis. It is shown how query transformation sequences can be represented as graphical networks and some basic statistical results are shown. A correlation analysis is performed to examine the co-occurrence of Boolean and quotation mark changes with the syntactic changes. Results. A frequency analysis of the occurrence of query transformation codes is presented. The connectivity of graphs obtained from the query transformation is investigated and found to follow an exponential scaling law. The correlation analysis reveals a number of patterns that provide some interesting insights into Web searching by the general public. Conclusion. We have developed analytical methods based on query similarity that can be applied to our current experimental work with volunteer subjects. The results of these will form part of a database with the aim of developing an improved understanding of how the public search the Web.
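
The similarity-based linking step described above can be illustrated with a small sketch: each query in a train is linked to its most similar predecessor, and the change is coded. The Jaccard term-overlap measure and the single-letter codes below are illustrative assumptions; the paper's own similarity technique and coding scheme are richer.

```python
def jaccard(a, b):
    """Term-overlap similarity between two whitespace-tokenized queries."""
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b) if a | b else 0.0

def transformation_codes(queries):
    """Link each query to its most similar predecessor and code the change.
    Codes (illustrative, not the paper's scheme): A=add terms, R=remove terms,
    M=mixed reformulation, N=new query (no overlap with any predecessor)."""
    codes = []
    for i in range(1, len(queries)):
        prev = max(queries[:i], key=lambda q: jaccard(q, queries[i]))
        if jaccard(prev, queries[i]) == 0:
            codes.append("N")
            continue
        p, c = set(prev.split()), set(queries[i].split())
        if p < c:
            codes.append("A")
        elif c < p:
            codes.append("R")
        else:
            codes.append("M")
    return codes

session = ["web search", "web search engines", "web engines", "holiday spain"]
print(transformation_codes(session))  # → ['A', 'R', 'N']
```

The resulting code list represents a whole search and could feed the frequency and correlation analyses the abstract mentions.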

  11. What Can Pictures Tell Us About Web Pages? Improving Document Search Using Images.

    Science.gov (United States)

    Rodriguez-Vaamonde, Sergio; Torresani, Lorenzo; Fitzgibbon, Andrew W

    2015-06-01

    Traditional Web search engines do not use the images in the HTML pages to find relevant documents for a given query. Instead, they typically operate by computing a measure of agreement between the keywords provided by the user and only the text portion of each page. In this paper we study whether the content of the pictures appearing in a Web page can be used to enrich the semantic description of an HTML document and consequently boost the performance of a keyword-based search engine. We present a Web-scalable system that exploits a pure text-based search engine to find an initial set of candidate documents for a given query. Then, the candidate set is reranked using visual information extracted from the images contained in the pages. The resulting system retains the computational efficiency of traditional text-based search engines with only a small additional storage cost needed to encode the visual information. We test our approach on one of the TREC Million Query Track benchmarks where we show that the exploitation of visual content yields improvement in accuracies for two distinct text-based search engines, including the system with the best reported performance on this benchmark. We further validate our approach by collecting document relevance judgements on our search results using Amazon Mechanical Turk. The results of this experiment confirm the improvement in accuracy produced by our image-based reranker over a pure text-based system.

  12. Relevance Preserving Projection and Ranking for Web Image Search Reranking.

    Science.gov (United States)

    Ji, Zhong; Pang, Yanwei; Li, Xuelong

    2015-11-01

    An image search reranking (ISR) technique aims at refining text-based search results by mining images' visual content. Feature extraction and ranking function design are two key steps in ISR. Inspired by the idea of hypersphere in one-class classification, this paper proposes a feature extraction algorithm named hypersphere-based relevance preserving projection (HRPP) and a ranking function called hypersphere-based rank (H-Rank). Specifically, an HRPP is a spectral embedding algorithm to transform an original high-dimensional feature space into an intrinsically low-dimensional hypersphere space by preserving the manifold structure and a relevance relationship among the images. An H-Rank is a simple but effective ranking algorithm to sort the images by their distances to the hypersphere center. Moreover, to capture the user's intent with minimum human interaction, a reversed k-nearest neighbor (KNN) algorithm is proposed, which harvests enough pseudorelevant images by requiring that the user gives only one click on the initially searched images. The HRPP method with reversed KNN is named one-click-based HRPP (OC-HRPP). Finally, an OC-HRPP algorithm and the H-Rank algorithm form a new ISR method, H-reranking. Extensive experimental results on three large real-world data sets show that the proposed algorithms are effective. Moreover, the fact that only one relevant image needs to be labeled gives the method strong practical significance.
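
The H-Rank idea (sorting images by their distance to the hypersphere center) can be illustrated with a minimal sketch. The feature vectors and center below are invented; in the actual method the features would come from the HRPP embedding and the center from the pseudo-relevant images harvested by reversed KNN.

```python
import math

def h_rank(candidates, center):
    """Rank items by ascending Euclidean distance to the hypersphere center.

    `candidates` maps an image id to its (assumed low-dimensional) feature
    vector; images nearer the center are considered more relevant.
    """
    def dist(v):
        return math.sqrt(sum((x - c) ** 2 for x, c in zip(v, center)))
    return sorted(candidates, key=lambda item: dist(candidates[item]))

candidates = {"img1": (0.9, 0.8), "img2": (0.2, 0.1), "img3": (0.6, 0.7)}
center = (1.0, 1.0)  # hypothetical hypersphere center
print(h_rank(candidates, center))  # → ['img1', 'img3', 'img2']
```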

  13. Optimal Threshold Control by the Robots of Web Search Engines with Obsolescence of Documents

    CERN Document Server

    Avrachenkov, Konstantin; Klimenok, Valentina; Nain, Philippe; Semenova, Olga; 10.1016/j.comnet.2011.01.013

    2012-01-01

    A typical web search engine consists of three principal parts: crawling engine, indexing engine, and searching engine. The present work aims to optimize the performance of the crawling engine. The crawling engine finds new web pages and updates web pages existing in the database of the web search engine. The crawling engine has several robots collecting information from the Internet. We first calculate various performance measures of the system (e.g., probability of arbitrary page loss due to buffer overflow, probability of starvation of the system, the average waiting time in the buffer). Intuitively, we would like to avoid system starvation and at the same time to minimize the information loss. We formulate the problem as a multi-criteria optimization problem, attributing a weight to each criterion, and solve it in the class of threshold policies. We consider a very general web page arrival process modeled by a Batch Marked Markov Arrival Process and a very general service time modeled by a Phase-type dis...

  14. Work Out the Semantic Web Search: The Cooperative Way

    Directory of Open Access Journals (Sweden)

    Dora Melo

    2012-01-01

    Full Text Available We propose a Cooperative Question Answering System that takes as input natural language queries and is able to return a cooperative answer based on semantic web resources, more specifically DBpedia represented in OWL/RDF as knowledge base and WordNet to build similar questions. Our system resorts to ontologies not only for reasoning but also to find answers, and is independent of prior knowledge of the semantic resources by the user. The natural language question is translated into its semantic representation and then answered by consulting the semantic sources of information. The system is able to clarify the problems of ambiguity and helps find the path to the correct answer. If there are multiple answers to the question posed (or to the similar questions for which DBpedia contains answers), they will be grouped according to their semantic meaning, providing a more cooperative and clarified answer to the user.

  15. The Use of Social Tags in Text and Image Searching on the Web

    Science.gov (United States)

    Kim, Yong-Mi

    2011-01-01

    In recent years, tags have become a standard feature on a diverse range of sites on the Web, accompanying blog posts, photos, videos, and online news stories. Tags are descriptive terms attached to Internet resources. Despite the rapid adoption of tagging, how people use tags during the search process is not well understood. There is little…

  16. Engaging Student Interpreters in Vocabulary Building: Web Search with Computer Workbench

    Science.gov (United States)

    Lim, Lily

    2014-01-01

    This paper investigates the usefulness of Web portals in a workbench for assisting student interpreters in the search for and collection of vocabulary. The experiment involved a class of fifteen English as a Foreign Language (EFL) student interpreters, who were required to equip themselves with the appropriate English vocabulary to handle an…

  17. Information Retrieval Strategies of Millennial Undergraduate Students in Web and Library Database Searches

    Science.gov (United States)

    Porter, Brandi

    2009-01-01

    Millennial students make up a large portion of undergraduate students attending colleges and universities, and they have a variety of online resources available to them to complete academically related information searches, primarily Web based and library-based online information retrieval systems. The content, ease of use, and required search…

  19. The effects of link format and screen location on visual search of web pages.

    Science.gov (United States)

    Ling, Jonathan; Van Schaik, Paul

    2004-06-22

    Navigation of web pages is of critical importance to the usability of web-based systems such as the World Wide Web and intranets. The primary means of navigation is through the use of hyperlinks. However, few studies have examined the impact of the presentation format of these links on visual search. The present study used a two-factor mixed measures design to investigate whether there was an effect of link format (plain text, underlined, bold, or bold and underlined) upon speed and accuracy of visual search and subjective measures in both the navigation and content areas of web pages. An effect of link format on speed of visual search for both hits and correct rejections was found. This effect was observed in the navigation and the content areas. Link format did not influence accuracy in either screen location. Participants showed highest preference for links that were in bold and underlined, regardless of screen area. These results are discussed in the context of visual search processes and design recommendations are given.

  20. A geospatial search engine for discovering multi-format geospatial data across the web

    Science.gov (United States)

    Christopher Bone; Alan Ager; Ken Bunzel; Lauren Tierney

    2014-01-01

    The volume of publicly available geospatial data on the web is rapidly increasing due to advances in server-based technologies and the ease with which data can now be created. However, challenges remain with connecting individuals searching for geospatial data with servers and websites where such data exist. The objective of this paper is to present a publicly...

  1. Web-Searching to Learn: The Role of Internet Self-Efficacy in Pre-School Educators' Conceptions and Approaches

    Science.gov (United States)

    Kao, Chia-Pin; Chien, Hui-Min

    2017-01-01

    This study was conducted to explore the relationships between pre-school educators' conceptions of and approaches to learning by web-searching through Internet Self-efficacy. Based on data from 242 pre-school educators who had prior experience of participating in web-searching in Taiwan for path analyses, it was found in this study that…

  2. E-Librarian Service Search with Semantic Web Technologies

    CERN Document Server

    Linckels, Serge

    2011-01-01

    This book introduces a new approach to designing E-Librarian Services. With the help of this system, users will be able to retrieve multimedia resources from digital libraries more efficiently than they would by browsing through an index or by using a simple keyword search. E-Librarian Services combine recent advances in multimedia information retrieval with aspects of human-machine interfaces, such as the ability to ask questions in natural language; they simulate a human librarian by finding and delivering the most relevant documents that offer users potential answers to their queries. The p

  3. Reconsidering the Rhizome: A Textual Analysis of Web Search Engines as Gatekeepers of the Internet

    Science.gov (United States)

    Hess, A.

    Critical theorists have often drawn from Deleuze and Guattari's notion of the rhizome when discussing the potential of the Internet. While the Internet may structurally appear as a rhizome, its day-to-day usage by millions via search engines precludes experiencing the random interconnectedness and potential democratizing function. Through a textual analysis of four search engines, I argue that Web searching has grown hierarchies, or "trees," that organize data in tracts of knowledge and place users in marketing niches rather than assist in the development of new knowledge.

  4. JAMSTEC Data Search Portal - A Data Search Service using Web GIS for the Marine-Earth Observation Data

    Science.gov (United States)

    Hanafusa, Y.; Abe, Y.

    2009-12-01

    JAMSTEC started its marine observations in 1981 using its research vessels and deep-sea submersibles and has publicized its observation data and samples on its web sites. JAMSTEC operates several tens of scientific cruises and hundreds of dives a year, and its research activities extend not only to oceanography but also to the biosphere, solid earth, land areas, the atmosphere, etc. As the number of data sites and the amount of data increased, a comprehensive data search service across the various data sites was thought to be necessary. We chose a web GIS server, ArcIMS of ESRI, as the core component of this system, because Earth observations are geo-located, most of our users are interested in specific areas, and the spatial retrieval function of GIS is highly effective in such cases. The Data Search Portal retrieves shape files which include the locations of observation points/lines and properties including period, variables, data page URLs, etc. When users select data type(s) and refresh the map, the distribution of observation points/lines of the selected data type(s) is shown on the map, and users are able to zoom in/out and pan across the map. After choosing an area of interest using a mouse, users can get lists of observations with links to relevant data publication pages. Combined search by area and other properties is also available. This system is specialized as a portal service, using only URLs to make links to the data publication pages. Data and detailed information are available on the relevant pages. The metadata set in the Data Search Portal is in an original JAMSTEC format, which includes positions, periods, research vessel names, cruise IDs, variables and publication page URLs as essential properties, and some optional properties, for example principal investigators' names of cruises, station names, research area names, etc. Also, ArcIMS enables us to have plural interfaces on one server and to provide a special front-end search function for specific data types. Using this

  5. Wild Card Queries for Searching Resources on the Web

    CERN Document Server

    Rafiei, Davood

    2009-01-01

    We propose a domain-independent framework for searching and retrieving facts and relationships within natural language text sources. In this framework, an extraction task over a text collection is expressed as a query that combines text fragments with wild cards, and the query result is a set of facts in the form of unary, binary and general $n$-ary tuples. A strength of our querying mechanism is that, despite being both simple and declarative, it can be applied to a wide range of extraction tasks. A problem in querying natural language text, though, is that a user-specified query may not retrieve enough exact matches. Unlike term queries, which can be relaxed by removing some of the terms (as is done in search engines), removing terms from a wild card query without ruining its meaning is more challenging. Also, any query expansion has the potential to introduce false positives. In this paper, we address the problem of query expansion, and also analyze a few ranking alternatives to score the results and to r...

  6. WEB SEARCH ENGINE BASED SEMANTIC SIMILARITY MEASURE BETWEEN WORDS USING PATTERN RETRIEVAL ALGORITHM

    Directory of Open Access Journals (Sweden)

    Pushpa C N

    2013-02-01

    Full Text Available Semantic similarity measures play an important role in information retrieval, natural language processing and various tasks on the web such as relation extraction, community mining, document clustering, and automatic meta-data extraction. In this paper, we have proposed a Pattern Retrieval Algorithm [PRA] to compute the semantic similarity measure between words by combining both the page count method and the web snippets method. Four association measures are used to find semantic similarity between words in the page count method using web search engines. We use a Sequential Minimal Optimization (SMO) support vector machine (SVM) to find the optimal combination of page-counts-based similarity scores and top-ranking patterns from the web snippets method. The SVM is trained to classify synonymous word-pairs and non-synonymous word-pairs. The proposed approach aims to improve the correlation values, precision, recall, and F-measures compared to the existing methods. The proposed algorithm outperforms existing methods, achieving a correlation value of 89.8%.
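
The abstract does not name the four association measures; pointwise mutual information computed from page counts is one measure commonly used in this line of work, and can be sketched as follows. The hit counts and the assumed size of the indexed web are hypothetical.

```python
import math

def pmi(count_p, count_q, count_pq, total_pages):
    """Pointwise mutual information from web page counts.

    count_p, count_q: hit counts for each word queried alone;
    count_pq: hit count for the conjunctive query "P AND Q";
    total_pages: assumed number of pages in the search engine's index.
    """
    if count_pq == 0:
        return 0.0
    return math.log2((count_pq / total_pages) /
                     ((count_p / total_pages) * (count_q / total_pages)))

# Hypothetical hit counts for the word pair ("car", "automobile")
score = pmi(count_p=5_000_000, count_q=800_000, count_pq=400_000,
            total_pages=10**10)
print(round(score, 2))  # → 9.97
```

Scores like this, computed for several association measures, would then be combined with snippet-pattern features by the trained SVM.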

  7. Dropout Rates and Response Times of an Occupation Search Tree in a Web Survey

    Directory of Open Access Journals (Sweden)

    Tijdens Kea

    2014-03-01

    Full Text Available Occupation is key in socioeconomic research. As in other survey modes, most web surveys use an open-ended question for occupation, though the absence of interviewers elicits unidentifiable or aggregated responses. Unlike other modes, web surveys can use a search tree with an occupation database. Such trees are hardly ever used, but this may change due to technical advancements. This article evaluates a three-step search tree with 1,700 occupational titles, used in the 2010 multilingual WageIndicator web survey for the UK, Belgium and the Netherlands (22,990 observations). Dropout rates are high: in Step 1 due to unemployed respondents judging the question not to be adequate, and in Step 3 due to search tree item length. Median response times are substantial due to search tree item length, dropout in the next step and invalid occupations ticked. Overall, the validity of the occupation data is rather good, with 1.7-7.5% of the respondents completing the search tree having ticked an invalid occupation.

  8. Googling social interactions: web search engine based social network construction.

    Science.gov (United States)

    Lee, Sang Hoon; Kim, Pan-Jun; Ahn, Yong-Yeol; Jeong, Hawoong

    2010-07-21

    Social network analysis has long been an untiring topic of sociology. However, until the era of information technology, the availability of data, mainly collected by the traditional method of personal survey, was highly limited and prevented large-scale analysis. Recently, the exploding amount of automatically generated data has completely changed the pattern of research. For instance, the enormous amount of data from so-called high-throughput biological experiments has introduced a systematic or network viewpoint to traditional biology. Then, is "high-throughput" sociological data generation possible? Google, which has become one of the most influential symbols of the new Internet paradigm within the last ten years, might provide torrents of data sources for such study in this (now and forthcoming) digital era. We investigate social networks between people by extracting information on the Web and introduce new tools of analysis of such networks in the context of statistical physics of complex systems or socio-physics. As a concrete and illustrative example, the members of the 109th United States Senate are analyzed and it is demonstrated that the methods of construction and analysis are applicable to various other weighted networks.
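

The construction idea sketched in the abstract, building a weighted network from co-occurrence hit counts, can be illustrated as follows. The counts are assumed to have been fetched from a search engine already, and the normalisation by the smaller individual count is one simple choice, not necessarily the one used in the paper.

```python
from itertools import combinations

def build_network(hits_single, hits_pair):
    """Weighted edges from co-occurrence: weight = |A AND B| / min(|A|, |B|).

    hits_single: person name -> hit count for that name alone;
    hits_pair: frozenset({a, b}) -> hit count for the joint query "a b".
    """
    edges = {}
    for a, b in combinations(sorted(hits_single), 2):
        joint = hits_pair.get(frozenset({a, b}), 0)
        if joint:
            edges[(a, b)] = joint / min(hits_single[a], hits_single[b])
    return edges

# Hypothetical hit counts for three people
hits_single = {"Alice": 1000, "Bob": 500, "Carol": 200}
hits_pair = {frozenset({"Alice", "Bob"}): 100, frozenset({"Bob", "Carol"}): 10}
print(build_network(hits_single, hits_pair))
# → {('Alice', 'Bob'): 0.2, ('Bob', 'Carol'): 0.05}
```

The resulting weighted edge list is the kind of network that the paper's socio-physics analyses (e.g., of the 109th US Senate) would then operate on.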

  9. Deep Web Search Interface Identification: A Semi-Supervised Ensemble Approach

    Directory of Open Access Journals (Sweden)

    Hong Wang

    2014-12-01

    Full Text Available To surface the Deep Web, one crucial task is to predict whether a given web page has a search interface (a searchable HyperText Markup Language (HTML) form) or not. Previous studies have focused on supervised classification with labeled examples. However, labeled data are scarce, hard to get and require tedious manual work, while unlabeled HTML forms are abundant and easy to obtain. In this research, we consider the plausibility of using both labeled and unlabeled data to train better models to identify search interfaces more effectively. We present a semi-supervised co-training ensemble learning approach using both neural networks and decision trees to deal with the search interface identification problem. We show that the proposed model outperforms previous methods using only labeled data. We also show that adding unlabeled data improves the effectiveness of the proposed model.

  10. New Architectures for Presenting Search Results Based on Web Search Engines Users Experience

    Science.gov (United States)

    Martinez, F. J.; Pastor, J. A.; Rodriguez, J. V.; Lopez, Rosana; Rodriguez, J. V., Jr.

    2011-01-01

    Introduction: The Internet is a dynamic environment which is continuously being updated. Search engines have been, currently are and in all probability will continue to be the most popular systems in this information cosmos. Method: In this work, special attention has been paid to the series of changes made to search engines up to this point,…

  11. Spatial Search Techniques for Mobile 3D Queries in Sensor Web Environments

    Directory of Open Access Journals (Sweden)

    James D. Carswell

    2013-03-01

    Full Text Available Developing mobile geo-information systems for sensor web applications involves technologies that can access linked geographical and semantically related Internet information. Additionally, in tomorrow’s Web 4.0 world, it is envisioned that trillions of inexpensive micro-sensors placed throughout the environment will also become available for discovery based on their unique geo-referenced IP address. Exploring these enormous volumes of disparate heterogeneous data on today’s location- and orientation-aware smartphones requires context-aware smart applications and services that can deal with “information overload”. 3DQ (Three Dimensional Query) is our novel mobile spatial interaction (MSI) prototype that acts as a next-generation base for human interaction within such geospatial sensor web environments/urban landscapes. It filters information using “Hidden Query Removal” functionality that intelligently refines the search space by calculating the geometry of a three dimensional visibility shape (Vista space) at a user’s current location. This 3D shape then becomes the query “window” in a spatial database for retrieving information on only those objects visible within a user’s actual 3D field-of-view. 3DQ reduces information overload and serves to heighten situation awareness on constrained commercial off-the-shelf devices by providing visibility space searching as a mobile web service. The effects of variations in mobile spatial search techniques in terms of query speed vs. accuracy are evaluated and presented in this paper.

  12. Research on Mobile Web Search

    Institute of Scientific and Technical Information of China (English)

    张金增; 孟小峰

    2012-01-01

    With the coming of the 3G age and the high-speed growth of Web resources, the mobile Internet is developing rapidly, making resources accessible and information conveniently obtainable using mobile devices. However, in mobile Web search, geo-tagging Web resources, integrating spatial data and Web data seamlessly, and providing valuable, highly relevant information to users are very challenging tasks. A framework for mobile Web search is proposed in this paper, and the key research techniques in mobile Web search are classified and surveyed according to this framework. Based on a comprehensive comparison and analysis of existing techniques, suggestions for future research are put forward.

  13. THUIR at TREC 2009 Web Track: Finding Relevant and Diverse Results for Large Scale Web Search

    Science.gov (United States)

    2009-11-01

    Porn-word filtering is also one of the anti-spam techniques in real-world search engines. A list of porn words was found from the internet [2... When the number of porn words in the page is larger than α, the page is taken as spam. In our experiments, the threshold is set to 16.

  14. Dual-Task Performance as a Measure of Mental Effort in Searching a Library System and the Web

    OpenAIRE

    Kim, Yong-Mi; Rieh, Soo Young

    2005-01-01

    This paper examines a dual-task method for the assessment of mental effort during online searching, having the users engage in two tasks simultaneously. Searching was assigned as a primary task and a visual observation was set up as a secondary task. The study participants were asked to perform two searches, one on the Web and the other in a web-based library system. Perceived search difficulty and mental effort for searching on the two types of systems were compared through participa...

  15. Efficient Top-k Locality Search for Co-located Spatial Web Objects

    DEFF Research Database (Denmark)

    Qu, Qiang; Liu, Siyuan; Yang, Bin

    2014-01-01

    In step with the web being used widely by mobile users, user location is becoming an essential signal in services, including local intent search. Given a large set of spatial web objects consisting of a geographical location and a textual description (e.g., online business directory entries of restaurants, bars, and shops), how can we find sets of objects that are both spatially and textually relevant to a query? Most existing studies solve the problem by requiring that all query keywords are covered by the returned objects and then ranking the sets by spatial proximity. The needs for identifying...

  16. A Novel Framework for Medical Web Information Foraging Using Hybrid ACO and Tabu Search.

    Science.gov (United States)

    Drias, Yassine; Kechid, Samir; Pasi, Gabriella

    2016-01-01

    We present in this paper a novel approach based on multi-agent technology for Web information foraging. We propose for this purpose an architecture in which we distinguish two important phases. The first is a learning process for localizing the most relevant pages that might interest the user, performed on a fixed instance of the Web. The second takes into account the openness and dynamicity of the Web: it consists of incremental learning that starts from the result of the first phase and reshapes the outcomes to account for the changes the Web undergoes. The system was implemented using a colony of artificial ants hybridized with tabu search in order to achieve more effectiveness and efficiency. To validate our proposal, experiments were conducted on MedlinePlus, a real website dedicated to research in the health domain, in contrast to previous works where experiments were performed on web log datasets. The main results are promising, both for those related to strong Web regularities and for the response time, which is very short and hence complies with the real-time constraint.

  17. Optimum Design of Composite Corrugated Web Beams Using Hunting Search Algorithm

    Directory of Open Access Journals (Sweden)

    Ferhat Erdal

    2017-07-01

    Full Text Available Over the past few years there has been sustainable development in steel and composite construction technology. One of the recent additions to such developments is the I-girder with a corrugated web. The use of these new-generation beams results in a range of benefits, including flexible, free internal spaces and reduced foundation costs. Corrugated web beams are built-up girders with a thin-walled, corrugated web and wide plate flanges. The thin corrugated web affords a significant weight reduction of these beams compared with hot-rolled or welded ones. In this paper, the optimum design of composite corrugated web beams is presented. A recent stochastic optimization algorithm based on hunting search is used for obtaining the solution of the design problem. In the optimization process, the thickness of the concrete slab and studs, the web height and thickness, the distance between the peaks of the two corrugation curves, and the width and thickness of the flange are considered as design variables. The design constraints are implemented from BS EN1993-1:2005 (Annex-D), Eurocode 3, BS-8110 and DIN 18-800 Teil-1. Furthermore, these selections are carried out such that the design limitations are satisfied and the weight of the composite corrugated web beam is the minimum.

  18. Web-based Image Search Engines

    Institute of Scientific and Technical Information of China (English)

    陈立娜

    2001-01-01

    The operating principle of Web-based image search engines is briefly described. A detailed evaluation of some of image search engines is made. Finally, the paper points out the deficiencies of the present image search engines and their development trend.

  19. AN EFFICIENT APPROACH FOR KEYWORD SELECTION; IMPROVING ACCESSIBILITY OF WEB CONTENTS BY GENERAL SEARCH ENGINES

    Directory of Open Access Journals (Sweden)

    H. H. Kian

    2011-11-01

    Full Text Available General search engines often provide low-precision results even for detailed queries. So there is a vital need to elicit useful information like keywords for search engines to provide acceptable results for users' search queries. Although many methods have been proposed to show how to extract keywords automatically, all attempt to get better recall, precision and other criteria which describe how well the method has done its job as an author. This paper presents a new automatic keyword extraction method which improves the accessibility of web content by search engines. The proposed method defines some coefficients determining feature efficiency and tries to optimize them by using a genetic algorithm. Furthermore, it evaluates candidate keywords by a function that utilizes the results of search engines. When compared to the other methods, experiments demonstrate that by using the proposed method, a higher score is achieved from search engines without losing noticeable recall or precision.
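The coefficient-tuning step described above can be sketched as a tiny genetic algorithm. This is an illustration only: the function name `evolve` and its operators are invented, and the fitness function, which in the paper would come from querying search engines, is replaced by a caller-supplied callable.

```python
import random

def evolve(fitness, n_coeff, pop_size=30, generations=50, seed=0):
    """Minimal GA sketch for tuning keyword-feature coefficients in [0, 1].

    `fitness` scores a coefficient vector (higher is better). Selection keeps
    the top half, children are made by one-point crossover plus a clamped
    Gaussian point mutation.
    """
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_coeff)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_coeff) if n_coeff > 1 else 0
            child = a[:cut] + b[cut:]
            i = rng.randrange(n_coeff)  # point mutation, clamped to [0, 1]
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)
```

Any differentiable-free objective can be plugged in as `fitness`, e.g. a score derived from search-engine result quality for the candidate keywords.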

  20. A semantics-based method for clustering of Chinese web search results

    Science.gov (United States)

    Zhang, Hui; Wang, Deqing; Wang, Li; Bi, Zhuming; Chen, Yong

    2014-01-01

    Information explosion is a critical challenge to the development of modern information systems. In particular, when the application of an information system is over the Internet, the amount of information over the web has been increasing exponentially and rapidly. Search engines, such as Google and Baidu, are essential tools for people to find information on the Internet. Valuable information, however, is still likely submerged in the ocean of search results from those tools. By clustering the results into different groups based on subjects automatically, a search engine with the clustering feature allows users to select the most relevant results quickly. In this paper, we propose an online semantics-based method to cluster Chinese web search results. First, we employ the generalised suffix tree to extract the longest common substrings (LCSs) from search snippets. Second, we use the HowNet to calculate the similarities of the words derived from the LCSs, and extract the most representative features by constructing the vocabulary chain. Third, we construct a vector of text features and calculate snippets' semantic similarities. Finally, we improve the Chameleon algorithm to cluster snippets. Extensive experimental results have shown that the proposed algorithm outperformed the suffix tree clustering method and other traditional clustering methods.
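The LCS feature-extraction step above can be illustrated on a pair of snippets. The paper uses a generalised suffix tree for efficiency; the dynamic-programming version below is an O(len(a)·len(b)) stand-in for the same idea, and the function name is invented here.

```python
def longest_common_substring(a, b):
    """Longest common substring of two strings by dynamic programming.

    Illustrative replacement for the generalised-suffix-tree extraction step:
    same output for a pair of snippets, but slower on large collections.
    """
    best, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i, ca in enumerate(a, 1):
        curr = [0] * (len(b) + 1)
        for j, cb in enumerate(b, 1):
            if ca == cb:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best:
                    best, best_end = curr[j], i
        prev = curr
    return a[best_end - best:best_end]
```

In the clustering pipeline, substrings extracted this way would then be scored for semantic similarity (via HowNet, in the paper) before building feature vectors.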

  1. GeNemo: a search engine for web-based functional genomic data

    OpenAIRE

    Zhang, Yongqing; Cao, Xiaoyi; Zhong, Sheng

    2016-01-01

    A set of new data types emerged from functional genomic assays, including ChIP-seq, DNase-seq, FAIRE-seq and others. The results are typically stored as genome-wide intensities (WIG/bigWig files) or functional genomic regions (peak/BED files). These data types present new challenges to big data science. Here, we present GeNemo, a web-based search engine for functional genomic data. GeNemo searches user-input data against online functional genomic datasets, including the entire collection of E...

  2. Web search engine: characteristics of user behaviors and their implication

    Institute of Scientific and Technical Information of China (English)

    王建勇; 单松巍; 雷鸣; 谢正茂; 李晓明

    2001-01-01

    In this paper, first studied are the distribution characteristics of user behaviors based on log data from a massive web search engine. Analysis shows that stochastic distribution of user queries accords with the characteristics of power-law function and exhibits strong similarity, and the user's queries and clicked URLs present dramatic locality, which implies that query cache and 'hot click' cache can be employed to improve system performance. Then three typical cache replacement policies are compared, including LRU, FIFO, and LFU with attenuation. In addition, the distribution characteristics of web information are also analyzed, which demonstrates that the link popularity and replica popularity of a URL have positive influence on its importance. Finally, variance between the link popularity and user popularity, and variance between replica popularity and user popularity are analyzed, which give us some important insight that helps us improve the ranking algorithms in a search engine.
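A query cache of the kind compared above (LRU vs. FIFO vs. LFU with attenuation) can be sketched with an order-preserving dictionary. The class below is an invented minimal LRU example, not the paper's implementation; keys would be query strings and values cached result pages.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU query cache: evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used
```

A FIFO variant would simply drop `move_to_end` in `get`; LFU with attenuation would instead track per-key hit counts that decay over time.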

  3. Enhanced Trustworthy and High-Quality Information Retrieval System for Web Search Engines

    CERN Document Server

    Ramachandran, S; Joseph, S; Ramaraj, V

    2009-01-01

    The WWW is the most important source of information. But there is no guarantee of information correctness, lots of conflicting information is retrieved by search engines, and the quality of the provided information varies from low to high. We provide enhanced trustworthiness in both specific (entity) and broad (content) queries in web searching. The filtering of trustworthiness is based on 5 factors: Provenance, Authority, Age, Popularity, and Related Links. The trustworthiness is calculated based on these 5 factors and stored, thereby increasing the performance in retrieving trustworthy websites. The calculated trustworthiness is stored only for static websites. Quality is provided based on policies selected by the user. Quality-based ranking of the retrieved trusted information is provided using the WIQA (Web Information Quality Assessment) Framework.
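The five-factor trust computation above can be sketched as a weighted combination. This is an assumption-laden illustration: the abstract names the factors but not how they are combined, so a plain weighted sum with equal default weights is assumed, and `trust_score` is an invented name.

```python
# The five factors named in the abstract, each assumed normalized to [0, 1].
FACTORS = ("provenance", "authority", "age", "popularity", "related_links")

def trust_score(site, weights=None):
    """Hypothetical combination of the five factor scores into one value.

    `site` maps factor names to scores; missing factors count as 0.
    """
    if weights is None:
        weights = {f: 1.0 / len(FACTORS) for f in FACTORS}
    return sum(weights[f] * site.get(f, 0.0) for f in FACTORS)

site = {"provenance": 0.9, "authority": 0.8, "age": 0.5,
        "popularity": 0.7, "related_links": 0.6}
```

Precomputing and storing such a score for static sites, as the abstract describes, avoids recomputing the factors at query time.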

  4. SLIM: an alternative Web interface for MEDLINE/PubMed searches – a preliminary study

    Directory of Open Access Journals (Sweden)

    Ackerman Michael

    2005-12-01

    Full Text Available Abstract Background With the rapid growth of medical information and the pervasiveness of the Internet, online search and retrieval systems have become indispensable tools in medicine. The progress of Web technologies can provide expert searching capabilities to non-expert information seekers. The objective of the project is to create an alternative search interface for MEDLINE/PubMed searches using JavaScript slider bars. SLIM, or Slider Interface for MEDLINE/PubMed searches, was developed with PHP and JavaScript. Interactive slider bars in the search form controlled search parameters such as limits, filters and MeSH terminologies. Connections to PubMed were done using the Entrez Programming Utilities (E-Utilities). Custom scripts were created to mimic the automatic term mapping process of Entrez. Page generation times for both local and remote connections were recorded. Results Alpha testing by developers showed SLIM to be functionally stable. Page generation times to simulate loading times were recorded during the first week of alpha and beta testing. Average page generation times for the index page, previews and searches were 2.94 milliseconds, 0.63 seconds and 3.84 seconds, respectively. Eighteen physicians from the US, Australia and the Philippines participated in the beta testing and provided feedback through an online survey. Most users found the search interface user-friendly and easy to use. Information on MeSH terms and the ability to instantly hide and display abstracts were identified as distinctive features. Conclusion SLIM can be an interactive time-saving tool for online medical literature research that improves user control and capability to instantly refine and refocus search strategies. With continued development and by integrating search limits, methodology filters, MeSH terms and levels of evidence, SLIM may be useful in the practice of evidence-based medicine.
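The record above describes querying PubMed through the Entrez Programming Utilities. As a rough illustration, an ESearch request URL of the kind a SLIM-style interface would issue can be composed as below; `build_esearch_url` is an invented helper, the parameter mapping from sliders is assumed, and no request is actually sent.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_esearch_url(term, retmax=20, mindate=None, maxdate=None):
    """Compose an E-Utilities ESearch URL for PubMed.

    Slider positions in a SLIM-style form would map onto parameters such as
    `retmax` and the publication-date range. This only builds the URL.
    """
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    if mindate and maxdate:
        params.update({"datetype": "pdat", "mindate": mindate, "maxdate": maxdate})
    return EUTILS_BASE + "?" + urlencode(params)

url = build_esearch_url("hypertension", retmax=10, mindate="2020", maxdate="2024")
```

The returned XML (PMID list) would then be fed to EFetch or ESummary to render result previews.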

  5. Entity Came to Rescue - Leveraging Entities to Minimize Risks in Web Search

    Science.gov (United States)

    2014-11-01

    Entity Came to Rescue - Leveraging Entities to Minimize Risks in Web Search. Xitong Liu, Peilin Yang and Hui Fang, University of Delaware, Newark, DE.

  6. Finding Business Information on the "Invisible Web": Search Utilities vs. Conventional Search Engines.

    Science.gov (United States)

    Darrah, Brenda

    Researchers for small businesses, which may have no access to expensive databases or market research reports, must often rely on information found on the Internet, which can be difficult to find. Although current conventional Internet search engines are now able to index over one billion documents, there are many more documents existing in…

  7. Age differences in search of web pages: the effects of link size, link number, and clutter.

    Science.gov (United States)

    Grahame, Michael; Laberge, Jason; Scialfa, Charles T

    2004-01-01

    Reaction time, eye movements, and errors were measured during visual search of Web pages to determine age-related differences in performance as a function of link size, link number, link location, and clutter. Participants (15 young adults, M = 23 years; 14 older adults, M = 57 years) searched Web pages for target links that varied from trial to trial. During one half of the trials, links were enlarged from 10-point to 12-point font. Target location was distributed among the left, center, and bottom portions of the screen. Clutter was manipulated according to the percentage of used space, including graphics and text, and the number of potentially distracting nontarget links was varied. Increased link size improved performance, whereas increased clutter and links hampered search, especially for older adults. Results also showed that links located in the left region of the page were found most easily. Actual or potential applications of this research include Web site design to increase usability, particularly for older adults.

  8. A unified architecture for biomedical search engines based on semantic web technologies.

    Science.gov (United States)

    Jalali, Vahid; Matash Borujerdi, Mohammad Reza

    2011-04-01

    There has been huge growth in the volume of published biomedical research in recent years. Many medical search engines have been designed and developed to address the ever-growing information needs of biomedical experts and curators. Significant progress has been made in utilizing the knowledge embedded in medical ontologies and controlled vocabularies to assist these engines. However, the lack of a common architecture for the utilized ontologies and the overall retrieval process hampers evaluating different search engines and the interoperability between them under unified conditions. In this paper, a unified architecture for medical search engines is introduced. The proposed model contains standard schemas, declared in semantic web languages, for the ontologies and documents used by search engines. Unified models for the annotation and retrieval processes are other parts of the introduced architecture. A sample search engine is also designed and implemented based on the proposed architecture. The search engine is evaluated using two test collections and results are reported in terms of precision vs. recall and mean average precision for the different approaches used by this search engine.

  9. An Overview of Research on Semantic Web Service Search

    Institute of Scientific and Technical Information of China (English)

    郭富禄; 曾志浩; 武岫缘

    2013-01-01

    Semantic Web service search mainly involves three aspects: indexing of service resources, expression of search conditions, and matching and ranking of service resources. This paper summarizes and analyzes the research in these three directions in the current semantic Web service field, and looks ahead to future research work on semantic Web service search.

  10. Rare disease diagnosis: A review of web search, social media and large-scale data-mining approaches

    DEFF Research Database (Denmark)

    Svenstrup, Dan Tito; Jørgensen, Henrik L; Winther, Ole

    2015-01-01

    on the use of web search, social media and data mining in data repositories for medical diagnosis. We compare the retrieval accuracy on 56 rare disease cases with known diagnosis for the web search tools google.com, pubmed.gov, omim.org and our own search tool findzebra.com. We give a detailed description… in technology and access to high quality data have opened new possibilities for aiding the diagnostic process. Specialized search engines, data mining tools and social media are some of the areas that hold promise…

  11. Measuring cognitive processes involved in web search: log files, eye-movements and cued retrospective reports

    NARCIS (Netherlands)

    Argelagos, Esther; Jarodzka, Halszka; Pifarre, Manoli

    2011-01-01

    Argelagós, E., Jarodzka, H., & Pifarré, M. (2011, August). Measuring cognitive processes involved in web search: log files, eye-movements and cued retrospective reports compared. Presentation at EARLI, Exeter, UK.

  12. An Evidence-Based Review of Academic Web Search Engines, 2014-2016: Implications for Librarians' Practice and Research Agenda

    National Research Council Canada - National Science Library

    Jody Condit Fagan

    2017-01-01

    Academic web search engines have become central to scholarly research. While the fitness of Google Scholar for research purposes has been examined repeatedly, Microsoft Academic and Google Books have not received much attention...

  13. Is Internet search better than structured instruction for web-based health education?

    Science.gov (United States)

    Finkelstein, Joseph; Bedra, McKenzie

    2013-01-01

    The Internet provides access to vast amounts of comprehensive information regarding any health-related subject. Patients increasingly use this information for health education, using a search engine to identify education materials. An alternative approach to health education via the Internet is based on utilizing a verified web site which provides structured interactive education guided by adult learning theories. A comparison of these two approaches in older patients had not been performed systematically. The aim of this study was to compare the efficacy of a web-based computer-assisted education (CO-ED) system versus searching the Internet for learning about hypertension. Sixty hypertensive older adults (age 45+) were randomized into control or intervention groups. The control patients spent 30 to 40 minutes searching the Internet using a search engine for information about hypertension. The intervention patients spent 30 to 40 minutes using the CO-ED system, which provided computer-assisted instruction about major hypertension topics. Analysis of pre- and post- knowledge scores indicated a significant improvement among CO-ED users (14.6%) as opposed to Internet users (2%). Additionally, patients using the CO-ED program rated their learning experience more positively than those using the Internet.

  14. Efficient Term Extraction and Indexing Approach in Small-Scale Web Search of Uyghur Language

    Directory of Open Access Journals (Sweden)

    Turdi Tohti

    2013-10-01

    Full Text Available In order to avoid frequent reads and writes of the hard disk and to speed up search, the index should be kept in memory in small-scale web search. However, expressing the original information in less memory space also requires index compression, which increases computation expense or harms the original information to some extent. In this research on Uyghur small-scale web search, in order to speed up retrieval and query processing, the inverted index is built on a hash table data structure and resides entirely in memory. For index compression, no compression technique is used; instead, a word grouping approach based on a simplified N-gram statistical model is proposed, extracting semantic words that are structurally stable, semantically complete and independent, which greatly reduces the scale of the indexing item list. Thereby, not only is the purpose of index compression served, but the ambiguity problem is also solved to a certain extent and the search precision is improved markedly. The experimental results indicate that our method is feasible and effective.
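The memory-resident hash-table index described above can be sketched with a plain dictionary mapping terms to document-id lists. This is an invented minimal example (whitespace tokenization, AND-only queries), not the paper's Uyghur-specific word-grouping pipeline.

```python
from collections import defaultdict

def build_index(docs):
    """In-memory inverted index: term -> sorted list of doc ids.

    Mirrors the choice of a hash table kept entirely in RAM (here a Python
    dict) instead of a compressed on-disk index.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {t: sorted(ids) for t, ids in index.items()}

def search(index, *terms):
    """AND query: ids of documents containing every query term."""
    sets = [set(index.get(t, ())) for t in terms]
    return sorted(set.intersection(*sets)) if sets else []
```

The paper's word-grouping step would replace the naive `split()` tokenizer, producing fewer, semantically complete index terms and so shrinking the postings lists.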

  15. Distributed Web-Scale Infrastructure For Crawling, Indexing And Search With Semantic Support

    Directory of Open Access Journals (Sweden)

    Stefan Dlugolinsky

    2012-01-01

    Full Text Available In this paper, we describe our work in progress in the scope of web-scale information extraction and information retrieval utilizing distributed computing. We present a distributed architecture built on top of the MapReduce paradigm for information retrieval, information processing and intelligent search supported by spatial capabilities. The proposed architecture is focused on crawling documents in several different formats, information extraction, lightweight semantic annotation of the extracted information, indexing of extracted information and finally on indexing of documents based on the geo-spatial information found in a document. We demonstrate the architecture on two use cases, where the first is search in job offers retrieved from the LinkedIn portal and the second is search in BBC news feeds, and discuss several problems we had to face during the implementation. We also discuss spatial search applications for both cases because both LinkedIn job offer pages and BBC news feeds contain a lot of spatial information to extract and process.

  16. Searching the Web for Earth Science Data: Semiotics to Cybernetics and Back

    Directory of Open Access Journals (Sweden)

    Bruce R. Barkstrom

    2016-06-01

    Full Text Available This paper discusses a search paradigm for numerical data in Earth science that relies on the intrinsic structure of an archive's collection. Such non-textual data lies outside the normal textual basis for the Semantic Web. The paradigm tries to bypass some of the difficulties associated with keyword searches, such as semantic heterogeneity. The suggested collection structure uses a hierarchical taxonomy based on multidimensional axes of continuous variables. This structure fits the underlying 'geometry' of Earth science data better than sets of keywords in an ontology. The alternative paradigm views the search as a two-agent cooperative game that uses a dialog between the search engine and the data user. In this view, the search engine knows about the objects in the archive. It cannot read the user's mind to identify what the user needs. We assume the user has a clear idea of the search target. However he or she may not have a clear idea of the archive's contents. The paper suggests how the user interface may provide information to deal with the user's difficulties in understanding items in the dialog.

  17. GeNemo: a search engine for web-based functional genomic data.

    Science.gov (United States)

    Zhang, Yongqing; Cao, Xiaoyi; Zhong, Sheng

    2016-07-08

    A set of new data types emerged from functional genomic assays, including ChIP-seq, DNase-seq, FAIRE-seq and others. The results are typically stored as genome-wide intensities (WIG/bigWig files) or functional genomic regions (peak/BED files). These data types present new challenges to big data science. Here, we present GeNemo, a web-based search engine for functional genomic data. GeNemo searches user-input data against online functional genomic datasets, including the entire collection of ENCODE and mouse ENCODE datasets. Unlike text-based search engines, GeNemo's searches are based on pattern matching of functional genomic regions. This distinguishes GeNemo from text or DNA sequence searches. The user can input any complete or partial functional genomic dataset, for example, a binding intensity file (bigWig) or a peak file. GeNemo reports any genomic regions, ranging from hundred bases to hundred thousand bases, from any of the online ENCODE datasets that share similar functional (binding, modification, accessibility) patterns. This is enabled by a Markov Chain Monte Carlo-based maximization process, executed on up to 24 parallel computing threads. By clicking on a search result, the user can visually compare her/his data with the found datasets and navigate the identified genomic regions. GeNemo is available at www.genemo.org.
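The region-matching idea above can be illustrated with plain interval overlap on BED-style (chrom, start, end) tuples. This is a simplified stand-in: GeNemo's actual matching is a Markov Chain Monte Carlo-based maximization over binding-intensity patterns, and the function name and reciprocal-overlap threshold here are invented.

```python
def overlaps(query, regions, min_frac=0.5):
    """Return regions whose overlap with `query` covers at least `min_frac`
    of the shorter of the two intervals.

    `query` and each region are (chrom, start, end) tuples, half-open as in
    BED files. Chromosomes must match for an overlap to count.
    """
    qc, qs, qe = query
    hits = []
    for c, s, e in regions:
        if c != qc:
            continue
        inter = min(qe, e) - max(qs, s)
        if inter <= 0:
            continue
        frac = inter / min(qe - qs, e - s)
        if frac >= min_frac:
            hits.append((c, s, e))
    return hits
```

A pattern-based engine would additionally compare the intensity profiles (bigWig signal) within each overlapping pair, not just the coordinates.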

  18. Stability-mutation feature identification of Web search keywords based on keyword concentration change ratio

    Institute of Scientific and Technical Information of China (English)

    Hongtao LU; Guanghui YE; Gang LI

    2014-01-01

    Purpose: The aim of this paper is to discuss how the keyword concentration change ratio (KCCR) is used while identifying the stability-mutation feature of Web search keywords during information analyses and predictions. Design/methodology/approach: By introducing the stability-mutation feature of keywords and its significance, the paper describes the function of the KCCR in identifying keyword stability-mutation features. Using Ginsberg's influenza keywords, the paper shows how the KCCR can be used to identify the keyword stability-mutation feature effectively. Findings: The keyword concentration ratio has a close positive correlation with the change rate of research objects retrieved by users, so from the "stability-mutation" characteristic of keywords we can understand the relationship between these keywords and certain information. In general, keywords representing mutation fit objects that change in the short term, while those representing stability are suitable for long-term changing objects. Research limitations: It is difficult to acquire the frequency of keywords, so indexes or parameters which are closely related to the true search volume were chosen for this study. Practical implications: The stability-mutation feature identification of Web search keywords can be applied to predict and analyze information about unknown public events by observing trends in the keyword concentration ratio. Originality/value: The stability-mutation feature of Web search can be quantitatively described by the keyword concentration change ratio (KCCR). Through the KCCR, the authors took advantage of Ginsberg's influenza epidemic data and demonstrated how accurate and effective the proposed method is in information analyses and predictions.
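The KCCR computation above can be sketched as follows. The exact formula is not quoted in the record, so two assumptions are labeled here: the concentration ratio is taken as a keyword's share of total search volume in a period, and the KCCR as the relative change of that share between two periods.

```python
def concentration(volumes, keyword):
    """Keyword concentration ratio: the keyword's share of total search
    volume in one period. `volumes` maps keywords to search counts."""
    total = sum(volumes.values())
    return volumes.get(keyword, 0) / total if total else 0.0

def kccr(prev, curr, keyword):
    """Keyword concentration change ratio between two periods (assumed form:
    relative change of the concentration ratio). A small |KCCR| would mark a
    'stability' keyword, a large one a 'mutation' keyword."""
    p = concentration(prev, keyword)
    c = concentration(curr, keyword)
    return (c - p) / p if p else float("inf")
```

Tracking this ratio over successive periods for, say, influenza-related keywords is how a sudden "mutation" in public attention would show up.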

  19. Using the open Web as an information resource and scholarly Web search engines as retrieval tools for academic and research purposes

    OpenAIRE

    Filistea Naude; Chris Rensleigh; Adeline S.A. du Toit

    2010-01-01

    This study provided insight into the significance of the open Web as an information resource and Web search engines as research tools amongst academics. The academic staff establishment of the University of South Africa (Unisa) was invited to participate in a questionnaire survey and included 1188 staff members from five colleges. This study culminated in a PhD dissertation in 2008. One hundred and eighty seven respondents participated in the survey which gave a response rate of 15.7%. The re...

  20. Ontology-Driven Search and Triage: Design of a Web-Based Visual Interface for MEDLINE.

    Science.gov (United States)

    Demelo, Jonathan; Parsons, Paul; Sedig, Kamran

    2017-02-02

    Diverse users need to search health and medical literature to satisfy open-ended goals such as making evidence-based decisions and updating their knowledge. However, doing so is challenging due to at least two major difficulties: (1) articulating information needs using accurate vocabulary and (2) dealing with large document sets returned from searches. Common search interfaces such as PubMed do not provide adequate support for exploratory search tasks. Our objective was to improve support for exploratory search tasks by combining two strategies in the design of an interactive visual interface by (1) using a formal ontology to help users build domain-specific knowledge and vocabulary and (2) providing multi-stage triaging support to help mitigate the information overload problem. We developed a Web-based tool, Ontology-Driven Visual Search and Triage Interface for MEDLINE (OVERT-MED), to test our design ideas. We implemented a custom searchable index of MEDLINE, which comprises approximately 25 million document citations. We chose a popular biomedical ontology, the Human Phenotype Ontology (HPO), to test our solution to the vocabulary problem. We implemented multistage triaging support in OVERT-MED, with the aid of interactive visualization techniques, to help users deal with large document sets returned from searches. Formative evaluation suggests that the design features in OVERT-MED are helpful in addressing the two major difficulties described above. Using a formal ontology seems to help users articulate their information needs with more accurate vocabulary. In addition, multistage triaging combined with interactive visualizations shows promise in mitigating the information overload problem. Our strategies appear to be valuable in addressing the two major problems in exploratory search. Although we tested OVERT-MED with a particular ontology and document collection, we anticipate that our strategies can be transferred successfully to other contexts.

  1. Astrobrowse: Using GLU and Other Public Protocols to Build an Astronomy Web Search Agent

    Science.gov (United States)

    McGlynn, T. A.; White, N. E.; Fernique, P.; Wenger, M.; Ochsenbein, F.

    1997-12-01

    We have developed Astrobrowse, a Web search engine where a user may submit a target or position query and have the system search hundreds of possible sites for matching information. Astrobrowse has been developed at NASA's High Energy Astronomy Science Archive Research Center, and uses the GLU syntax and protocols developed at the Centre de Donnees Astronomiques de Strasbourg. Astrobrowse takes the information supplied by a user and then using a GLU database (see Fernique, et al., this conference) generates a query appropriate for each of the resources the user is interested in. Astrobrowse is available at the URL: http://heasarc.gsfc.nasa.gov/ab. Astrobrowse acts essentially as a user agent. Although the user may have requested information from sites around the world, results are cached locally so that the user can quickly move among the various responses. Users can initially select sites based upon matches to text searches (a la Altavista and other Web search engines) or by searching a hierarchical tree of resources. GLU helps Astrobrowse to maintain up-to-date pointers to all sites and allows easy distribution of this list to any other site which might wish to bring up their own Astrobrowse agent. GLU also provides facilities which allow Astrobrowse to access the most responsive resource in cases where a resource is mirrored at multiple sites. All software used within our Astrobrowse installation is freely available for any purpose and we encourage other groups to develop their own customized agents. Links to information and software are available at the Astrobrowse home page. There is also a form available to suggest new pages to be added to the HEASARC Astrobrowse agent. Astrobrowse can be used independently of GLU, in which case updating of the site list must be done completely manually. Please send comments or suggestions for Astrobrowse to tam@silk.gsfc.nasa.gov.

  2. The Strategies WDK: a graphical search interface and web development kit for functional genomics databases.

    Science.gov (United States)

    Fischer, Steve; Aurrecoechea, Cristina; Brunk, Brian P; Gao, Xin; Harb, Omar S; Kraemer, Eileen T; Pennington, Cary; Treatman, Charles; Kissinger, Jessica C; Roos, David S; Stoeckert, Christian J

    2011-01-01

Web sites associated with the Eukaryotic Pathogen Bioinformatics Resource Center (EuPathDB.org) have recently introduced a graphical user interface, the Strategies WDK, intended to make advanced searching and set and interval operations easy and accessible to all users. With a design guided by usability studies, the system helps motivate researchers to perform dynamic computational experiments and explore relationships across data sets. For example, PlasmoDB users seeking novel therapeutic targets may wish to locate putative enzymes that distinguish pathogens from their hosts, and that are expressed during appropriate developmental stages. When a researcher runs one of the approximately 100 searches available on the site, the search is presented as the first step in a strategy. The strategy is extended by running additional searches, which are combined with set operators (union, intersect or minus) or genomic interval operators (overlap, contains). A graphical display uses Venn diagrams to make the strategy's flow obvious. The interface facilitates interactive adjustment of the component searches, with changes propagating forward through the strategy. Users may save their strategies, creating protocols that can be shared with colleagues. The strategy system has now been deployed on all EuPathDB databases and has been successfully adopted by other projects. The Strategies WDK uses a configurable MVC architecture that is compatible with most genomics and biological warehouse databases, and is available for download at code.google.com/p/strategies-wdk. Database URL: www.eupathdb.org.
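The set operations described above lend themselves to a compact illustration. The sketch below mimics a two-step strategy over hypothetical search results modeled as sets of gene IDs; the identifiers and search names are invented for illustration and are not taken from PlasmoDB.

```python
# Strategy-style set operations over hypothetical search results,
# each represented as a set of (invented) gene IDs.
search_enzymes = {"PF3D7_0417200", "PF3D7_1012900", "PF3D7_0731600"}   # putative enzymes
search_no_host_ortholog = {"PF3D7_0417200", "PF3D7_0731600"}           # absent from host
search_stage_expressed = {"PF3D7_0731600", "PF3D7_1343700"}            # stage-expressed

# Step 1 INTERSECT step 2: enzymes that distinguish pathogen from host
step2 = search_enzymes & search_no_host_ortholog

# Step 3 narrows further by expression stage; UNION (|) or MINUS (-)
# could extend the strategy in the same way
candidates = step2 & search_stage_expressed
print(sorted(candidates))
```

Each intermediate result stays available, mirroring how the WDK lets users revise an early step and propagate the change forward.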

  3. Identifying Evidence for Public Health Guidance: A Comparison of Citation Searching with Web of Science and Google Scholar

    Science.gov (United States)

    Levay, Paul; Ainsworth, Nicola; Kettle, Rachel; Morgan, Antony

    2016-01-01

    Aim: To examine how effectively forwards citation searching with Web of Science (WOS) or Google Scholar (GS) identified evidence to support public health guidance published by the National Institute for Health and Care Excellence. Method: Forwards citation searching was performed using GS on a base set of 46 publications and replicated using WOS.…

  4. Semantic similarity measures in the biomedical domain by leveraging a web search engine.

    Science.gov (United States)

    Hsieh, Sheau-Ling; Chang, Wen-Yung; Chen, Chi-Huang; Weng, Yung-Ching

    2013-07-01

Various studies of web-based semantic similarity measures have been conducted. However, measuring the semantic similarity between two terms remains a challenging task. Traditional ontology-based methodologies have the limitation that both concepts must reside in the same ontology tree(s); in practice, this assumption does not always hold. On the other hand, if the corpus is sufficiently large, corpus-based methodologies can overcome this limitation, and the web is a continuously and enormously growing corpus. Therefore, a method of estimating semantic similarity is proposed that exploits the page counts of two biomedical concepts returned by the Google AJAX web search engine. The features are extracted as the co-occurrence patterns of two given terms P and Q, obtained by querying P, Q, and P AND Q, and as the web search hit counts of defined lexico-syntactic patterns. The similarity scores of the different patterns are combined, using support vector machines for classification, to leverage the robustness of the semantic similarity measures. Experimental results validated against two datasets (dataset 1 provided by A. Hliaoutakis; dataset 2 provided by T. Pedersen) are presented and discussed. On dataset 1, the proposed approach achieves the best correlation coefficient (0.802) under SNOMED-CT. On dataset 2, the proposed method obtains the best correlation coefficients (SNOMED-CT: 0.705; MeSH: 0.723) with physician scores compared with other methods, although the correlation coefficients with coder scores (SNOMED-CT: 0.496; MeSH: 0.539) showed the opposite outcome. In conclusion, the semantic similarity findings of the proposed method are close to physicians' ratings. Furthermore, the study provides a cornerstone investigation for extracting fully relevant information from digitized, free-text medical records in the National Taiwan University Hospital database.
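Page-count features of the kind described above are commonly turned into scores such as WebJaccard and WebPMI. The sketch below shows that family of measures under assumed hit counts; the exact pattern set and SVM combination used in the paper are not reproduced, and the web-size constant `n` and noise threshold `c` are illustrative assumptions.

```python
import math

# Similarity from web page counts: H(P), H(Q) are hit counts for each term,
# H(P AND Q) for their conjunction. Threshold c filters accidental
# co-occurrences; n is an assumed total web page count. Both are illustrative.
def web_jaccard(h_p, h_q, h_pq, c=5):
    if h_pq < c:
        return 0.0
    return h_pq / (h_p + h_q - h_pq)

def web_pmi(h_p, h_q, h_pq, n=1e10, c=5):
    if h_pq < c:
        return 0.0
    # Pointwise mutual information of P and Q, normalized by log2(n)
    return math.log2((h_pq / n) / ((h_p / n) * (h_q / n))) / math.log2(n)

# Hypothetical hit counts for two biomedical terms P and Q
h_p, h_q, h_pq = 42_000_000, 8_500_000, 1_200_000
print(web_jaccard(h_p, h_q, h_pq))
print(web_pmi(h_p, h_q, h_pq))
```

In the paper such per-pattern scores are then fed to an SVM; here they stand alone.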

  5. Web Content Search and Adaptation for IDTV: One Step Forward in the Mediamorphosis Process toward Personal-TV

    Directory of Open Access Journals (Sweden)

    Stefano Ferretti

    2007-01-01

Full Text Available We are on the threshold of a mediamorphosis that will revolutionize the way we interact with our TV sets. The combination of interactive digital TV (IDTV) and the Web fosters the development of new interactive multimedia services, enjoyable even through a TV screen and a remote control. Yet, several design constraints complicate the deployment of this new pattern of services. Prominent unresolved issues involve macro-problems such as collecting information on the Web based on users' preferences and appropriately presenting retrieved Web contents on the TV screen. To this aim, we propose a system able to dynamically convey contents from the Web to IDTV systems. Our system presents solutions both for personalized Web content search and automatic TV-format adaptation of retrieved documents. As we demonstrate through two case study applications, our system merges the best of the IDTV and Web domains, spinning the TV mediamorphosis toward the creation of the personal-TV concept.

  6. An overview of biomedical literature search on the World Wide Web in the third millennium.

    Science.gov (United States)

    Kumar, Prince; Goel, Roshni; Jain, Chandni; Kumar, Ashish; Parashar, Abhishek; Gond, Ajay Ratan

    2012-06-01

    Complete access to the existing pool of biomedical literature and the ability to "hit" upon the exact information of the relevant specialty are becoming essential elements of academic and clinical expertise. With the rapid expansion of the literature database, it is almost impossible to keep up to date with every innovation. Using the Internet, however, most people can freely access this literature at any time, from almost anywhere. This paper highlights the use of the Internet in obtaining valuable biomedical research information, which is mostly available from journals, databases, textbooks and e-journals in the form of web pages, text materials, images, and so on. The authors present an overview of web-based resources for biomedical researchers, providing information about Internet search engines (e.g., Google), web-based bibliographic databases (e.g., PubMed, IndMed) and how to use them, and other online biomedical resources that can assist clinicians in reaching well-informed clinical decisions.

  7. iSARST: an integrated SARST web server for rapid protein structural similarity searches.

    Science.gov (United States)

    Lo, Wei-Cheng; Lee, Che-Yu; Lee, Chi-Ching; Lyu, Ping-Chiang

    2009-07-01

iSARST is a web server for efficient protein structural similarity searches. It is a multi-processor, batch-processing and integrated implementation of several structural comparison tools and two database searching methods: SARST for common structural homologs and CPSARST for homologs with circular permutations. iSARST allows users to submit multiple PDB/SCOP entry IDs or an archive file containing many structures. After scanning the target database using SARST/CPSARST, the ordering of hits is refined with conventional structure alignment tools such as FAST, TM-align and SAMO, which are run on a PC cluster. In this way, iSARST achieves a high running speed while preserving the high precision of the refinement engines. The final outputs include tables listing co-linear or circularly permuted homologs of the query proteins and a functional summary of the best hits. Superimposed structures can be examined through an interactive and informative visualization tool. iSARST provides the first batch-mode structural comparison web service for both co-linear homologs and circular permutants. It can serve as a rapid annotation system for functionally unknown or hypothetical proteins, which are increasing rapidly in this post-genomics era. The server can be accessed at http://sarst.life.nthu.edu.tw/iSARST/.
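The scan-then-refine design described above can be sketched generically: a cheap score ranks the whole database, and only a shortlist is re-scored by an expensive step. Both scoring functions below are toy stand-ins (set overlap and positional identity over strings), not SARST or TM-align.

```python
# Two-stage search: fast coarse ranking over everything, expensive
# refinement only over the top-k shortlist -- the pattern iSARST uses.
def cheap_score(query, entry):
    # Stand-in for SARST's fast, sequence-encoded structure comparison
    return len(set(query) & set(entry)) / len(set(query) | set(entry))

def expensive_refine(query, entry):
    # Stand-in for a structure-alignment tool such as TM-align
    return sum(a == b for a, b in zip(query, entry)) / max(len(query), len(entry))

def search(query, database, top_k=2):
    ranked = sorted(database, key=lambda e: cheap_score(query, e), reverse=True)
    shortlist = ranked[:top_k]                       # only these get refined
    return sorted(shortlist, key=lambda e: expensive_refine(query, e), reverse=True)

db = ["HELLOWORLD", "HELLOWORMS", "STRUCTURES", "HELLOEARTH"]
print(search("HELLOWORLD", db))
```

The speed win comes from never running the expensive step on entries the coarse score already ruled out.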

  8. Exploring Multidisciplinary Data Sets through Database Driven Search Capabilities and Map-Based Web Services

    Science.gov (United States)

    O'Hara, S.; Ferrini, V.; Arko, R.; Carbotte, S. M.; Leung, A.; Bonczkowski, J.; Goodwillie, A.; Ryan, W. B.; Melkonian, A. K.

    2008-12-01

Relational databases containing geospatially referenced data enable the construction of robust data access pathways that can be customized to suit the needs of a diverse user community. Web-based search capabilities driven by radio buttons and pull-down menus can be generated on-the-fly leveraging the power of the relational database and providing specialists a means of discovering specific data and data sets. While these data access pathways are sufficient for many scientists, map-based data exploration can also be an effective means of data discovery and integration by allowing users to rapidly assess the spatial co-registration of several data types. We present a summary of data access tools currently provided by the Marine Geoscience Data System (www.marine-geo.org) that are intended to serve a diverse community of users and promote data integration. Basic search capabilities allow users to discover data based on data type, device type, geographic region, research program, expedition parameters, personnel and references. In addition, web services are used to create database driven map interfaces that provide live access to metadata and data files.

  9. Characterizing interdisciplinarity of researchers and research topics using web search engines.

    Directory of Open Access Journals (Sweden)

    Hiroki Sayama

Full Text Available Researchers' networks have been subject to active modeling and analysis. Earlier literature mostly focused on citation or co-authorship networks reconstructed from annotated scientific publication databases, which have several limitations. Recently, general-purpose web search engines have also been utilized to collect information about social networks. Here we reconstructed, using web search engines, a network representing the relatedness of researchers to their peers as well as to various research topics. Relatedness between researchers and research topics was characterized by visibility boost: the increase of a researcher's visibility obtained by focusing on a particular topic. It was observed that researchers who had high visibility boosts by the same research topic tended to be close to each other in their network. We calculated correlations between visibility boosts by research topics and researchers' interdisciplinarity at the individual level (diversity of topics related to the researcher) and at the social level (his/her centrality in the researchers' network). We found that visibility boosts by certain research topics were positively correlated with researchers' individual-level interdisciplinarity despite their negative correlations with the general popularity of researchers. It was also found that visibility boosts by network-related topics had positive correlations with researchers' social-level interdisciplinarity. Research topics' correlations with researchers' individual- and social-level interdisciplinarities were found to be nearly independent from each other. These findings suggest that the notion of "interdisciplinarity" of a researcher should be understood as a multi-dimensional concept that should be evaluated using multiple assessment means.

  10. Characterizing interdisciplinarity of researchers and research topics using web search engines.

    Science.gov (United States)

    Sayama, Hiroki; Akaishi, Jin

    2012-01-01

Researchers' networks have been subject to active modeling and analysis. Earlier literature mostly focused on citation or co-authorship networks reconstructed from annotated scientific publication databases, which have several limitations. Recently, general-purpose web search engines have also been utilized to collect information about social networks. Here we reconstructed, using web search engines, a network representing the relatedness of researchers to their peers as well as to various research topics. Relatedness between researchers and research topics was characterized by visibility boost: the increase of a researcher's visibility obtained by focusing on a particular topic. It was observed that researchers who had high visibility boosts by the same research topic tended to be close to each other in their network. We calculated correlations between visibility boosts by research topics and researchers' interdisciplinarity at the individual level (diversity of topics related to the researcher) and at the social level (his/her centrality in the researchers' network). We found that visibility boosts by certain research topics were positively correlated with researchers' individual-level interdisciplinarity despite their negative correlations with the general popularity of researchers. It was also found that visibility boosts by network-related topics had positive correlations with researchers' social-level interdisciplinarity. Research topics' correlations with researchers' individual- and social-level interdisciplinarities were found to be nearly independent from each other. These findings suggest that the notion of "interdisciplinarity" of a researcher should be understood as a multi-dimensional concept that should be evaluated using multiple assessment means.

  11. A Webometric Analysis of ISI Medical Journals Using Yahoo, AltaVista, and All the Web Search Engines

    Directory of Open Access Journals (Sweden)

    Zohreh Zahedi

    2010-12-01

Full Text Available The World Wide Web is an important information source for scholarly communications. Examining inlinks via webometric studies has attracted particular interest among information researchers. In this study, the number of inlinks to 69 ISI medical journals retrieved by the Yahoo, AltaVista, and All The Web search engines was examined via a comparative webometric study. For data analysis, SPSS software was employed. Findings revealed that the British Medical Journal website attracted the most links of all in the three search engines. There is a significant correlation between the number of external links and the ISI impact factor. The most significant correlation in the three search engines exists between the external links of Yahoo and AltaVista (100%), and the least correlation is found between the external links of All The Web and the number of pages of AltaVista (0.51). There is no significant difference between the internal links and the number of pages found by the three search engines, but in the case of impact factors, significant differences are found between these three search engines. So, the study shows that journals with a higher impact factor attract more links to their websites. It also indicates that the three search engines are significantly different in terms of total links, outlinks and web impact factors.

  12. Utilizing mixed methods research in analyzing Iranian researchers' information search behaviour in the Web and presenting current pattern

    Directory of Open Access Journals (Sweden)

    Maryam Asadi

    2015-12-01

Full Text Available Using a mixed methods research design, the current study analyzed Iranian researchers' information searching behaviour on the Web; based on the extracted concepts, a model of their information searching behaviour was then developed. Forty-four participants, including academic staff from universities and research centers, were recruited for this study by purposive sampling. Data were gathered from a questionnaire including ten questions and from semi-structured interviews. Each participant's memos were analyzed using grounded theory methods adapted from Strauss & Corbin (1998). Results showed that the main objectives of subjects in using the Web were doing research, writing a paper, studying, doing assignments, downloading files and acquiring public information. The most important ways of learning how to search and retrieve information among the subjects were trial and error and getting help from friends. Information resources are identified by searching in information resources (e.g. search engines, references in papers, and online databases), communication facilities and tools (e.g. contact with colleagues, seminars and workshops, social networking), and information services (e.g. RSS, alerting, and SDI). Findings also indicated that searching with search engines, reviewing references, searching in online databases, contacting colleagues and studying the latest issues of electronic journals were the most important means of searching. The most important strategies were using search engines and scholarly tools such as Google Scholar; in addition, the simple (quick) search method was the most common among subjects. Topic, keywords and paper title were the most important elements for retrieving information. Analysis of the interviews showed that there were nine stages in researchers' information searching behaviour: topic selection, initiating search, formulating search query, information retrieval, access to information

  13. Effects of Image and Layered Structure on Web Search Performance -Evaluation on the Basis of Movement Distance of Mouse Pointer-

    OpenAIRE

    Murata, Atsuo; Hayami, Takehito; Moriwaka, Makoto; Takahashi, Rina

    2009-01-01

    The aim of this paper was to explore the effects of image addition and layered structure on Web search performance on the basis of the search time and the movement trajectory of mouse pointer. The difference of search characteristics between young and older adults was also examined. Older adults tended to take more time to search for the linked item especially when the layered structure was deep. For the deep layered structure, both young and older adults allocate more time to think which ite...

  14. Sagace: A web-based search engine for biomedical databases in Japan

    Directory of Open Access Journals (Sweden)

    Morita Mizuki

    2012-10-01

Full Text Available Abstract Background In the big data era, biomedical research continues to generate a large amount of data, and the generated information is often stored in a database and made publicly available. Although combining data from multiple databases should accelerate further studies, the current number of life sciences databases is too large for researchers to grasp the features and contents of each database. Findings We have developed Sagace, a web-based search engine that enables users to retrieve information from a range of biological databases (such as gene expression profiles and proteomics data) and biological resource banks (such as mouse models of disease and cell lines). With Sagace, users can search more than 300 databases in Japan. Sagace offers features tailored to biomedical research, including manually tuned ranking, faceted navigation to refine search results, and rich snippets constructed with retrieved metadata for each database entry. Conclusions Sagace will be valuable for experts who are involved in biomedical research and drug development in both academia and industry. Sagace is freely available at http://sagace.nibio.go.jp/en/.

  15. Searching for information on the World Wide Web with a search engine: a pilot study on cognitive flexibility in younger and older users.

    Science.gov (United States)

    Dommes, Aurelie; Chevalier, Aline; Rossetti, Marilyne

    2010-04-01

    This pilot study investigated the age-related differences in searching for information on the World Wide Web with a search engine. 11 older adults (6 men, 5 women; M age=59 yr., SD=2.76, range=55-65 yr.) and 12 younger adults (2 men, 10 women; M=23.7 yr., SD=1.07, range=22-25 yr.) had to conduct six searches differing in complexity, and for which a search method was or was not induced. The results showed that the younger and older participants provided with an induced search method were less flexible than the others and produced fewer new keywords. Moreover, older participants took longer than the younger adults, especially in the complex searches. The younger participants were flexible in the first request and spontaneously produced new keywords (spontaneous flexibility), whereas the older participants only produced new keywords when confronted by impasses (reactive flexibility). Aging may influence web searches, especially the nature of keywords used.

  16. Database with web interface and search engine as a diagnostics tool for electromagnetic calorimeter

    CERN Document Server

    Paluoja, Priit

    2017-01-01

During the 2016 data collection, the Compact Muon Solenoid Data Acquisition (CMS DAQ) system showed very good reliability. Nevertheless, the high complexity of the hardware and software involved is, by its nature, prone to occasional problems. As a CMS subdetector, the electromagnetic calorimeter (ECAL) is affected in the same way. Some of the issues are not predictable and can appear more than once during the year, such as components becoming noisy, power cuts or failing communication between machines. The detection-diagnosis-intervention chain must be as fast as possible to minimise the downtime of the detector. The aim of this project was to create diagnostic software for the ECAL crew, consisting of a database and a web interface that allows users to search, add and edit the contents of the database.

  17. A Web Browser Interface to Manage the Searching and Organizing of Information on the Web by Learners

    Science.gov (United States)

    Li, Liang-Yi; Chen, Gwo-Dong

    2010-01-01

    Information Gathering is a knowledge construction process. Web learners make a plan for their Information Gathering task based on their prior knowledge. The plan is evolved with new information encountered and their mental model is constructed through continuously assimilating and accommodating new information gathered from different Web pages. In…

  18. How happy is your web browsing? A model to quantify satisfaction of an Internet user searching for desired information

    Science.gov (United States)

    Banerji, Anirban; Magarkar, Aniket

    2012-09-01

We feel happy when web browsing operations provide us with the necessary information; otherwise, we feel bitter. How can this happiness (or bitterness) be measured? How does the profile of happiness grow and decay during the course of web browsing? We propose a probabilistic framework that models the evolution of user satisfaction, on top of his/her continuous frustration at not finding the required information. It is found that the cumulative satisfaction profile of a web-searching individual can be modeled effectively as the sum of a random number of random terms, where each term is a mutually independent random variable originating from a 'memoryless' Poisson flow. Evolution of satisfaction over the entire time interval of a user's browsing was modeled using auto-correlation analysis. A utilitarian marker, a magnitude greater than unity of which describes happy web-searching operations, and an empirical limit that connects the user's satisfaction with his frustration level are proposed as well. The presence of pertinent information on the very first page of a website and the magnitude of the decay parameter of user satisfaction (frustration, irritation, etc.) are found to be two key aspects that dominate the web user's psychology. The proposed model employed different combinations of decay parameter, searching time and number of helpful websites. The obtained results are found to match the results from three real-life case studies.
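A "sum of a random number of random terms" driven by a memoryless Poisson flow is a compound Poisson process, which can be simulated directly. In the sketch below, helpful pages arrive with exponential inter-arrival gaps, each contributes an exponentially distributed satisfaction term, and an exponential decay factor stands in for growing frustration; all parameter values are illustrative, not taken from the paper.

```python
import math
import random

random.seed(42)

def cumulative_satisfaction(rate=2.0, horizon=10.0, mean_gain=1.0, decay=0.1):
    """One browsing session: a Poisson flow of helpful pages over `horizon`,
    each adding an i.i.d. exponential gain discounted by elapsed time."""
    t, total = 0.0, 0.0
    while True:
        t += random.expovariate(rate)        # memoryless inter-arrival gap
        if t > horizon:
            return total
        total += random.expovariate(1.0 / mean_gain) * math.exp(-decay * t)

# Average over many sessions; the expectation is
# rate * mean_gain * (1 - exp(-decay * horizon)) / decay  (about 12.6 here)
samples = [cumulative_satisfaction() for _ in range(2000)]
print(sum(samples) / len(samples))
```

Varying `decay`, `horizon` and `rate` reproduces the kind of parameter sweep the abstract describes.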

  19. A web-based search engine for triplex-forming oligonucleotide target sequences.

    Science.gov (United States)

    Gaddis, Sara S; Wu, Qi; Thames, Howard D; DiGiovanni, John; Walborg, Earl F; MacLeod, Michael C; Vasquez, Karen M

    2006-01-01

    Triplex technology offers a useful approach for site-specific modification of gene structure and function both in vitro and in vivo. Triplex-forming oligonucleotides (TFOs) bind to their target sites in duplex DNA, thereby forming triple-helical DNA structures via Hoogsteen hydrogen bonding. TFO binding has been demonstrated to site-specifically inhibit gene expression, enhance homologous recombination, induce mutation, inhibit protein binding, and direct DNA damage, thus providing a tool for gene-specific manipulation of DNA. We have developed a flexible web-based search engine to find and annotate TFO target sequences within the human and mouse genomes. Descriptive information about each site, including sequence context and gene region (intron, exon, or promoter), is provided. The engine assists the user in finding highly specific TFO target sequences by eliminating or flagging known repeat sequences and flagging overlapping genes. A convenient way to check for the uniqueness of a potential TFO binding site is provided via NCBI BLAST. The search engine may be accessed at spi.mdanderson.org/tfo.

  20. Collab-Analyzer: An Environment for Conducting Web-Based Collaborative Learning Activities and Analyzing Students' Information-Searching Behaviors

    Science.gov (United States)

    Wu, Chih-Hsiang; Hwang, Gwo-Jen; Kuo, Fan-Ray

    2014-01-01

    Researchers have found that students might get lost or feel frustrated while searching for information on the Internet to deal with complex problems without real-time guidance or supports. To address this issue, a web-based collaborative learning system, Collab-Analyzer, is proposed in this paper. It is not only equipped with a collaborative…

  1. Using Web Search Query Data to Monitor Dengue Epidemics: A New Model for Neglected Tropical Disease Surveillance

    Science.gov (United States)

    Chan, Emily H.; Sahai, Vikram; Conrad, Corrie; Brownstein, John S.

    2011-01-01

Background A variety of obstacles including bureaucracy and lack of resources have interfered with timely detection and reporting of dengue cases in many endemic countries. Surveillance efforts have turned to modern data sources, such as Internet search queries, which have been shown to be effective for monitoring influenza-like illnesses. However, few have evaluated the utility of web search query data for other diseases, especially those of high morbidity and mortality or where a vaccine may not exist. In this study, we aimed to assess whether web search queries are a viable data source for the early detection and monitoring of dengue epidemics. Methodology/Principal Findings Bolivia, Brazil, India, Indonesia and Singapore were chosen for analysis based on available data and adequate search volume. For each country, a univariate linear model was then built by fitting a time series of the fraction of Google search query volume for specific dengue-related queries from that country against a time series of official dengue case counts for a time-frame within 2003–2010. The specific combination of queries used was chosen to maximize model fit. Spurious spikes in the data were also removed prior to model fitting. The final models, fit using a training subset of the data, were cross-validated against both the overall dataset and a holdout subset of the data. All models were found to fit the data quite well, with validation correlations ranging from 0.82 to 0.99. Conclusions/Significance Web search query data were found to be capable of tracking dengue activity in Bolivia, Brazil, India, Indonesia and Singapore. Whereas traditional dengue data from official sources are often not available until after some substantial delay, web search query data are available in near real-time. These data represent a valuable complement to assist with traditional dengue surveillance. PMID:21647308
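The modeling step described (a univariate linear fit of official case counts against search-query fraction, validated by correlation on a holdout subset) can be sketched with synthetic data. Everything below (the series, the split, the noise level) is invented for illustration.

```python
import math
import random

random.seed(0)

def fit_line(x, y):
    # Ordinary least squares for y = a + b*x (a univariate model, as above)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    vx = sum((xi - mx) ** 2 for xi in x)
    vy = sum((yi - my) ** 2 for yi in y)
    return cov / math.sqrt(vx * vy)

# Synthetic weekly data: query fraction vs. official dengue case counts
query_frac = [random.uniform(0.0, 1.0) for _ in range(52)]
cases = [50 + 400 * q + random.gauss(0, 20) for q in query_frac]

# Fit on a training subset, then cross-validate on the holdout weeks
a, b = fit_line(query_frac[:40], cases[:40])
predicted = [a + b * q for q in query_frac[40:]]
validation_r = pearson(predicted, cases[40:])
print(validation_r)
```

With low noise the holdout correlation lands near the 0.82–0.99 range the study reports; spike removal and query selection are omitted here.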

  2. Using web search query data to monitor dengue epidemics: a new model for neglected tropical disease surveillance.

    Directory of Open Access Journals (Sweden)

    Emily H Chan

    2011-05-01

Full Text Available BACKGROUND: A variety of obstacles including bureaucracy and lack of resources have interfered with timely detection and reporting of dengue cases in many endemic countries. Surveillance efforts have turned to modern data sources, such as Internet search queries, which have been shown to be effective for monitoring influenza-like illnesses. However, few have evaluated the utility of web search query data for other diseases, especially those of high morbidity and mortality or where a vaccine may not exist. In this study, we aimed to assess whether web search queries are a viable data source for the early detection and monitoring of dengue epidemics. METHODOLOGY/PRINCIPAL FINDINGS: Bolivia, Brazil, India, Indonesia and Singapore were chosen for analysis based on available data and adequate search volume. For each country, a univariate linear model was then built by fitting a time series of the fraction of Google search query volume for specific dengue-related queries from that country against a time series of official dengue case counts for a time-frame within 2003-2010. The specific combination of queries used was chosen to maximize model fit. Spurious spikes in the data were also removed prior to model fitting. The final models, fit using a training subset of the data, were cross-validated against both the overall dataset and a holdout subset of the data. All models were found to fit the data quite well, with validation correlations ranging from 0.82 to 0.99. CONCLUSIONS/SIGNIFICANCE: Web search query data were found to be capable of tracking dengue activity in Bolivia, Brazil, India, Indonesia and Singapore. Whereas traditional dengue data from official sources are often not available until after some substantial delay, web search query data are available in near real-time. These data represent a valuable complement to assist with traditional dengue surveillance.

  3. Searching the Internet for drug-related web sites: analysis of online available information on ecstasy (MDMA).

    Science.gov (United States)

    Deluca, Paolo; Schifano, Fabrizio

    2007-01-01

    Although the Internet is a growing source of information on MDMA/ecstasy, no studies so far have investigated the level and quality of ecstasy information available to the typical Web user. In the present study, 280 Web sites were identified and analyzed; 50.4% had an anti-drug approach, 16.2% a harm reduction approach, and 24.8% a pro-drug approach. MDMA pro-drug Web sites appeared significantly earlier in the search engines' results list than both anti-drug and harm reduction Web sites (F (3; 159) = 3.288; p = .022). This study represents the first systematic analysis of information available online on ecstasy. Implications for further research are discussed.

  4. Knowledge-based personalized search engine for the Web-based Human Musculoskeletal System Resources (HMSR) in biomechanics.

    Science.gov (United States)

    Dao, Tien Tuan; Hoang, Tuan Nha; Ta, Xuan Hien; Tho, Marie Christine Ho Ba

    2013-02-01

Human musculoskeletal system resources of the human body are valuable for learning and medical purposes. Internet-based information from conventional search engines such as Google or Yahoo cannot respond to the need for useful, accurate, reliable and good-quality human musculoskeletal resources related to medical processes, pathological knowledge and practical expertise. In the present work, an advanced knowledge-based personalized search engine was developed. Our search engine is based on a client-server, multi-layer, multi-agent architecture and the principle of semantic web services, acquiring accurate and reliable HMSR information dynamically through a semantic processing and visualization approach. A security-enhanced mechanism was applied to protect the medical information. A multi-agent crawler was implemented to develop a content-based database of HMSR information. A new semantic-based PageRank score, with related mathematical formulas, was also defined and implemented. As results, semantic web service descriptions are presented in OWL, WSDL and OWL-S formats. Operational scenarios with related web-based interfaces for personal computers and mobile devices are presented and analyzed. A functional comparison between our knowledge-based search engine, a conventional search engine and a semantic search engine showed the originality and robustness of our knowledge-based personalized search engine. In fact, our knowledge-based personalized search engine allows different users, such as orthopedic patients and experts, healthcare system managers or medical students, to access remotely useful, accurate, reliable and good-quality HMSR information for their learning and medical purposes.
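The semantic-based PageRank mentioned above builds on the standard link-analysis recurrence. The sketch below is only that structural backbone: plain PageRank by power iteration over a tiny hypothetical link graph of resource pages. The semantic weighting that makes the paper's score novel is not reproduced.

```python
def pagerank(links, d=0.85, iters=50):
    """Plain PageRank power iteration over a dict {node: [outlinks]}.
    The paper's semantic score adds content-derived weights on top of
    this recurrence; here every link is weighted equally."""
    nodes = list(links)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - d) / len(nodes) for n in nodes}
        for src, outs in links.items():
            share = rank[src] / len(outs) if outs else 0.0
            for dst in outs:
                new[dst] += d * share
        rank = new
    return rank

# Tiny hypothetical link graph between HMSR resource pages
graph = {"anatomy": ["pathology", "kinematics"],
         "pathology": ["anatomy"],
         "kinematics": ["anatomy"]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))   # the most linked-to page ranks highest
```

A semantic variant would replace the uniform `share` with per-link weights derived from page content.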

  5. Linking Annual Prescription Volume of Antidepressants to Corresponding Web Search Query Data: A Possible Proxy for Medical Prescription Behavior?

    Science.gov (United States)

    Gahr, Maximilian; Uzelac, Zeljko; Zeiss, René; Connemann, Bernhard J; Lang, Dirk; Schönfeldt-Lecuona, Carlos

    2015-12-01

Persons using the Internet to retrieve medical information generate large amounts of health-related data, which are increasingly used in modern health sciences. We analyzed the relation between annual prescription volumes (APVs) of several antidepressants with marketing approval in Germany and corresponding web search query data generated in Google to test whether web search query volume may be a proxy for medical prescription practice. We obtained APVs of several antidepressants related to corresponding prescriptions at the expense of the statutory health insurance in Germany from 2004 to 2013. Web search query data generated in Germany and related to defined search terms (active substance or brand name) were obtained with Google Trends. We calculated correlations (Pearson's r) between the APVs of each substance and the respective annual "search share" values; coefficients of determination (R) were computed to determine the amount of variability shared by the 2 variables. Significant and strong correlations between substance-specific APVs and corresponding annual query volumes were found for each substance during the observational interval: agomelatine (r = 0.968, R = 0.932, P = 0.01), bupropion (r = 0.962, R = 0.925, P = 0.01), citalopram (r = 0.970, R = 0.941, P = 0.01), escitalopram (r = 0.824, R = 0.682, P = 0.01), fluoxetine (r = 0.885, R = 0.783, P = 0.01), paroxetine (r = 0.801, R = 0.641, P = 0.01), and sertraline (r = 0.880, R = 0.689, P = 0.01). Although the data did not allow us to perform an analysis with a higher temporal resolution (quarters, months), our results suggest that web search query volume may be a proxy for corresponding prescription behavior. However, further studies analyzing other pharmacologic agents and prescription data that facilitate an increased temporal resolution are needed to confirm this hypothesis.
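The correlation analysis in this record reduces to computing Pearson's r (and its square) between two short annual series. A minimal sketch, using made-up illustrative values rather than the paper's data:

```python
# Pearson's r and R^2 between annual prescription volumes and annual
# search-share values. The series below are hypothetical examples.
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

apv = [110, 130, 150, 180, 210]    # hypothetical annual prescriptions
share = [0.9, 1.1, 1.4, 1.6, 2.0]  # hypothetical annual search share
r = pearson_r(apv, share)
r_squared = r ** 2  # variability shared by the two series
```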

  6. EVALUATION OF WEB SEARCHING METHOD USING A NOVEL WPRR ALGORITHM FOR TWO DIFFERENT CASE STUDIES

    Directory of Open Access Journals (Sweden)

    V. Lakshmi Praba

    2012-04-01

Full Text Available The World-Wide Web provides every internet citizen with access to an abundance of information, but it becomes increasingly difficult to identify the relevant pieces of information. Research in web mining tries to address this problem by applying techniques from data mining and machine learning to web data and documents. Web content mining and web structure mining have important roles in identifying the relevant web page. Relevancy of a web page denotes how well a retrieved web page or set of web pages meets the information need of the user. Page Rank, Weighted Page Rank and Hypertext Induced Topic Selection (HITS) are existing algorithms which consider only web structure mining. Vector Space Model (VSM), Cover Density Ranking (CDR), Okapi similarity measurement (Okapi) and the Three-Level Scoring method (TLS) are existing relevancy scoring methods which consider only web content mining. In this paper, we propose a new algorithm, Weighted Page with Relevant Rank (WPRR), which is a blend of both web content mining and web structure mining and which demonstrates the relevancy of the page with respect to a given query for two different case scenarios. It is shown that WPRR's performance is better than that of the existing algorithms.
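The abstract describes WPRR as a blend of content mining and structure mining without giving the combination rule. A common way to realize such a blend is a weighted linear combination of a query-dependent content score and a query-independent link score; the sketch below assumes that form, with invented scores and an invented weight, and is not the WPRR formula itself.

```python
# Hedged sketch of blending content relevance with link-structure
# importance into one ranking score (a generic linear blend, not the
# paper's exact WPRR definition).

def blended_score(content_score, structure_score, alpha=0.6):
    """alpha weights query-dependent content relevance;
    (1 - alpha) weights query-independent link-based importance."""
    return alpha * content_score + (1 - alpha) * structure_score

pages = {
    "p1": {"content": 0.9, "structure": 0.2},  # on-topic, few in-links
    "p2": {"content": 0.3, "structure": 0.9},  # popular but off-topic
}
ranked = sorted(
    pages,
    key=lambda p: blended_score(pages[p]["content"], pages[p]["structure"]),
    reverse=True,
)
```

With alpha above 0.5, the on-topic page outranks the merely popular one, which is the behavior a content-plus-structure blend is meant to achieve.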

  7. Rare disease diagnosis: A review of web search, social media and large-scale data-mining approaches.

    Science.gov (United States)

    Svenstrup, Dan; Jørgensen, Henrik L; Winther, Ole

    2015-01-01

Physicians and the general public are increasingly using web-based tools to find answers to medical questions. The field of rare diseases is especially challenging and important, as shown by the long delays and many mistakes associated with diagnoses. In this paper we review recent initiatives on the use of web search, social media and data mining in data repositories for medical diagnosis. We compare the retrieval accuracy on 56 rare disease cases with known diagnosis for the web search tools google.com, pubmed.gov, omim.org and our own search tool findzebra.com. We give a detailed description of IBM's Watson system and make a rough comparison between findzebra.com and Watson on subsets of the Doctor's Dilemma dataset. The recall@10 and recall@20 (fraction of cases where the correct result appears in the top 10 and top 20) for the 56 cases are found to be 29%, 16%, 27% and 59%, and 32%, 18%, 34% and 64%, respectively. Thus, FindZebra performs significantly better than the general-purpose tools. Web search and the growth of online medical data have opened new possibilities for aiding the diagnostic process. Specialized search engines, data mining tools and social media are some of the areas that hold promise.
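The recall@k metric reported in this record is simple to state precisely: the fraction of cases whose known diagnosis appears among the top k returned results. A minimal sketch with invented cases:

```python
# recall@k: fraction of cases where the known diagnosis appears in
# the top k results. The ranked lists and diagnoses are hypothetical.

def recall_at_k(cases, k):
    """cases: list of (ranked_results, true_diagnosis) pairs."""
    hits = sum(1 for ranked, truth in cases if truth in ranked[:k])
    return hits / len(cases)

cases = [
    (["flu", "measles", "kawasaki disease"], "kawasaki disease"),
    (["lupus", "lyme disease"], "fabry disease"),
]
r10 = recall_at_k(cases, 10)  # first case is a hit, second is a miss
```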

  8. Abyss or Shelter? On the Relevance of Web Search Engines' Search Results When People Google for Suicide.

    Science.gov (United States)

    Haim, Mario; Arendt, Florian; Scherr, Sebastian

    2017-02-01

    Despite evidence that suicide rates can increase after suicides are widely reported in the media, appropriate depictions of suicide in the media can help people to overcome suicidal crises and can thus elicit preventive effects. We argue on the level of individual media users that a similar ambivalence can be postulated for search results on online suicide-related search queries. Importantly, the filter bubble hypothesis (Pariser, 2011) states that search results are biased by algorithms based on a person's previous search behavior. In this study, we investigated whether suicide-related search queries, including either potentially suicide-preventive or -facilitative terms, influence subsequent search results. This might thus protect or harm suicidal Internet users. We utilized a 3 (search history: suicide-related harmful, suicide-related helpful, and suicide-unrelated) × 2 (reactive: clicking the top-most result link and no clicking) experimental design applying agent-based testing. While findings show no influences either of search histories or of reactivity on search results in a subsequent situation, the presentation of a helpline offer raises concerns about possible detrimental algorithmic decision-making: Algorithms "decided" whether or not to present a helpline, and this automated decision, then, followed the agent throughout the rest of the observation period. Implications for policy-making and search providers are discussed.

  9. CALIL.JP, a new web service that provides one-stop searching of Japan-wide libraries' collections

    Science.gov (United States)

    Yoshimoto, Ryuuji

Calil.JP is a new free online service that enables federated searching, marshalling and integration of Web-OPAC data on the collections of libraries from around Japan, offering search results through a user-friendly interface. Developed with the concept of accelerating the discovery of fun-to-read books and motivating users to head for libraries, Calil was initially designed mainly for public library users; it now also covers university libraries and special libraries. This article presents Calil's basic capabilities, concept, progress made thus far, and plans for further development, as viewed by an engineering development manager.

  10. Retrieval of very large numbers of items in the Web of Science: an exercise to develop accurate search strategies

    CERN Document Server

    Arencibia-Jorge, Ricardo; Chinchilla-Rodriguez, Zaida; Rousseau, Ronald; Paris, Soren W

    2009-01-01

The current communication presents a simple exercise aimed at solving a singular problem: the retrieval of extremely large numbers of items through the Web of Science interface. As is known, the Web of Science interface allows a user to obtain at most 100,000 items from a single query. But what about queries that return more than 100,000 items? The exercise developed one possible way to achieve this objective. The case study is the retrieval of the entire scientific production of the United States in a specific year. Different sections of items were retrieved using the Source field of the database. A simple Boolean statement was then created to eliminate overlap and improve the accuracy of the search strategy. The importance of team work in the development of advanced search strategies is noted.
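The partition-and-deduplicate idea above can be sketched in a few lines: split one oversized query into sections by the Source field, then remove overlap between sections (the Boolean NOT applied in the interface is emulated here with set difference). The section queries and record IDs below are made up for illustration.

```python
# Illustrative sketch: partition an oversized result set into
# Source-field sections, then keep each record in only one section.

sections = {
    "Source=(A* OR B*)": {"rec1", "rec2", "rec3"},
    "Source=(C* OR D*)": {"rec3", "rec4"},  # rec3 overlaps section 1
}

seen, partition = set(), {}
for query, records in sections.items():
    partition[query] = records - seen  # emulate NOT(previous sections)
    seen |= records

total = sum(len(r) for r in partition.values())  # no double counting
```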

  11. Personal Search Engine Based on Web Services%基于Web Services的个人搜索引擎

    Institute of Scientific and Technical Information of China (English)

    曹龙; 杜亚军; 刘倬; 李战胜

    2005-01-01

With the emergence of the Internet, it has become very difficult to find the desired information among the vast amount of online information. Google is a very well-known search engine, but its search mechanism is oriented toward the general public. The Web Services functionality it provides uses standardized Web protocols, so the service can be used in any network environment, which makes it convenient to build personalized search engines. This paper discusses the Web Service provided by the Google search engine and its programmatic implementation, and implements a personalized search engine by using Delphi's Web Services components to interface with the Google service. Experiments show that its search results are a substantial improvement over Google's own search function.

  12. Eysenbach, Tuische and Diepgen’s Evaluation of Web Searching for Identifying Unpublished Studies for Systematic Reviews: An Innovative Study Which is Still Relevant Today.

    Directory of Open Access Journals (Sweden)

    Simon Briscoe

    2016-09-01

Full Text Available A Review of: Eysenbach, G., Tuische, J. & Diepgen, T.L. (2001). Evaluation of the usefulness of Internet searches to identify unpublished clinical trials for systematic reviews. Medical Informatics and the Internet in Medicine, 26(3), 203-218. http://dx.doi.org/10.1080/14639230110075459 Objective – To consider whether web searching is a useful method for identifying unpublished studies for inclusion in systematic reviews. Design – Retrospective web searches using the AltaVista search engine were conducted to identify unpublished studies – specifically, clinical trials – for systematic reviews which did not use a web search engine. Setting – The Department of Clinical Social Medicine, University of Heidelberg, Germany. Subjects – n/a Methods – Pilot testing of 11 web search engines was carried out to determine which could handle complex search queries. Pre-specified search requirements included the ability to handle Boolean and proximity operators, and truncation searching. A total of seven Cochrane systematic reviews were randomly selected from the Cochrane Library Issue 2, 1998, and their bibliographic database search strategies were adapted for the web search engine AltaVista. Each adaptation combined search terms for the intervention, problem, and study type in the systematic review. Hints to planned, ongoing, or unpublished studies retrieved by the search engine, which were not cited in the systematic reviews, were followed up by visiting websites and contacting authors for further details when required. The authors of the systematic reviews were then contacted and asked to comment on the potential relevance of the identified studies. Main Results – Hints to 14 unpublished and potentially relevant studies, corresponding to 4 of the 7 randomly selected Cochrane systematic reviews, were identified. Of the 14 studies, 2 were considered irrelevant to the corresponding systematic review by the systematic review authors.

  13. The Invisible Web: Uncovering Information Sources Search Engines Can't See.

    Science.gov (United States)

    Sherman, Chris; Price, Gary

    This book takes a detailed look at the nature and extent of the Invisible Web, and offers pathfinders for accessing the valuable information it contains. It is designed to fit the needs of both novice and advanced Web searchers. Chapter One traces the development of the Internet and many of the early tools used to locate and share information via…

  14. Using anchor text, spam filtering and Wikipedia for web search and entity ranking

    NARCIS (Netherlands)

    Kamps, J.; Kaptein, R.; Koolen, M.; Voorhees, E.M.; Buckland, L.P.

    2010-01-01

In this paper, we document our efforts in participating in the TREC 2010 Entity Ranking and Web Tracks. We had multiple aims: for the Web Track, we wanted to compare the effectiveness of anchor text of the category A and B collections and the impact of global document quality measures such as PageRank.

  15. Collaborating and delivering literature search results to clinical teams using web 2.0 tools.

    Science.gov (United States)

    Damani, Shamsha; Fulton, Stephanie

    2010-07-01

    This article describes the experiences of librarians at the Research Medical Library embedded within clinical teams at The University of Texas MD Anderson Cancer Center and their efforts to enhance communication within their teams using Web 2.0 tools. Pros and cons of EndNote Web, Delicious, Connotea, PBWorks, and SharePoint are discussed.

  17. A fuzzy method for improving the functionality of search engines based on user's web interactions

    Directory of Open Access Journals (Sweden)

    Farzaneh Kabirbeyk

    2015-04-01

Full Text Available Web mining has been widely used to discover knowledge from various sources on the web. One of the important tools in web mining is mining of web users' behavior, which is considered a way to discover the potential knowledge in web users' interactions. Nowadays, website personalization is a popular phenomenon among web users; it plays an important role in facilitating user access and provides information matching users' requirements based on their own interests. Extracting important features of web user behavior plays a significant role in web usage mining; such features include page visit frequency in each session, visit duration, and the dates on which certain pages were visited. This paper presents a method to predict users' interests and to propose a list of pages based on those interests, by identifying user behavior with a fuzzy clustering technique. Because users have different interests and may pursue more than one interest at a time, a user's interests may belong to several clusters, and fuzzy clustering allows this overlap. The resulting clusters are used to extract fuzzy rules, which help detect users' movement patterns; a neural network then provides a list of suggested pages to the users.
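The overlap that fuzzy clustering allows can be made concrete with the standard fuzzy c-means membership formula: each session gets a degree of membership in every interest cluster rather than a single hard assignment. The sketch below uses that standard formula with invented centroids and session features; it assumes the session vector does not coincide exactly with a centroid.

```python
# Fuzzy c-means membership degrees for one user session: the session
# can belong partly to several interest clusters at once. Centroids
# and the feature vector are hypothetical.

def memberships(point, centroids, m=2.0):
    """Standard FCM membership: u_i = 1 / sum_j (d_i/d_j)^(2/(m-1)).
    Assumes point is not exactly at any centroid (no zero distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, cen)) ** 0.5
             for cen in centroids]
    return [1.0 / sum((di / dj) ** (2 / (m - 1)) for dj in dists)
            for di in dists]

# Session features: [page visit frequency, visit duration (minutes)]
u = memberships([2.0, 10.0], [[1.0, 8.0], [5.0, 30.0]])
# The session belongs mostly to cluster 0 but partly to cluster 1.
```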

  18. SCANPS: a web server for iterative protein sequence database searching by dynamic programing, with display in a hierarchical SCOP browser.

    Science.gov (United States)

    Walsh, Thomas P; Webber, Caleb; Searle, Stephen; Sturrock, Shane S; Barton, Geoffrey J

    2008-07-01

SCANPS performs iterative profile searching similar to PSI-BLAST but with full dynamic programing on each cycle and on-the-fly estimation of significance. This combination gives good sensitivity and selectivity that outperforms PSI-BLAST in domain-searching benchmarks. Although computationally expensive, SCANPS exploits on-chip parallelism (MMX and SSE2 instructions on Intel chips) as well as MPI parallelism to give acceptable turnaround times even for large databases. A web server developed to run SCANPS searches is now available at http://www.compbio.dundee.ac.uk/www-scanps. The server interface allows a range of different protein sequence databases to be searched, including the SCOP database of protein domains. The server provides the user with regularly updated versions of the main protein sequence databases and is backed by significant computing resources which ensure that searches are performed rapidly. For SCOP searches, the results may be viewed in a new tree-based representation that reflects the structure of the SCOP hierarchy; this aids the user in placing each hit in the context of its SCOP classification and understanding its relationship to other domains in SCOP.

  19. DESIGN AND IMPLEMENTATION OF WEB SERVICES SEARCH ENGINE%Web服务搜索引擎的设计与实现

    Institute of Scientific and Technical Information of China (English)

    贺财平; 覃事刚; 刘建勋

    2011-01-01

With the gradually increasing number of open Web services, it is crucial to fully and effectively obtain such open Web services scattered across the Internet, as well as to manage them. In this paper, we designed and implemented WSSE (Web Services Search Engine) to solve this issue. A robot was developed in WSSE to continuously crawl existing Web sites in search of Web services; the Web services found are then stored in a centralized management system. Finally, we used the open-source Lucene to index the found Web services, improving the efficiency of Web services retrieval.

  20. Effects of Diacritics on Web Search Engines’ Performance for Retrieval of Yoruba Documents

    Directory of Open Access Journals (Sweden)

    Toluwase Victor Asubiaro

    2014-06-01

    Full Text Available This paper aims to find out the possible effect of the use or nonuse of diacritics in Yoruba search queries on the performance of major search engines, AOL, Bing, Google and Yahoo!, in retrieving documents. 30 Yoruba queries created from the most searched keywords from Nigeria on Google search logs were submitted to the search engines. The search queries were posed to the search engines without diacritics and then with diacritics. All of the search engines retrieved more sites in response to the queries without diacritics. Also, they all retrieved more precise results for queries without diacritics. The search engines also answered more queries without diacritics. There was no significant difference in the precision values of any two of the four search engines for diacritized and undiacritized queries. There was a significant difference in the effectiveness of AOL and Yahoo when diacritics were applied and when they were not applied. The findings of the study indicate that the search engines do not find a relationship between the diacritized Yoruba words and the undiacritized versions. Therefore, there is a need for search engines to add normalization steps to pre-process Yoruba queries and indexes. This study concentrates on a problem with search engines that has not been previously investigated.
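The normalization step the authors call for can be implemented with Unicode decomposition: decompose each character into its base letter plus combining marks, then drop the marks, so diacritized and undiacritized Yoruba forms reduce to the same index term. This is one standard approach, not necessarily the one the search engines would adopt.

```python
# Strip diacritics from a query by Unicode NFD decomposition, so
# diacritized and undiacritized forms of a Yoruba word match.
import unicodedata

def strip_diacritics(text):
    decomposed = unicodedata.normalize("NFD", text)
    # Drop combining marks (category Mn), keep base characters.
    return "".join(ch for ch in decomposed
                   if unicodedata.category(ch) != "Mn")

normalized = strip_diacritics("àṣà")  # tonal/underdot marks removed
```

Applying the same function to both queries and the index would make the diacritized and undiacritized versions retrieve the same documents.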

  1. Research on Search Engine Based on Semantic Web%基于语义Web的搜索引擎研究

    Institute of Scientific and Technical Information of China (English)

    吴根斌; 丁振凡

    2012-01-01

Traditional search engines are keyword-based. However, the keywords of a document may not be related to the document, and related documents may not explicitly contain a given keyword. A search engine based on the semantic Web can use ontology technology to describe keywords semantically. When receiving a user's query, the system first performs concept reasoning over a previously built ontology, then submits the reasoning result to a traditional search engine, and finally returns the search results to the user. Compared with a traditional search engine, a search engine based on the semantic Web can effectively improve both recall and precision.

  2. The efficacy of using search engines in procuring information about orthopaedic foot and ankle problems from the World Wide Web.

    Science.gov (United States)

    Nogler, M; Wimmer, C; Mayr, E; Ofner, D

    1999-05-01

This study has attempted to demonstrate the feasibility of obtaining information specific to foot and ankle orthopaedics from the World Wide Web (WWW). Six search engines (Lycos, AltaVista, Infoseek, Excite, Webcrawler, and HotBot) were used in scanning the Web for the following key words: "cavus foot," "diabetic foot," "hallux valgus," and "pes equinovarus." Matches were classified by language, provider, type, and relevance to medical professionals or to patients. Sixty percent (407 sites) of the visited websites contained information intended for use by physicians and other medical professionals; 30% (206 sites) were related to patient information; 10% of the sites were not easily classifiable. Forty-one percent (169 sites) of the websites were commercially oriented homepages that included advertisements.

  3. Literature search for intermittent rivers research using ISI Web of Science

    Data.gov (United States)

    U.S. Environmental Protection Agency — The dataset is the bibliometric information included in the ISI Web of Science database of scientific literature. Table S2 accessible from the dataset link provides...

  4. Developing a Data Discovery Tool for Interdisciplinary Science: Leveraging a Web-based Mapping Application and Geosemantic Searching

    Science.gov (United States)

    Albeke, S. E.; Perkins, D. G.; Ewers, S. L.; Ewers, B. E.; Holbrook, W. S.; Miller, S. N.

    2015-12-01

    The sharing of data and results is paramount for advancing scientific research. The Wyoming Center for Environmental Hydrology and Geophysics (WyCEHG) is a multidisciplinary group that is driving scientific breakthroughs to help manage water resources in the Western United States. WyCEHG is mandated by the National Science Foundation (NSF) to share its data. However, the infrastructure from which to share such diverse, complex and massive amounts of data did not exist within the University of Wyoming. We developed an innovative framework to meet the data organization, sharing, and discovery requirements of WyCEHG by integrating both open and closed source software, embedded metadata tags, semantic web technologies, and a web-mapping application. The infrastructure uses a Relational Database Management System as the foundation, providing a versatile platform to store, organize, and query myriad datasets, taking advantage of both structured and unstructured formats. Detailed metadata are fundamental to the utility of datasets. We tag data with Uniform Resource Identifiers (URIs) to specify concepts with formal descriptions (i.e., semantic ontologies), thus allowing users to search metadata based on the intended context rather than conventional keyword searches. Additionally, WyCEHG data are geographically referenced. Using the ArcGIS API for JavaScript, we developed a web mapping application leveraging database-linked spatial data services, providing a means to visualize and spatially query available data in an intuitive map environment. Using server-side scripting (PHP), the mapping application, in conjunction with semantic search modules, dynamically communicates with the database and file system, providing access to available datasets. Our approach provides a flexible, comprehensive infrastructure from which to store and serve WyCEHG's highly diverse research-based data. This framework has allowed WyCEHG to meet its data stewardship requirements.

  5. A Strategic Analysis of Search Engine Advertising in Web based-commerce

    Directory of Open Access Journals (Sweden)

    Ela Kumar

    2007-08-01

Full Text Available This paper endeavors to explore the role of the Search Engine in the Online Business Industry. It discusses Search Engine advertising programs and provides an insight into the revenue generated online via Search Engines. It explores the growth of the Online Business Industry in India and emphasizes the role of the Search Engine as the major advertising vehicle. A case study on the revolution of the Indian Advertising Industry has been conducted and its impact on online revenue evaluated. Search Engine advertising strategies are discussed in detail, and the impact of Search Engines on the Indian Advertising Industry is analyzed. The paper also provides an analytical and competitive study of online advertising strategies against traditional advertising tools, evaluating their efficiencies on important advertising parameters. It concludes with a brief discussion of the malpractices that have an adverse impact on the efficiency of the Search Engine advertising model, and highlights the key hurdles the Search Engine Industry is facing in the Indian business scenario.

  6. Identifying the Impact of Domain Knowledge and Cognitive Style on Web-Based Information Search Behavior

    Science.gov (United States)

    Park, Young; Black, John B.

    2007-01-01

Although information searching in hypermedia environments has become an important new problem-solving capability, not much is known about what types of individual characteristics constitute successful information search behavior. This study mainly investigated which of the 2 factors, 1) natural characteristics (cognitive style), and 2)…

  7. Web Usage Mining Analysis of Federated Search Tools for Egyptian Scholars

    Science.gov (United States)

    Mohamed, Khaled A.; Hassan, Ahmed

    2008-01-01

    Purpose: This paper aims to examine the behaviour of the Egyptian scholars while accessing electronic resources through two federated search tools. The main purpose of this article is to provide guidance for federated search tool technicians and support teams about user issues, including the need for training. Design/methodology/approach: Log…

  9. Chemical compound navigator: a web-based chem-BLAST, chemical taxonomy-based search engine for browsing compounds.

    Science.gov (United States)

    Prasanna, M D; Vondrasek, Jiri; Wlodawer, Alexander; Rodriguez, H; Bhat, T N

    2006-06-01

A novel technique to annotate, query, and analyze chemical compounds has been developed and is illustrated using the inhibitor data on HIV protease-inhibitor complexes. In this method, all chemical compounds are annotated in terms of standard chemical structural fragments. These standard fragments are defined by using criteria such as chemical classification; structural, chemical, or functional groups; and commercial, scientific or common names or synonyms. These fragments are then organized into a data tree based on their chemical substructures. Search engines have been developed to use this data tree to enable queries on inhibitors of HIV protease (http://xpdb.nist.gov/hivsdb/hivsdb.html). These search engines use a novel technique, the Chemical Block Layered Alignment of Substructure Technique (Chem-BLAST), to search the fragments of an inhibitor for its chemical structural neighbors. This technique for annotating and querying compounds lays the foundation for applying the Semantic Web concept to chemical compounds, allowing end users to group, sort, and search structural neighbors accurately and efficiently. During annotation, it enables the attachment of "meaning" (i.e., semantics) to data in a manner that far exceeds the current practice of associating "metadata" with data, by creating a knowledge base (or ontology) associated with compounds. Intended users of the technique are the research community and the pharmaceutical industry, for which it provides a new tool to better identify novel chemical structural neighbors to aid drug discovery.

  10. MotifCombinator: a web-based tool to search for combinations of cis-regulatory motifs

    Directory of Open Access Journals (Sweden)

    Tsunoda Tatsuhiko

    2007-03-01

Full Text Available Background: A combination of multiple types of transcription factors and cis-regulatory elements is often required for gene expression in eukaryotes, and this combinatorial regulation confers specific gene expression to tissues or environments. To reveal combinatorial regulation, computational methods have been developed that efficiently infer combinations of cis-regulatory motifs important for gene expression as measured by DNA microarrays. One promising type of computational method utilizes regression analysis between expression levels and scores of motifs in input sequences. This type takes full advantage of information on expression levels because it does not require the expression level of each gene to be dichotomized according to whether or not it reaches a certain threshold. However, there is no web-based tool that employs regression methods to systematically search for motif combinations and that practically handles combinations of more than two or three motifs. Results: We introduce MotifCombinator, an online tool with a user-friendly interface, to systematically search for combinations composed of any number of motifs based on regression methods. The tool utilizes well-known regression methods (multivariate linear regression, the multivariate adaptive regression spline or MARS, and multivariate logistic regression) for this purpose, and uses a genetic algorithm to search for combinations composed of any desired number of motifs. The visualization systems in this tool help users intuitively grasp the process of the combination search, and the backup system allows users to easily stop and restart calculations that are expected to require large computational time. The tool also provides the preparatory steps needed for a systematic combination search, i.e., selecting single motifs to constitute combinations and removing redundant similar motifs based on clustering analysis.
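The regression idea behind the tool can be sketched in miniature: fit measured expression levels against a motif-combination score, and use the fit to judge whether the combination explains expression. A simple one-variable least-squares fit stands in here for the multivariate methods named above; all data are invented.

```python
# Toy version of the regression step: expression level regressed on a
# combined motif score, via ordinary least squares (one predictor).

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = ((sum(x * y for x, y in zip(xs, ys)) - n * mx * my)
             / (sum(x * x for x in xs) - n * mx * mx))
    return slope, my - slope * mx

# Hypothetical combined motif score per gene vs. measured expression.
motif_score = [0.1, 0.4, 0.5, 0.9]
expression = [1.2, 2.1, 2.4, 3.8]
slope, intercept = fit_line(motif_score, expression)
# A clearly positive slope suggests the combination tracks expression.
```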

  11. Comparative Study of Web-based Virtual Library and Web Search Engine on Internet%Web资源虚拟图书馆与搜索引擎的比较研究

    Institute of Scientific and Technical Information of China (English)

    贺亚锋

    2000-01-01

    Particularly based on the comparative study of Web-based virtual library and Web search engine on the Internet, this paper discusses the similarities and differences in retrieval theory, retrieval performance and effect,in order to put forward suggestions for the development of Web-based virtual library and the promotion of Web search engine.%本文对Web上的主要信息检索工具-图书馆制作的Web资源虚拟图书馆和ICP研制的搜索引擎作分析比较,目的在于探讨两种检索工具之间的检索理论、检索性能和检索效果的异同,以期对Web资源虚拟图书馆的发展和搜索引擎的改进提供借鉴。

  12. Construction of web-based nutrition education contents and searching engine for usage of healthy menu of children.

    Science.gov (United States)

    Hong, Soon-Myung; Lee, Tae-Kyong; Chung, Hea-Jung; Park, Hye-Kyung; Lee, Eun-Ju; Nam, Hye-Seon; Jung, Soon-Im; Cho, Jee-Ye; Lee, Jin-Hee; Kim, Gon; Kim, Min-Chan

    2008-01-01

A diet habit developed in childhood lasts a lifetime. In this sense, nutrition education and early exposure to healthy menus in childhood are important. Children these days have easy access to the internet; thus, a web-based nutrition education program for children is an effective tool for nutrition education of children. This site provides nutrition education material for children featuring characters that are personified nutrients. The 151 menus are stored on the site together with video scripts of the cooking process. The menus are classified by criteria based on age, menu type and the ethnic origin of the menu. The site provides a search function with three kinds of search conditions: keywords, menu type, and a "between" expression on nutrients such as calories. The site was developed with the Windows 2003 Server operating system, the ZEUS 5 web server, the development language JSP, and the Oracle 10g database management system.
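The "between" search condition described in this record is a range filter over a nutrient field. A minimal sketch, with invented menu records rather than entries from the actual database:

```python
# Range ("between") search over a nutrient field. Menu records below
# are made-up examples, not entries from the described site.

menus = [
    {"name": "vegetable bibimbap", "calories": 420, "type": "rice"},
    {"name": "fruit salad", "calories": 180, "type": "dessert"},
    {"name": "fried chicken", "calories": 650, "type": "meat"},
]

def search_between(records, field, low, high):
    """Return records whose field value lies in [low, high]."""
    return [r for r in records if low <= r[field] <= high]

hits = search_between(menus, "calories", 150, 500)
# Matches the bibimbap and the fruit salad, not the fried chicken.
```

In the site this would run as a SQL BETWEEN clause against the menu table; the in-memory filter above mirrors the same condition.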

  13. Modeling of protein-peptide interactions using the CABS-dock web server for binding site search and flexible docking.

    Science.gov (United States)

    Blaszczyk, Maciej; Kurcinski, Mateusz; Kouza, Maksim; Wieteska, Lukasz; Debinski, Aleksander; Kolinski, Andrzej; Kmiecik, Sebastian

    2016-01-15

    Protein-peptide interactions play essential functional roles in living organisms and their structural characterization is a hot subject of current experimental and theoretical research. Computational modeling of the structure of protein-peptide interactions is usually divided into two stages: prediction of the binding site at a protein receptor surface, and then docking (and modeling) the peptide structure into the known binding site. This paper presents a comprehensive CABS-dock method for the simultaneous search of binding sites and flexible protein-peptide docking, available as a user-friendly web server. We present example CABS-dock results obtained in the default CABS-dock mode and using its advanced options that enable the user to increase the range of flexibility for chosen receptor fragments or to exclude user-selected binding modes from docking search. Furthermore, we demonstrate a strategy to improve CABS-dock performance by assessing the quality of models with classical molecular dynamics. Finally, we discuss the promising extensions and applications of the CABS-dock method and provide a tutorial appendix for the convenient analysis and visualization of CABS-dock results. The CABS-dock web server is freely available at http://biocomp.chem.uw.edu.pl/CABSdock/. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  14. Design and Implementation of Web2.0 Community Search Module Ranking Algorithm

    Institute of Scientific and Technical Information of China (English)

    王非; 吴庆波; 杨沙洲

    2009-01-01

    Web page ranking is one of the core technologies of a search engine. This paper describes why a Web2.0 community needs semantic search, analyzes the factors that influence page ranking, and adapts search engine ranking algorithms to a search module based on a Web2.0 community. Building on improved TF/IDF and PageRank algorithms, a search module based on semantic ranking is implemented on an open-source Web2.0 community development platform. Test results show that the ranking algorithm locates the desired content precisely and places the most relevant results first.
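The combination of content relevance (TF/IDF) with link authority (PageRank) described in this record can be sketched as a weighted sum. The 0.7/0.3 split and the toy documents are illustrative assumptions; the paper's actual weighting and improved variants are not given in the abstract:

```python
import math

def tf_idf(term, doc, docs):
    """Classic TF/IDF: term frequency in the document times
    inverse document frequency across the collection."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in docs if term in d)
    idf = math.log(len(docs) / df) if df else 0.0
    return tf * idf

def combined_score(term, doc, docs, pagerank, alpha=0.7):
    """Weighted sum of content relevance (TF/IDF) and link authority
    (a PageRank-style score); alpha=0.7 is an illustrative choice."""
    return alpha * tf_idf(term, doc, docs) + (1 - alpha) * pagerank

docs = [["web", "search", "ranking"], ["web", "community"], ["semantic", "search", "web"]]
scores = [combined_score("search", d, docs, pr) for d, pr in zip(docs, [0.5, 0.2, 0.3])]
ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranked)
```

Documents containing the query term dominate the ranking, while the PageRank term breaks ties between equally relevant pages.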

  15. PLAN: a web platform for automating high-throughput BLAST searches and for managing and mining results

    Directory of Open Access Journals (Sweden)

    Zhao Xuechun

    2007-02-01

    Full Text Available Abstract Background BLAST searches are widely used for sequence alignment. The search results are commonly adopted for various functional and comparative genomics tasks such as annotating unknown sequences, investigating gene models and comparing two sequence sets. Advances in sequencing technologies pose challenges for high-throughput analysis of large-scale sequence data. A number of programs and hardware solutions exist for efficient BLAST searching, but there is a lack of generic software solutions for mining and personalized management of the results. Systematically reviewing the results and identifying information of interest remains tedious and time-consuming. Results Personal BLAST Navigator (PLAN) is a versatile web platform that helps users to carry out various personalized pre- and post-BLAST tasks, including: (1) query and target sequence database management, (2) automated high-throughput BLAST searching, (3) indexing and searching of results, (4) filtering results online, (5) managing results of personal interest in favorite categories, (6) automated sequence annotation (such as NCBI NR and ontology-based annotation). PLAN integrates, by default, the Decypher hardware-based BLAST solution provided by Active Motif Inc. with a greatly improved efficiency over conventional BLAST software. BLAST results are visualized by spreadsheets and graphs and are full-text searchable. BLAST results and sequence annotations can be exported, in part or in full, in various formats including Microsoft Excel and FASTA. Sequences and BLAST results are organized in projects, the data publication levels of which are controlled by the registered project owners. In addition, all analytical functions are provided to public users without registration. Conclusion PLAN has proved a valuable addition to the community for automated high-throughput BLAST searches, and, more importantly, for knowledge discovery, management and sharing based on sequence alignment results.
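The post-BLAST filtering that PLAN automates can be illustrated on BLAST's standard tabular output (`-outfmt 6`). The column order below is the BLAST+ default; the sample hit lines are fabricated for illustration:

```python
# Default column names of BLAST+ tabular output (-outfmt 6).
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")

def parse_blast_tab(lines):
    """Parse tab-separated BLAST hit lines into dictionaries."""
    hits = []
    for line in lines:
        values = line.rstrip("\n").split("\t")
        hit = dict(zip(FIELDS, values))
        hit["pident"] = float(hit["pident"])
        hit["evalue"] = float(hit["evalue"])
        hits.append(hit)
    return hits

def filter_hits(hits, max_evalue=1e-5, min_identity=90.0):
    """Keep hits below an e-value cutoff and above an identity cutoff."""
    return [h for h in hits if h["evalue"] <= max_evalue and h["pident"] >= min_identity]

# Fabricated sample output lines for two hits of one query.
sample = [
    "q1\ts1\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-50\t380",
    "q1\ts2\t72.0\t180\t50\t2\t1\t180\t5\t184\t0.001\t90",
]
good = filter_hits(parse_blast_tab(sample))
print([h["sseqid"] for h in good])
```

Systems like PLAN apply such filters server-side so the user never has to scan thousands of raw hit lines by hand.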

  16. Web-scale near-duplicate search: Techniques and applications : Guest Editor’s Introduction

    NARCIS (Netherlands)

    Ngo, C.W.; Xu, C.; Kraaij, W.; El Saddik, A.

    2013-01-01

    As the bandwidth accessible to average users has increased, audiovisual material has become the fastest growing datatype on the Internet. The impressive growth of the social Web, where users can exchange user-generated content, contributes to the overwhelming number of multimedia files available. Am

  17. How students evaluate information and sources when searching the World Wide Web for information

    NARCIS (Netherlands)

    Walraven, Amber; Brand-Gruwel, Saskia; Boshuizen, Henny P.A.

    2009-01-01

    The World Wide Web (WWW) has become the biggest information source for students while solving information problems for school projects. Since anyone can post anything on the WWW, information is often unreliable or incomplete, and it is important to evaluate sources and information before using them.

  18. How Students Evaluate Information and Sources when Searching the World Wide Web for Information

    Science.gov (United States)

    Walraven, Amber; Brand-Gruwel, Saskia; Boshuizen, Henny P. A.

    2009-01-01

    The World Wide Web (WWW) has become the biggest information source for students while solving information problems for school projects. Since anyone can post anything on the WWW, information is often unreliable or incomplete, and it is important to evaluate sources and information before using them. Earlier research has shown that students have…

  1. The FOLDALIGN web server for pairwise structural RNA alignment and mutual motif search

    DEFF Research Database (Denmark)

    Havgaard, Jakob Hull; Lyngsø, Rune B.; Gorodkin, Jan

    2005-01-01

    FOLDALIGN is a Sankoff-based algorithm for making structural alignments of RNA sequences. Here, we present a web server for making pairwise alignments between two RNA sequences, using the recently updated version of FOLDALIGN. The server can be used to scan two sequences for a common structural R...

  2. Social Networking Web Sites and Human Resource Personnel: Suggestions for Job Searches

    Science.gov (United States)

    Roberts, Sherry J.; Roach, Terry

    2009-01-01

    Social Networking Web sites (SNWs) are now being used as reference checks by human resource personnel. For this reason, SNW users, particularly university students and other soon-to-be job applicants, should ask the following questions: Am I loading information that I want the world to see? Is this really a picture that shows me in the best light?…

  4. SEARCHING FOR COMETS ON THE WORLD WIDE WEB: THE ORBIT OF 17P/HOLMES FROM THE BEHAVIOR OF PHOTOGRAPHERS

    Energy Technology Data Exchange (ETDEWEB)

    Lang, Dustin [Princeton University Observatory, Princeton, NJ 08544 (United States); Hogg, David W., E-mail: dstn@astro.princeton.edu [Center for Cosmology and Particle Physics, Department of Physics, New York University, 4 Washington Place, New York, NY 10003 (United States)

    2012-08-15

    We performed an image search for "Comet Holmes", using the Yahoo! Web search engine, on 2010 April 1. Thousands of images were returned. We astrometrically calibrated, and therefore vetted, the images using the Astrometry.net system. The calibrated image pointings form a set of data points to which we can fit a test-particle orbit in the solar system, marginalizing over image dates and detecting outliers. The approach is Bayesian and the model is, in essence, a model of how comet astrophotographers point their instruments. In this work, we do not measure the position of the comet within each image, but rather use the celestial position of the whole image to infer the orbit. We find very strong probabilistic constraints on the orbit, although slightly off the Jet Propulsion Lab ephemeris, probably due to limitations of our model. Hyperparameters of the model constrain the reliability of date meta-data and where in the image astrophotographers place the comet; we find that ~70% of the meta-data are correct and that the comet typically appears in the central third of the image footprint. This project demonstrates that discoveries and measurements can be made using data of extreme heterogeneity and unknown provenance. As the size and diversity of astronomical data sets continues to grow, approaches like ours will become more essential. This project also demonstrates that the Web is an enormous repository of astronomical information, and that if an object has been given a name and photographed thousands of times by observers who post their images on the Web, we can (re-)discover it and infer its dynamical properties.

  5. Searching for Comets on the World Wide Web: The Orbit of 17P/Holmes from the Behavior of Photographers

    Science.gov (United States)

    Lang, Dustin; Hogg, David W.

    2012-08-01

    We performed an image search for "Comet Holmes," using the Yahoo! Web search engine, on 2010 April 1. Thousands of images were returned. We astrometrically calibrated—and therefore vetted—the images using the Astrometry.net system. The calibrated image pointings form a set of data points to which we can fit a test-particle orbit in the solar system, marginalizing over image dates and detecting outliers. The approach is Bayesian and the model is, in essence, a model of how comet astrophotographers point their instruments. In this work, we do not measure the position of the comet within each image, but rather use the celestial position of the whole image to infer the orbit. We find very strong probabilistic constraints on the orbit, although slightly off the Jet Propulsion Lab ephemeris, probably due to limitations of our model. Hyperparameters of the model constrain the reliability of date meta-data and where in the image astrophotographers place the comet; we find that ~70% of the meta-data are correct and that the comet typically appears in the central third of the image footprint. This project demonstrates that discoveries and measurements can be made using data of extreme heterogeneity and unknown provenance. As the size and diversity of astronomical data sets continues to grow, approaches like ours will become more essential. This project also demonstrates that the Web is an enormous repository of astronomical information, and that if an object has been given a name and photographed thousands of times by observers who post their images on the Web, we can (re-)discover it and infer its dynamical properties.

  6. Web Similarity

    NARCIS (Netherlands)

    Cohen, A.R.; Vitányi, P.M.B.

    2015-01-01

    Normalized web distance (NWD) is a similarity or normalized semantic distance based on the World Wide Web or any other large electronic database, for instance Wikipedia, and a search engine that returns reliable aggregate page counts. For sets of search terms the NWD gives a similarity on a scale fr
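The normalized web distance described in this record is commonly computed from aggregate page counts using the normalized Google distance formula; the counts below are illustrative, not real search-engine totals:

```python
import math

def nwd(fx, fy, fxy, n):
    """Normalized web distance from aggregate page counts:
    fx, fy - pages containing each term alone,
    fxy    - pages containing both terms,
    n      - total number of indexed pages (or a proxy for it).
    This is the standard NGD form; variants exist in the literature."""
    lfx, lfy, lfxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lfx, lfy) - lfxy) / (math.log(n) - min(lfx, lfy))

# Illustrative counts only; real values come from a search engine's hit totals.
d = nwd(fx=10_000, fy=8_000, fxy=6_000, n=10_000_000)
print(round(d, 3))
```

Terms that co-occur on most of the pages where either appears get a distance near 0; terms that never co-occur get a large distance.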

  7. Impact of Predicting Health Care Utilization Via Web Search Behavior: A Data-Driven Analysis.

    Science.gov (United States)

    Agarwal, Vibhu; Zhang, Liangliang; Zhu, Josh; Fang, Shiyuan; Cheng, Tim; Hong, Chloe; Shah, Nigam H

    2016-09-21

    By recent estimates, the steady rise in health care costs has deprived more than 45 million Americans of health care services and has encouraged health care providers to better understand the key drivers of health care utilization from a population health management perspective. Prior studies suggest the feasibility of mining population-level patterns of health care resource utilization from observational analysis of Internet search logs; however, the utility of the endeavor to the various stakeholders in a health ecosystem remains unclear. The aim was to carry out a closed-loop evaluation of the utility of health care use predictions using the conversion rates of advertisements that were displayed to the predicted future utilizers as a surrogate. The statistical models to predict the probability of a user's future visit to a medical facility were built using effective predictors of health care resource utilization, extracted from a deidentified dataset of geotagged mobile Internet search logs representing searches made by users of the Baidu search engine between March 2015 and May 2015. We inferred presence within the geofence of a medical facility from location and duration information from users' search logs and putatively assigned medical facility visit labels to qualifying search logs. We constructed a matrix of general, semantic, and location-based features from search logs of users that had 42 or more search days preceding a medical facility visit as well as from search logs of users that had no medical visits and trained statistical learners for predicting future medical visits. We then carried out a closed-loop evaluation of the utility of health care use predictions using the show conversion rates of advertisements displayed to the predicted future utilizers. In the context of behaviorally targeted advertising, wherein health care providers are interested in minimizing their cost per conversion, the association between show conversion rate and predicted
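The geofence-based visit labeling this record describes (a search log counts as a facility visit when its location falls inside the facility's geofence and its dwell duration passes a threshold) can be sketched as follows. The coordinates, radius, duration threshold, and log records are all hypothetical:

```python
def inside_geofence(point, center, radius_deg):
    """Crude planar proximity test; a real system would use geodesic
    distance and polygon geofences rather than a degree radius."""
    (lat, lon), (clat, clon) = point, center
    return ((lat - clat) ** 2 + (lon - clon) ** 2) ** 0.5 <= radius_deg

def label_visits(logs, facility, radius_deg=0.005, min_minutes=20):
    """Putatively label a log entry as a facility visit when it lies
    inside the geofence and its dwell duration passes a threshold."""
    return [log["user"] for log in logs
            if inside_geofence(log["location"], facility, radius_deg)
            and log["duration_min"] >= min_minutes]

facility = (39.90, 116.40)  # hypothetical facility coordinates
logs = [
    {"user": "u1", "location": (39.901, 116.401), "duration_min": 45},
    {"user": "u2", "location": (39.95, 116.50), "duration_min": 60},
    {"user": "u3", "location": (39.900, 116.399), "duration_min": 5},
]
print(label_visits(logs, facility))
```

Only users who are both near the facility and stay long enough receive a visit label; the labels then serve as training targets for the statistical learners.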

  8. Making Statistical Data More Easily Accessible on the Web Results of the StatSearch Case Study

    CERN Document Server

    Rajman, M; Boynton, I M; Fridlund, B; Fyhrlund, A; Sundgren, B; Lundquist, P; Thelander, H; Wänerskär, M

    2005-01-01

    In this paper we present the results of the StatSearch case study that aimed at providing an enhanced access to statistical data available on the Web. In the scope of this case study we developed a prototype of an information access tool combining a query-based search engine with semi-automated navigation techniques exploiting the hierarchical structuring of the available data. This tool enables a better control of the information retrieval, improving the quality and ease of the access to statistical information. The central part of the presented StatSearch tool consists in the design of an algorithm for automated navigation through a tree-like hierarchical document structure. The algorithm relies on the computation of query related relevance score distributions over the available database to identify the most relevant clusters in the data structure. These most relevant clusters are then proposed to the user for navigation, or, alternatively, are the support for the automated navigation process. Several appro...
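The navigation algorithm this record describes, computing query-relevance scores over a tree-like document hierarchy and proposing the most relevant cluster for the next step, can be sketched as follows. The tree, leaf scores, and sum aggregation are illustrative assumptions, not StatSearch's actual scoring:

```python
# Hypothetical document tree: each cluster maps to child clusters or leaves.
TREE = {
    "root": ["economy", "population"],
    "economy": ["trade", "prices"],
    "population": ["births", "migration"],
}
# Query-relevance scores for leaf documents (illustrative values).
LEAF_SCORES = {"trade": 0.9, "prices": 0.4, "births": 0.1, "migration": 0.2}

def cluster_score(node):
    """Aggregate query relevance over a subtree by summing leaf scores;
    the real system computes a relevance score distribution."""
    if node in LEAF_SCORES:
        return LEAF_SCORES[node]
    return sum(cluster_score(child) for child in TREE[node])

def best_child(node):
    """Propose the most relevant child cluster as the next navigation step."""
    return max(TREE[node], key=cluster_score)

print(best_child("root"))
```

Repeating `best_child` from the chosen cluster yields the automated navigation path; showing all children with their scores yields the semi-automated variant.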

  9. Searching for the Yahoos of Academia: Academic Subject Directories on the Web.

    Science.gov (United States)

    Lilla, Rick; Hipps, Nena; Corman, Brenda

    1999-01-01

    Presents results of research into the best of the academic subject directories, based on three general category searches--one historical, one in literature, and one in the field of science. Describes the 11 winning selections in three divided groups: winners, runners-up, and third-place contenders. (AEF)

  10. Developing a Web Tool for Searching and Viewing Collections of High-Quality Cultural Images

    Science.gov (United States)

    Lazarinis, Fotis

    2010-01-01

    Purpose: Searching for information and viewing visual representations of products in e-organisations is a common activity of the e-visitors to these organisations. For example, in e-museums, users are shown images or other visual information of the existing objects. The aim of this paper is to present a tool which supports the effective searching…

  11. Exploring novice users' training needs in searching information on the World Wide Web.

    NARCIS (Netherlands)

    Lazonder, Adrianus W.

    2000-01-01

    Searching for information on the WWW involves locating a website and locating information on that site. A recent study implied that novice users' training needs exclusively relate to locating websites. The present case study tried to reveal the knowledge and skills that constitute these training needs.

  12. Information-computational system for storage, search and analytical processing of environmental datasets based on the Semantic Web technologies

    Science.gov (United States)

    Titov, A.; Gordov, E.; Okladnikov, I.

    2009-04-01

    In this report the results of the work devoted to the development of working model of the software system for storage, semantically-enabled search and retrieval along with processing and visualization of environmental datasets containing results of meteorological and air pollution observations and mathematical climate modeling are presented. Specially designed metadata standard for machine-readable description of datasets related to meteorology, climate and atmospheric pollution transport domains is introduced as one of the key system components. To provide semantic interoperability the Resource Description Framework (RDF, http://www.w3.org/RDF/) technology means have been chosen for metadata description model realization in the form of RDF Schema. The final version of the RDF Schema is implemented on the base of widely used standards, such as Dublin Core Metadata Element Set (http://dublincore.org/), Directory Interchange Format (DIF, http://gcmd.gsfc.nasa.gov/User/difguide/difman.html), ISO 19139, etc. At present the system is available as a Web server (http://climate.risks.scert.ru/metadatabase/) based on the web-portal ATMOS engine [1] and is implementing dataset management functionality including SeRQL-based semantic search as well as statistical analysis and visualization of selected data archives [2,3]. The core of the system is Apache web server in conjunction with Tomcat Java Servlet Container (http://jakarta.apache.org/tomcat/) and Sesame Server (http://www.openrdf.org/) used as a database for RDF and RDF Schema. At present statistical analysis of meteorological and climatic data with subsequent visualization of results is implemented for such datasets as NCEP/NCAR Reanalysis, Reanalysis NCEP/DOE AMIP II, JMA/CRIEPI JRA-25, ECMWF ERA-40 and local measurements obtained from meteorological stations on the territory of Russia. This functionality is aimed primarily at finding of main characteristics of regional climate dynamics. 
The proposed system represents
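The semantic search over RDF metadata that this record describes can be illustrated with a minimal triple-pattern match, the core operation underlying SeRQL/SPARQL queries. The triples and the Dublin Core style property names are illustrative, not the system's actual schema:

```python
# Metadata as (subject, predicate, object) triples, loosely in Dublin Core style.
TRIPLES = [
    ("dataset:ncep_reanalysis", "dc:subject", "meteorology"),
    ("dataset:ncep_reanalysis", "dc:coverage", "global"),
    ("dataset:era40", "dc:subject", "meteorology"),
    ("dataset:emission_inventory", "dc:subject", "air pollution"),
]

def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard,
    much like a variable in a single SeRQL/SPARQL triple pattern."""
    return [(ts, tp, to) for ts, tp, to in triples
            if s in (None, ts) and p in (None, tp) and o in (None, to)]

# "Find all datasets whose subject is meteorology."
subjects = [s for s, _, _ in match(TRIPLES, p="dc:subject", o="meteorology")]
print(subjects)
```

A full query engine joins several such patterns on shared variables; the single-pattern case already shows why machine-readable metadata makes dataset discovery tractable.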

  13. Finding research information on the web: how to make the most of Google and other free search tools.

    Science.gov (United States)

    Blakeman, Karen

    2013-01-01

    The Internet and the World Wide Web have had a major impact on the accessibility of research information. The move towards open access and the development of institutional repositories have resulted in increasing amounts of information being made available free of charge. Many of these resources are not included in conventional subscription databases, and Google is not always the best way to ensure that one is picking up all relevant material on a topic. This article looks at how Google's search engine works and how to use Google more effectively for identifying research information, considers alternatives to Google, and reviews some of the specialist tools that have evolved to cope with the diverse forms of information that now exist in electronic form.

  14. GLIDERS - A web-based search engine for genome-wide linkage disequilibrium between HapMap SNPs

    Directory of Open Access Journals (Sweden)

    Broxholme John

    2009-10-01

    Full Text Available Abstract Background A number of tools for the examination of linkage disequilibrium (LD) patterns between nearby alleles exist, but none are available for quickly and easily investigating LD at longer ranges (>500 kb). We have developed a web-based query tool (GLIDERS: Genome-wide LInkage DisEquilibrium Repository and Search engine) that enables the retrieval of pairwise associations with r2 ≥ 0.3 across the human genome for any SNP genotyped within HapMap phase 2 and 3, regardless of distance between the markers. Description GLIDERS is an easy to use web tool that only requires the user to enter rs numbers of SNPs they want to retrieve genome-wide LD for (both nearby and long-range). The intuitive web interface handles both manual entry of SNP IDs as well as allowing users to upload files of SNP IDs. The user can limit the resulting inter-SNP associations with easy to use menu options. These include MAF limit (5-45%), distance limits between SNPs (minimum and maximum), r2 (0.3 to 1), HapMap population sample (CEU, YRI and JPT+CHB combined) and HapMap build/release. All resulting genome-wide inter-SNP associations are displayed on a single output page, which has a link to a downloadable tab delimited text file. Conclusion GLIDERS is a quick and easy way to retrieve genome-wide inter-SNP associations and to explore LD patterns for any number of SNPs of interest. GLIDERS can be useful in identifying SNPs with long-range LD. This can highlight mis-mapping or other potential association signal localisation problems.
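The menu filters this record lists (MAF limit, distance window, r2 threshold) can be sketched as a filter over pairwise LD records. The record layout and values are hypothetical stand-ins for what GLIDERS returns:

```python
# Hypothetical pairwise LD records of the kind GLIDERS returns.
LD_PAIRS = [
    {"snp1": "rs1", "snp2": "rs2", "r2": 0.85, "distance": 750_000, "maf": 0.12},
    {"snp1": "rs1", "snp2": "rs3", "r2": 0.35, "distance": 1_200_000, "maf": 0.30},
    {"snp1": "rs1", "snp2": "rs4", "r2": 0.95, "distance": 40_000, "maf": 0.02},
]

def filter_ld(pairs, min_r2=0.3, min_dist=500_000, max_dist=None, min_maf=0.05):
    """Mimic the menu options: r2 threshold, distance window (useful for
    long-range LD, >500 kb) and a minor allele frequency floor."""
    kept = []
    for p in pairs:
        if p["r2"] < min_r2 or p["maf"] < min_maf:
            continue
        if p["distance"] < min_dist:
            continue
        if max_dist is not None and p["distance"] > max_dist:
            continue
        kept.append(p["snp2"])
    return kept

print(filter_ld(LD_PAIRS))
```

Setting `min_dist` above 500 kb is exactly the long-range use case the tool targets: strong LD at such distances can flag mis-mapped markers.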

  15. The Method to Increase New Web Site in the Search Engine

    Institute of Scientific and Technical Information of China (English)

    方兰平; 杨晓梅

    2001-01-01

    Taking several search engines currently in common use as examples, this paper briefly introduces methods for adding a new website to a search engine.

  16. Web Image Search Re-ranking with Click-based Similarity and Typicality.

    Science.gov (United States)

    Yang, Xiaopeng; Mei, Tao; Zhang, Yong Dong; Liu, Jie; Satoh, Shin'ichi

    2016-07-20

    In image search re-ranking, besides the well-known semantic gap, the intent gap, the gap between the representation of a user's query and the user's real intent, is becoming a major problem restricting the development of image retrieval. To reduce human effort, in this paper we use image click-through data, which can be viewed as "implicit feedback" from users, to help overcome the intent gap and further improve image search performance. Generally, the hypothesis that visually similar images should be close in a ranking list, and the strategy that images with higher relevance should be ranked above others, are widely accepted. To obtain satisfying search results, image similarity and the typicality of relevance levels are therefore the determining factors. However, when measuring image similarity and typicality, conventional re-ranking approaches consider only visual information and the initial ranks of images, overlooking the influence of click-through data. This paper presents a novel re-ranking approach, named spectral clustering re-ranking with click-based similarity and typicality (SCCST). First, to learn an appropriate similarity measurement, we propose the click-based multi-feature similarity learning algorithm (CMSL), which conducts metric learning based on click-based triplet selection and integrates multiple features into a unified similarity space via multiple kernel learning. Then, based on the learnt click-based image similarity measure, we conduct spectral clustering to group visually and semantically similar images into the same clusters, and obtain the final re-ranked list by calculating click-based cluster typicality and within-cluster click-based image typicality in descending order. Our experiments, conducted on two real-world query-image datasets with diverse representative queries, show that our proposed re-ranking approach can significantly improve initial search results and outperform several existing re-ranking approaches.
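The idea of deriving image similarity from click-through data can be illustrated with a minimal stand-in: cosine similarity between click-count vectors over queries. CMSL itself does metric learning over multiple kernels; the sketch below, with fabricated click data, only shows why co-clicked images count as similar:

```python
# Hypothetical click-through data: image -> {query: click count}.
CLICKS = {
    "img_a": {"sunset beach": 40, "beach": 10},
    "img_b": {"sunset beach": 35, "ocean": 5},
    "img_c": {"mountain": 50},
}

def click_similarity(a, b):
    """Cosine similarity between click-count vectors over queries.
    A stand-in for CMSL's learned metric, not the paper's algorithm."""
    qa, qb = CLICKS[a], CLICKS[b]
    dot = sum(qa[q] * qb[q] for q in qa.keys() & qb.keys())
    na = sum(v * v for v in qa.values()) ** 0.5
    nb = sum(v * v for v in qb.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

# Images clicked for the same queries come out similar; unrelated ones do not.
print(click_similarity("img_a", "img_b") > click_similarity("img_a", "img_c"))
```

Feeding such a similarity matrix into spectral clustering groups co-clicked images, which is the first stage of the SCCST pipeline.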

  17. Top-d Rank Aggregation in Web Meta-search Engine

    Science.gov (United States)

    Fang, Qizhi; Xiao, Han; Zhu, Shanfeng

    In this paper, we consider the rank aggregation problem for information retrieval over the Web, making use of a kind of metric, the coherence, which considers both the normalized Kendall-τ distance and the size of the overlap between two partial rankings. In general, the top-d coherence aggregation problem is defined as: given a collection of partial rankings Π = {τ_1, τ_2, …, τ_K}, find a final ranking π of specified length d that maximizes the total coherence Φ(π, Π) = Σ_{i=1}^{K} Φ(π, τ_i). The corresponding complexity and algorithmic issues are discussed in this paper. Our main technical contribution is a polynomial time approximation scheme (PTAS) for a restricted top-d coherence aggregation problem.
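The coherence metric combines the normalized Kendall-τ distance on the overlap of two partial rankings with the overlap's size. The abstract does not give the exact combination, so the sketch below uses one plausible form, Φ(π, τ) = |overlap| × (1 − normalized Kendall-τ distance), purely for illustration:

```python
from itertools import combinations

def coherence(pi, tau):
    """One plausible coherence between two partial rankings: overlap size
    times (1 - normalized Kendall-tau distance on the overlap).
    The paper's exact definition may differ."""
    common = [x for x in pi if x in tau]
    if len(common) < 2:
        return float(len(common))
    discordant = sum(
        1 for a, b in combinations(common, 2)
        if (pi.index(a) < pi.index(b)) != (tau.index(a) < tau.index(b))
    )
    pairs = len(common) * (len(common) - 1) / 2
    return len(common) * (1 - discordant / pairs)

def total_coherence(pi, rankings):
    """Objective of top-d aggregation: sum of coherences with all inputs."""
    return sum(coherence(pi, tau) for tau in rankings)

rankings = [["a", "b", "c"], ["b", "a", "d"], ["a", "c", "d"]]
print(total_coherence(["a", "b", "c"], rankings))
```

The aggregation problem then asks for the length-d ranking π maximizing `total_coherence`; the paper's contribution is a PTAS for a restricted version of that search.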

  18. 3D-SURFER 2.0: web platform for real-time search and characterization of protein surfaces.

    Science.gov (United States)

    Xiong, Yi; Esquivel-Rodriguez, Juan; Sael, Lee; Kihara, Daisuke

    2014-01-01

    The increasing number of uncharacterized protein structures necessitates the development of computational approaches for function annotation using the protein tertiary structures. Protein structure database search is the basis of any structure-based functional elucidation of proteins. 3D-SURFER is a web platform for real-time protein surface comparison of a given protein structure against the entire PDB using 3D Zernike descriptors. It can smoothly navigate the protein structure space in real-time from one query structure to another. A major new feature of Release 2.0 is the ability to compare the protein surface of a single chain, a single domain, or a single complex against databases of protein chains, domains, complexes, or a combination of all three in the latest PDB. Additionally, two types of protein structures can now be compared: all-atom-surface and backbone-atom-surface. The server can also accept a batch job for a large number of database searches. Pockets in protein surfaces can be identified by VisGrid and LIGSITEcsc. The server is available at http://kiharalab.org/3d-surfer/.

  19. Making the catalog of the UNICAMP libraries' holdings available on the Web using AltaVista Search Intranet

    Directory of Open Access Journals (Sweden)

    Mariângela Pisoni Zanaga

    Full Text Available Development and implementation of a project aimed at making the automated catalog of monographs (books and theses) held by the UNICAMP libraries available on the Web, using the AltaVista Search Intranet search tool.

  20. FirstSearch and NetFirst--Web and Dial-up Access: Plus Ca Change, Plus C'est la Meme Chose?

    Science.gov (United States)

    Koehler, Wallace; Mincey, Danielle

    1996-01-01

    Compares and evaluates the differences between OCLC's dial-up and World Wide Web FirstSearch access methods and their interfaces with the underlying databases. Also examines NetFirst, OCLC's new Internet catalog, the only Internet tracking database from a "traditional" database service. (Author/PEN)

  1. Collab-Analyzer: An Environment for Conducting Web-Based Collaborative Learning Activities and Analyzing Students' Information-Searching Behaviors

    Science.gov (United States)

    Wu, Chih-Hsiang; Hwang, Gwo-Jen; Kuo, Fan-Ray

    2014-01-01

    Researchers have found that students might get lost or feel frustrated while searching for information on the Internet to deal with complex problems without real-time guidance or support. To address this issue, a web-based collaborative learning system, Collab-Analyzer, is proposed in this paper. It is not only equipped with a collaborative…

  2. The PPI3D web server for searching, analyzing and modeling protein-protein interactions in the context of 3D structures.

    Science.gov (United States)

    Dapkūnas, Justas; Timinskas, Albertas; Olechnovič, Kliment; Margelevičius, Mindaugas; Dičiūnas, Rytis; Venclovas, Česlovas

    2016-12-22

    The PPI3D web server is focused on searching and analyzing the structural data on protein-protein interactions. Reducing the data redundancy by clustering and analyzing the properties of interaction interfaces using Voronoi tessellation makes this software a highly effective tool for addressing different questions related to protein interactions.

  3. A Study on Information Search and Commitment Strategies on Web Environment and Internet Usage Self-Efficacy Beliefs of University Students'

    Science.gov (United States)

    Geçer, Aynur Kolburan

    2014-01-01

    This study addresses university students' information search and commitment strategies on web environment and internet usage self-efficacy beliefs in terms of such variables as gender, department, grade level and frequency of internet use; and whether there is a significant relation between these beliefs. Descriptive method was used in the study.…

  5. Deep Web search interface maintenance method based on evolution version

    Institute of Scientific and Technical Information of China (English)

    束长波; 施化吉; 王基

    2015-01-01

    Existing Deep Web information integration systems do not account for the dynamic nature of search interfaces, so the query capability of the local interface and that of the network interface become unequal. To address this problem, this paper proposes a Deep Web search interface maintenance method based on evolution versions. The method builds versioned models of the local search interface to capture its incremental changes and to identify the set of attributes that change most actively. It then issues probing queries constructed from this set to extract the changes in the network interface's data source and to evolve the next version of the local interface, realizing an iterative maintenance process for the local search interface's data source. Experimental results show that the method reduces the impact of Deep Web environment changes on information integration and keeps the recall and precision of the Deep Web search interface stable.
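The versioned-model idea in this record, comparing successive versions of an interface to find the attributes that change most actively, can be sketched as a diff between two attribute-to-control-type mappings. The interface schema and attribute names are hypothetical:

```python
# Two hypothetical versions of a Deep Web search interface:
# attribute name -> set of control types it exposes.
V1 = {"title": {"text"}, "price": {"range"}, "year": {"select"}}
V2 = {"title": {"text"}, "price": {"range", "select"}, "region": {"select"}}

def diff_versions(old, new):
    """Incremental change between interface versions: attributes added,
    removed, or whose controls changed - the 'active' attributes that
    a maintenance probe should target first."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(a for a in set(old) & set(new) if old[a] != new[a])
    return {"added": added, "removed": removed, "changed": changed}

print(diff_versions(V1, V2))
```

Accumulating such diffs across versions yields the active attribute set from which the method builds its probing queries.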

  6. Study of Search Engine Transaction Logs Shows Little Change in How Users use Search Engines. A review of: Jansen, Bernard J., and Amanda Spink. “How Are We Searching the World Wide Web? A Comparison of Nine Search Engine Transaction Logs.” Information Processing & Management 42.1 (2006): 248‐263.

    Directory of Open Access Journals (Sweden)

    David Hook

    2006-09-01

    Full Text Available Objective – To examine the interactions between users and search engines, and how they have changed over time. Design – Comparative analysis of search engine transaction logs. Setting – Nine major analyses of search engine transaction logs. Subjects – Nine web search engine studies (4 European, 5 American) over a seven‐year period, covering the search engines Excite, Fireball, AltaVista, BWIE and AllTheWeb. Methods – The results from individual studies are compared by year of study for percentages of single query sessions, one-term queries, operator (and, or, not, etc.) usage and single result page viewing. As well, the authors group the search queries into eleven different topical categories and compare how the breakdown has changed over time. Main Results – Based on the percentage of single query sessions, it does not appear that the complexity of interactions has changed significantly for either the U.S.‐based or the European‐based search engines. As well, there was little change observed in the percentage of one‐term queries over the years of study for either the U.S.‐based or the European‐based search engines. Few users (generally less than 20%) use Boolean or other operators in their queries, and these percentages have remained relatively stable. One area of noticeable change is in the percentage of users viewing only one results page, which has increased over the years of study. Based on the studies of the U.S.‐based search engines, the topical categories of ‘People, Place or Things’ and ‘Commerce, Travel, Employment or Economy’ are becoming more popular, while the categories of ‘Sex and Pornography’ and ‘Entertainment or Recreation’ are declining. Conclusions – The percentage of users viewing only one results page increased during the years of the study, while the percentages of single query sessions, one-term sessions and operator usage remained stable. The increase in single result page viewing

  7. Research on Web Search Engine Technology

    Institute of Scientific and Technical Information of China (English)

    申健; 柴艳娜

    2016-01-01

    Information on the Internet grows exponentially with the development of science and technology. A tool is needed to help users manage this flood of data, extract the useful information they want, and locate and index it quickly and accurately; this is the goal of a search engine, and the reason search engines have become an indispensable everyday tool. This paper surveys search engine technology, discussing its internal principles and mechanisms and analyzing its technical architecture and information-crawling methods, including the algorithms and strategies that underlie its operation. The core technologies and algorithms adopted by Google's search engine are examined and compared with traditional techniques to analyze their advantages. In addition, the indexing and SEO issues involved in the search engine workflow are discussed. The paper points out the importance of information retrieval tools for processing huge volumes of information and the advantages search engines bring to retrieval; their continued development will drive the progress of information science.

  8. Web search and data mining of natural products and their bioactivities in PubChem.

    Science.gov (United States)

    Ming, Hao; Tiejun, Cheng; Yanli, Wang; Stephen, Bryant H

    2013-10-01

    Natural products, historically a major resource for drug discovery, are gaining renewed attention due to advances in genomic sequencing and other technologies, which make them attractive and amenable to drug candidate screening. Collecting and mining the bioactivity information of natural products is extremely important for accelerating the drug development process and reducing its cost. Lately, a number of publicly accessible databases have been established to facilitate access to chemical biology data for small molecules, including natural products. It is therefore imperative for scientists in related fields to exploit these resources in order to expedite their research on natural products as drug leads/candidates for disease treatment. PubChem, as a public database, contains large amounts of natural products associated with bioactivity data. In this review, we introduce the information system provided at PubChem and systematically describe the applications of a set of PubChem web services for rapid data retrieval, analysis, and downloading of natural products. We hope this work can serve as a starting point for researchers performing data mining on natural products using PubChem.
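
The PubChem web services mentioned above are exposed programmatically through the PUG REST interface. The short sketch below builds a PUG REST property-retrieval URL; the compound and property names are illustrative, and the actual network call is left commented out:

```python
from urllib.parse import quote

# Base URL of PubChem's PUG REST service (documented public endpoint)
PUG_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

def property_url(compound_name, properties):
    """Build a PUG REST URL retrieving properties for a compound by name."""
    return "{}/compound/name/{}/property/{}/JSON".format(
        PUG_BASE, quote(compound_name), ",".join(properties))

url = property_url("aspirin", ["MolecularFormula", "MolecularWeight"])
# To actually query PubChem (network access required):
#   from urllib.request import urlopen
#   payload = urlopen(url).read()
```

The same path scheme supports other identifier namespaces (e.g. CID or SMILES in place of a name), which is what makes scripted batch retrieval of natural-product records practical.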

  9. RL_Spider: An Independent Vertical Search Engine Web Crawler

    Institute of Scientific and Technical Information of China (English)

    黄蔚; 刘忠; 刘全

    2011-01-01

    Based on an analysis of related spider techniques, this paper proposes applying reinforcement learning to the controllable Web crawler of a vertical search engine. The crawler acquires control "experience information" through reinforcement learning, uses it to predict distant rewards, and focuses its search on a given topic so as to maximize the accumulated reward. Crawled pages are stored and indexed, and users obtain the best results through the search engine's query interface. Topic-focused crawls over multiple websites show that the method substantially improves both the recall and the precision of the crawl.
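
The abstract's idea of predicting rewards from learned "experience information" can be sketched as a best-first crawler whose link priorities are updated online. Everything below (the update rule, the toy web, the function names) is an invented illustration in that spirit, not the paper's RL_Spider:

```python
import heapq

ALPHA = 0.5  # learning rate for anchor-word weights (illustrative)

def link_score(words, weights):
    """Estimated value of following a link, from learned word weights."""
    return sum(weights.get(w, 0.0) for w in words)

def crawl(web, start, is_relevant, budget=10):
    """web: url -> {outlink_url: anchor_words}. Best-first crawl whose
    priorities are learned online from observed page relevance."""
    weights, visited, found = {}, set(), []
    frontier = [(0.0, start, ())]  # (negated score, url, anchor words)
    while frontier and budget:
        _, url, words = heapq.heappop(frontier)
        if url in visited:
            continue
        visited.add(url)
        budget -= 1
        reward = 1.0 if is_relevant(url) else 0.0
        if reward:
            found.append(url)
        for w in words:  # credit the anchor words that recommended this page
            weights[w] = weights.get(w, 0.0) + ALPHA * (reward - weights.get(w, 0.0))
        for link, link_words in web.get(url, {}).items():
            if link not in visited:
                heapq.heappush(frontier,
                               (-link_score(link_words, weights), link, link_words))
    return found

# Toy web: pages reached via "gpu" anchors are on topic
toy_web = {
    "a": {"b": ("gpu",), "c": ("cats",)},
    "b": {"d": ("gpu",)},
    "c": {},
    "d": {},
}
found = crawl(toy_web, "a", lambda u: u in {"b", "d"})
```

After the first relevant page is seen, links sharing its anchor vocabulary jump ahead in the frontier, which is the essence of reward-guided focused crawling.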

  10. Evolution of Web Services in EOSDIS: Search and Order Metadata Registry (ECHO)

    Science.gov (United States)

    Mitchell, Andrew; Ramapriyan, Hampapuram; Lowe, Dawn

    2009-01-01

    During 2005 through 2008, NASA defined and implemented a major evolutionary change in its Earth Observing System Data and Information System (EOSDIS) to modernize its capabilities. This implementation was based on a vision for 2015 developed during 2005. The EOSDIS 2015 Vision emphasizes increased end-to-end data system efficiency and operability; increased data usability; improved support for end users; and decreased operations costs. One key feature of the evolution plan was achieving higher operational maturity (ingest, reconciliation, search and order, performance, error handling) for NASA's Earth Observing System Clearinghouse (ECHO). The ECHO system is an operational metadata registry through which the scientific community can easily discover and exchange NASA's Earth science data and services. ECHO contains metadata for 2,726 data collections comprising over 87 million individual data granules and 34 million browse images, drawn from NASA's EOSDIS Data Centers and the United States Geological Survey's Landsat Project holdings. ECHO is a middleware component based on a Service Oriented Architecture (SOA). The system is comprised of a set of infrastructure services that enable the fundamental SOA functions: publish, discover, and access Earth science resources. It also provides additional services such as user management, data access control, and order management. The ECHO system has a data registry and a services registry. The data registry enables organizations to publish EOS and other Earth-science related data holdings to a common metadata model. These holdings are described through metadata in terms of datasets (types of data) and granules (specific data items of those types). ECHO also supports browse images, which provide a visual representation of the data. The published metadata can be mapped to and from existing standards (e.g., FGDC, ISO 19115).
With ECHO, users can find the metadata stored in the data registry and then access the data either

  11. Research on literature search engine functions in a Semantic Web environment

    Institute of Scientific and Technical Information of China (English)

    袁辉; 李延香

    2013-01-01

    Traditional keyword-based retrieval returns every page containing the submitted keywords, so pages irrelevant to the user's actual need are returned as well. To address this problem, the paper designs a literature search engine that retrieves in a Semantic Web environment. Through the design, implementation, and testing of a Semantic Web literature retrieval system, it concludes that the Semantic Web can interpret the user's keywords logically and thereby deliver more precise search results.

  12. Research on the WSRank Method for the Web Service Search Engine

    Institute of Scientific and Technical Information of China (English)

    胡蓉; 刘建勋

    2011-01-01

    The difficulty of Web service retrieval hampers the pace of service application and development. After implementing a Web service search engine named WSSE, ranking the retrieved services becomes the next problem to solve. By analyzing the distribution structure and interrelationships of Web services from the crawler's perspective, and drawing on the well-known PageRank algorithm and its published refinements, the novel WSRank algorithm is proposed. Rank values are computed iteratively, and the services are then sorted by value in non-increasing order. Experiments show that the algorithm improves the accuracy of Web service search.
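
As a hedged sketch of the iterative, PageRank-style computation that WSRank builds on, the following power iteration ranks a toy graph of services that reference one another; the damping factor, graph, and iteration count are illustrative rather than the paper's values:

```python
DAMPING = 0.85  # conventional damping factor; illustrative here

def pagerank(links, iters=50):
    """links: node -> list of successor nodes. Returns node -> rank."""
    nodes = list(links)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        new = {u: (1.0 - DAMPING) / n for u in nodes}
        for u, outs in links.items():
            if not outs:  # dangling node: spread its rank evenly
                for v in nodes:
                    new[v] += DAMPING * rank[u] / n
            else:
                for v in outs:
                    new[v] += DAMPING * rank[u] / len(outs)
        rank = new
    return rank

# Toy graph of Web services referencing one another
services = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(services)
ordering = sorted(ranks, key=ranks.get, reverse=True)  # non-increasing rank
```

Service "C", referenced by three others, ends up first in `ordering`, while the unreferenced "D" ends up last, mirroring the sort-by-rank presentation the abstract describes.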

  13. Use of the DISCERN tool for evaluating web searches in childhood epilepsy.

    Science.gov (United States)

    Cerminara, Caterina; Santarone, Marta Elena; Casarelli, Livia; Curatolo, Paolo; El Malhany, Nadia

    2014-12-01

    Epilepsy is an important cause of neurological disability in children. Nowadays, an increasing number of parents or caregivers use the Internet as a source of health information concerning symptoms, therapy, and prognosis of epilepsy occurring during childhood. Therefore, high-quality websites are necessary to satisfy this request. Using the DISCERN tool, we evaluated online information on childhood epilepsy provided by the first 50 links displayed on the Google search engine. The same links were evaluated by a team of pediatric neurologists (PNs) and by a lay subject (LS). The evaluation performed by the PNs found that only 9.6% of the websites showed good reliability, that only 7.2% of the websites had good-quality information on treatment choices, and that only 21.5% of the websites showed good overall quality of content. In the evaluation performed by the lay subject, 21.4% of the websites showed good reliability, 59.5% of the websites showed poor quality of information on treatment choices, and only 2% of the websites showed good overall quality of content. Our conclusion is that online information about childhood epilepsy still lacks reliability, accuracy, and relevance, and fails to provide a thorough review of treatment choices.

  14. An algorithm for duplicated web page detection based on a meta-search engine

    Institute of Scientific and Technical Information of China (English)

    张玉连; 王莎莎; 宋桂江

    2011-01-01

    To deal with the duplicated pages returned by a meta-search engine, an algorithm for detecting and removing duplicates among the merged results is proposed, and its effectiveness is verified experimentally. The algorithm first compares the URLs of the result pages returned by the member search engines; it then processes the result pages' titles to extract each page's topical information; finally it segments the snippets into words and computes snippet similarity. Combining these three signals detects duplicated pages well and achieves their removal. Compared with previous algorithms, the algorithm has clear advantages and comes closer to manually compiled results.
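
The three-stage test described in the abstract (URL comparison, then title matching, then snippet similarity) might be sketched as follows; the URL normalization, the Jaccard measure over segmented words, and the threshold are assumptions of this illustration, not the paper's exact procedure:

```python
def normalize_url(url):
    """Crude canonical form: drop scheme, case, and trailing slash."""
    return url.lower().rstrip("/").split("://")[-1]

def jaccard(a, b):
    """Word-level Jaccard similarity between two snippets."""
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b) if a | b else 0.0

def is_duplicate(p, q, sim_threshold=0.8):
    """p, q: dicts with 'url', 'title', 'snippet' keys."""
    if normalize_url(p["url"]) == normalize_url(q["url"]):
        return True
    return (p["title"].lower() == q["title"].lower()
            and jaccard(p["snippet"].lower(), q["snippet"].lower()) >= sim_threshold)

def dedupe(results):
    """Keep the first copy of each page across merged engine results."""
    kept = []
    for r in results:
        if not any(is_duplicate(r, k) for k in kept):
            kept.append(r)
    return kept
```

The cheap URL check short-circuits most duplicates; the title-plus-snippet comparison catches mirrors of the same page served under different URLs.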

  15. Using Web-Based Search Data to Study the Public’s Reactions to Societal Events: The Case of the Sandy Hook Shooting

    Science.gov (United States)

    2017-01-01

    Background Internet search is the most common activity on the World Wide Web and generates a vast amount of user-reported data regarding their information-seeking preferences and behavior. Although this data has been successfully used to examine outbreaks, health care utilization, and outcomes related to quality of care, its value in informing public health policy remains unclear. Objective The aim of this study was to evaluate the role of Internet search query data in health policy development. To do so, we studied the public’s reaction to a major societal event in the context of the 2012 Sandy Hook School shooting incident. Methods Query data from the Yahoo! search engine regarding firearm-related searches was analyzed to examine changes in user-selected search terms and subsequent websites visited for a period of 14 days before and after the shooting incident. Results A total of 5,653,588 firearm-related search queries were analyzed. In the after period, queries increased for search terms related to “guns” (+50.06%), “shooting incident” (+333.71%), “ammunition” (+155.14%), and “gun-related laws” (+535.47%). The highest increase (+1054.37%) in Web traffic was seen by news websites following “shooting incident” queries whereas searches for “guns” (+61.02%) and “ammunition” (+173.15%) resulted in notable increases in visits to retail websites. Firearm-related queries generally returned to baseline levels after approximately 10 days. Conclusions Search engine queries present a viable infodemiology metric on public reactions and subsequent behaviors to major societal events and could be used by policymakers to inform policy development. PMID:28336508

  16. Study of Search Strategy in Topic-Oriented Web Spider for Topic-Driven Search Engine

    Institute of Scientific and Technical Information of China (English)

    王明国; 胡敬仓

    2011-01-01

    The search strategy of the topic-oriented Web spider is the core component of a topic-driven search engine and has been one of the hot problems in topic search engine research in recent years. This paper studies the key techniques of topic-oriented Web spiders in depth, describes the implementation of a multi-threaded Web spider, and improves the traditional VSM and PageRank algorithms, enhancing the validity and accuracy of the information the topic spider collects.
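
As an illustration of the VSM component such a topic spider relies on, the following sketch scores a fetched page against a topic vector by cosine similarity; the topic terms and the relevance threshold are invented for the example:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def on_topic(page_text, topic_terms, threshold=0.2):
    """Keep a page only if its term vector is close enough to the topic."""
    return cosine(Counter(page_text.lower().split()),
                  Counter(topic_terms)) >= threshold

topic = ["search", "engine", "crawler", "index"]
```

A spider typically applies a check like this to each fetched page and prunes outlinks of off-topic pages, which keeps the crawl focused.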

  17. Web Mining and Social Networking

    DEFF Research Database (Denmark)

    Xu, Guandong; Zhang, Yanchun; Li, Lin

    This book examines the techniques and applications involved in the Web Mining, Web Personalization and Recommendation and Web Community Analysis domains, including a detailed presentation of the principles, developed algorithms, and systems of the research in these areas. The applications of web ...... sense of individuals or communities. The volume will benefit both academic and industry communities interested in the techniques and applications of web search, web data management, web mining and web knowledge discovery, as well as web community and social network analysis....

  18. Discussion on Web testing technology based on academic search engine websites

    Institute of Scientific and Technical Information of China (English)

    杨小萍; 李德录; 王昱

    2011-01-01

    Web testing is part of software testing, and Web testing methods change rapidly. Based on the testing of academic search engine websites, this article introduces the relevant knowledge of Web testing from three aspects: functional testing, performance testing, and user-interface testing. It also analyzes and compares commonly used Web testing tools and systematically discusses the Web testing process and commonly used methods.

  19. The effect of patient narratives on information search in a web-based breast cancer decision aid: an eye-tracking study.

    Science.gov (United States)

    Shaffer, Victoria A; Owens, Justin; Zikmund-Fisher, Brian J

    2013-12-17

    Previous research has examined the impact of patient narratives on treatment choices, but to our knowledge, no study has examined the effect of narratives on information search. Further, no research has considered the relative impact of their format (text vs video) on health care decisions in a single study. Our goal was to examine the impact of video and text-based narratives on information search in a Web-based patient decision aid for early stage breast cancer. Fifty-six women were asked to imagine that they had been diagnosed with early stage breast cancer and needed to choose between two surgical treatments (lumpectomy with radiation or mastectomy). Participants were randomly assigned to view one of four versions of a Web decision aid. Two versions of the decision aid included videos of interviews with patients and physicians or videos of interviews with physicians only. To distinguish between the effect of narratives and the effect of videos, we created two text versions of the Web decision aid by replacing the patient and physician interviews with text transcripts of the videos. Participants could freely browse the Web decision aid until they developed a treatment preference. We recorded participants' eye movements using the Tobii 1750 eye-tracking system equipped with Tobii Studio software. A priori, we defined 24 areas of interest (AOIs) in the Web decision aid. These AOIs were either separate pages of the Web decision aid or sections within a single page covering different content. We used multilevel modeling to examine the effect of narrative presence, narrative format, and their interaction on information search. There was a significant main effect of condition, P=.02; participants viewing decision aids with patient narratives spent more time searching for information than participants viewing the decision aids without narratives. The main effect of format was not significant, P=.10. However, there was a significant condition by format interaction on

  20. Research on the implementation process of a Web vertical search engine

    Institute of Scientific and Technical Information of China (English)

    张弘弦; 田玉玲

    2016-01-01

    A Web vertical search engine is a complex information system. Most current research concentrates on solving one specific problem arising in one aspect of the search engine, and studies of the complete implementation process of a Web vertical search engine are still lacking. To address this, an implementation process for a Web vertical search engine with a three-layer architecture is proposed, comprising data preparation, query processing, and interface interaction. Using the Java language and related open-source tools, the concrete tasks described by the process were carried out, yielding a working Web vertical search engine that queries mobile phone information. The three-layer architecture and implementation process provide a theoretical basis and practical guidance for building a complete subject-oriented Web vertical search engine.

  1. Study on an Information Search Model Based on Semantic Web Services

    Institute of Scientific and Technical Information of China (English)

    李志强

    2011-01-01

    To resolve the lack of semantic information in traditional keyword-based search, this paper proposes an information search model based on Semantic Web services for distributed network environments, building on a description of the key technologies of the Semantic Web and Web services. After analyzing the functions of each layer of the model, it proposes a search mechanism based on semantic similarity and provides a solution for integrating and sharing the information resources of heterogeneous systems. Finally, a prototype search system based on Semantic Web services is implemented and its performance analyzed through simulation experiments. The results show that the model offers a good solution for automated, intelligent information search.

  2. Synthesized Search Engine Based on Web 2.0

    Institute of Scientific and Technical Information of China (English)

    程陈; 齐开悦; 陈剑波

    2010-01-01

    The emergence of Web 2.0 has caused the amount of information on the network to explode, posing new challenges for search engines; current search engines can no longer satisfy the needs of most users. In response, this paper first analyzes the current state of search engines and their strengths and weaknesses, then analyzes user needs under the new circumstances, and, drawing on several current Web 2.0 techniques, proposes a synthesized search engine based on Web 2.0 communities.

  3. Creating Your Own Search Engine with Delphi and Google Web API

    Institute of Scientific and Technical Information of China (English)

    任树怀; 孙桂春

    2004-01-01

    Briefly describes the Google Web API, a Web service provided by Google that allows developers to build applications in their favorite programming language and to invoke Google's Web services by connecting to remote Google servers through the XML-based SOAP message-exchange protocol. With examples, the paper details the method and steps for developing a search engine with Delphi and the Google Web API.

  4. Adjustable relevance search in Web-based parts library

    Institute of Scientific and Technical Information of China (English)

    顾复; 张树有

    2011-01-01

    To solve the problems of data heterogeneity and information immensity in Web-based parts libraries, Adjustable Relevance Search (ARS) oriented to Web-based parts libraries is put forward. RDF Schema (RDFS) is used to construct ontology models of parts' information resources, which serve as nodes; the nodes are connected by various semantic relationships into a semantic network, enabling extended relevance search. Based on this semantic network model, the workflow and algorithm of ARS are presented and explained. A programming example for injection molding machine parts demonstrates the feasibility and practicality of ARS in a Web-based parts library built on a semantic network.
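
The extended relevance search over a semantic network of part nodes could be sketched as a bounded breadth-first expansion along typed links; the toy graph, relation names, and depth limit below are invented for illustration:

```python
def related(graph, start, max_depth=2):
    """graph: node -> list of (relation, neighbor) pairs.
    Breadth-first expansion up to max_depth, collecting related resources."""
    seen, frontier, out = {start}, [start], []
    for _ in range(max_depth):
        nxt = []
        for node in frontier:
            for rel, nb in graph.get(node, ()):
                if nb not in seen:
                    seen.add(nb)
                    out.append((nb, rel))   # record how the node was reached
                    nxt.append(nb)
        frontier = nxt
    return out

# Toy semantic network of injection-molding-machine parts
parts = {
    "screw_M8": [("used_in", "barrel_assembly"), ("same_standard", "screw_M10")],
    "barrel_assembly": [("part_of", "injection_unit")],
}
```

Raising or lowering `max_depth` is one simple way to make the relevance expansion "adjustable": a depth of 1 returns only direct semantic neighbors, while larger depths pull in transitively related parts.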

  5. Searching Aids for Invisible Web Resources

    Institute of Scientific and Technical Information of China (English)

    张蕾

    2006-01-01

    This paper introduces the concept, scale, characteristics, and categories of the Invisible Web, along with four ways to search it: directory guides, websites for searching the Invisible Web, Invisible Web databases, and search engines.

  6. Categorical and Specificity Differences between User-Supplied Tags and Search Query Terms for Images. An Analysis of "Flickr" Tags and Web Image Search Queries

    Science.gov (United States)

    Chung, EunKyung; Yoon, JungWon

    2009-01-01

    Introduction: The purpose of this study is to compare characteristics and features of user supplied tags and search query terms for images on the "Flickr" Website in terms of categories of pictorial meanings and level of term specificity. Method: This study focuses on comparisons between tags and search queries using Shatford's categorization…

  7. EPA Web Taxonomy

    Data.gov (United States)

    U.S. Environmental Protection Agency — EPA's Web Taxonomy is a faceted hierarchical vocabulary used to tag web pages with terms from a controlled vocabulary. Tagging enables search and discovery of EPA's...

  8. A Research Framework for Web Search Engine Usage Mining

    Institute of Scientific and Technical Information of China (English)

    王继民; 李雷明子; 孟涛

    2011-01-01

    Log files of search engines completely record the interaction between users and the system. Mining these logs can reveal the characteristics and regularities of users' Web search behavior and effectively improve the performance of search engine systems. Building on a systematic review of related work at home and abroad, this paper presents a research framework for Web search engine usage mining, covering the research topics of log mining, the choice of data collections, data preprocessing methods, and the characteristics and comparison of search behaviors of users from different regions, and explores its applications in improving the effectiveness and efficiency of search engines.

  9. Children's Search Engines from an Information Search Process Perspective.

    Science.gov (United States)

    Broch, Elana

    2000-01-01

    Describes cognitive and affective characteristics of children and teenagers that may affect their Web searching behavior. Reviews literature on children's searching in online public access catalogs (OPACs) and using digital libraries. Profiles two Web search engines. Discusses some of the difficulties children have searching the Web, in the…

  10. On Intensified Search Functions of ISI Web of Science

    Institute of Scientific and Technical Information of China (English)

    夏立娟; 王唯玮

    2007-01-01

    Web of Science, produced by the Institute for Scientific Information (ISI) in the United States, has been continuously upgraded in recent years. Version 5.0 adds combined search, advanced search, search history, and other modes; through clever combinations of these, retrieval functions that were previously impossible can now be achieved. This article introduces these intensified search functions and illustrates them with examples.

  11. Design of FTP Search Engine Based on IA and Web Service Technology

    Institute of Scientific and Technical Information of China (English)

    龚达

    2003-01-01

    Designs an FTP search engine based on IA and Web Service technologies. It achieves a degree of peer-to-peer distribution among search servers and can perform database updates and task delegation in environments with firewalls; spider technology automatically collects, classifies, and organizes information, while a word-segmentation dictionary reduces the retrieval granularity of user requests.

  12. Design of Web Crawlers for B2C Vertical Search Engines

    Institute of Scientific and Technical Information of China (English)

    杨亮; 刘利伟; 胡华莲

    2013-01-01

    A Web crawler system for B2C vertical search engines, based on Beautiful Soup information extraction, is developed. Test results indicate that the crawler's effective extraction rate exceeds 95%, meeting the requirements of commercial application.

  13. Google Ajax Search API

    CERN Document Server

    Fitzgerald, Michael

    2007-01-01

    Use the Google Ajax Search API to integrate web search, image search, local search, and other types of search into your web site by embedding a simple, dynamic search box to display search results in your own web pages using a few lines of JavaScript. For those who do not want to write code, the search wizards and solutions built with the Google Ajax Search API generate code to accomplish common tasks like adding local search results to a Google Maps API mashup, adding video search thumbnails to your web site, or adding a news reel with the latest up-to-date stories to your blog. More advanced users can

  14. Using Open Web APIs in Teaching Web Mining

    Science.gov (United States)

    Chen, Hsinchun; Li, Xin; Chau, M.; Ho, Yi-Jen; Tseng, Chunju

    2009-01-01

    With the advent of the World Wide Web, many business applications that utilize data mining and text mining techniques to extract useful business information on the Web have evolved from Web searching to Web mining. It is important for students to acquire knowledge and hands-on experience in Web mining during their education in information systems…

  16. Design of a Web Semi-Intelligent Metadata Search Model Applied in Data Warehousing Systems

    Directory of Open Access Journals (Sweden)

    Enrique Luna Ramírez

    2008-12-01

    Full Text Available In this paper, the design of a Web metadata search model with semi-intelligent features is proposed. The search model is oriented to retrieving the metadata associated with a data warehouse in a fast, flexible, and reliable way. Our proposal includes a set of distinctive functionalities: the temporary storage of frequently used metadata in an exclusive store, separate from the global data warehouse metadata store, and the use of control processes to retrieve information from both stores through aliases of concepts.
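
A minimal sketch of the two-store lookup with concept aliases that the model describes; the class shape, the promotion policy, and the cache size are assumptions of this illustration, not the paper's design:

```python
class MetadataSearch:
    """Resolve alias -> concept, check the exclusive (frequent) store first,
    then fall back to the global metadata store."""

    def __init__(self, global_store, aliases, cache_size=16):
        self.global_store = global_store   # concept -> metadata record
        self.aliases = aliases             # alias -> canonical concept
        self.cache = {}                    # exclusive store of frequent metadata
        self.cache_size = cache_size
        self.hits = {}                     # concept -> lookup count

    def lookup(self, term):
        concept = self.aliases.get(term, term)   # alias resolution
        self.hits[concept] = self.hits.get(concept, 0) + 1
        if concept in self.cache:                # fast path: exclusive store
            return self.cache[concept]
        record = self.global_store.get(concept)  # fall back to global store
        if record is not None and len(self.cache) < self.cache_size:
            self.cache[concept] = record         # promote frequently used metadata
        return record
```

The alias table lets users query with alternative concept names, while the exclusive store spares the global metadata store from repeated lookups of popular concepts.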

  17. Research and Implementation of Web Search Engine Based on Lucene

    Institute of Scientific and Technical Information of China (English)

    周凤丽; 林晓丽

    2012-01-01

    Search engines have developed continuously with the growth of the Internet, but their gradual shift to commercial operation has made their technical details increasingly hidden. Based on research into and analysis of the architecture, model, and indexer of the search engine toolkit Lucene, this paper implements a search engine system. The system crawls Web sites in a non-recursive mode and stores and processes the URL links encountered during crawling; it manages multiple crawl threads through multi-threading technology, fetching pages concurrently and improving operating efficiency. Finally, a simple news search engine client is built with JSP technology. The system runs stably, basically conforms to search engine principles, and has a certain practical significance.

  18. RNA FRABASE 2.0: an advanced web-accessible database with the capacity to search the three-dimensional fragments within RNA structures

    Directory of Open Access Journals (Sweden)

    Wasik Szymon

    2010-05-01

    Full Text Available Abstract Background Recent discoveries concerning novel functions of RNA, such as RNA interference, have contributed towards the growing importance of the field. In this respect, a deeper knowledge of complex three-dimensional RNA structures is essential to understand their new biological functions. A number of bioinformatic tools have been proposed to explore two major structural databases (PDB, NDB in order to analyze various aspects of RNA tertiary structures. One of these tools is RNA FRABASE 1.0, the first web-accessible database with an engine for automatic search of 3D fragments within PDB-derived RNA structures. This search is based upon the user-defined RNA secondary structure pattern. In this paper, we present and discuss RNA FRABASE 2.0. This second version of the system represents a major extension of this tool in terms of providing new data and a wide spectrum of novel functionalities. An intuitively operated web server platform enables very fast user-tailored search of three-dimensional RNA fragments, their multi-parameter conformational analysis and visualization. Description RNA FRABASE 2.0 has stored information on 1565 PDB-deposited RNA structures, including all NMR models. The RNA FRABASE 2.0 search engine algorithms operate on the database of the RNA sequences and the new library of RNA secondary structures, coded in the dot-bracket format extended to hold multi-stranded structures and to cover residues whose coordinates are missing in the PDB files. The library of RNA secondary structures (and their graphics is made available. A high level of efficiency of the 3D search has been achieved by introducing novel tools to formulate advanced searching patterns and to screen highly populated tertiary structure elements. RNA FRABASE 2.0 also stores data and conformational parameters in order to provide "on the spot" structural filters to explore the three-dimensional RNA structures. An instant visualization of the 3D RNA
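RNA FRABASE's search patterns are expressed in dot-bracket notation, in which matching parentheses denote base pairs and dots denote unpaired residues. A minimal parser for the plain single-stranded dialect (ignoring the extended multi-stranded coding the database actually uses) might look like:

```python
def dot_bracket_pairs(structure):
    """Map each '(' to its matching ')' in a dot-bracket string.
    Dots are unpaired residues; returns a dict of 0-based pair indices."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == '(':
            stack.append(i)
        elif ch == ')':
            j = stack.pop()          # most recent unmatched '('
            pairs[j] = i
            pairs[i] = j
        elif ch != '.':
            raise ValueError(f"unexpected symbol {ch!r}")
    if stack:
        raise ValueError("unbalanced structure")
    return pairs

# hairpin: a stem of 3 base pairs enclosing a 4-residue loop
print(dot_bracket_pairs("(((....)))"))
```

Pattern matching against such parsed structures (plus sequence constraints) is the kind of secondary-structure query the FRABASE engine resolves before retrieving 3D fragments.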

  19. Design of a Web Search Engine System Based on DC Metadata

    Institute of Scientific and Technical Information of China (English)

    伯琼; 胡飞; 钟国祥

    2011-01-01

    Traditional search engines typically analyze keywords extracted from the full text, which leads to three flaws: low precision due to the lack of semantic description, low retrieval efficiency due to redundant and ambiguous results, and insufficient retrieval channels. Based on the advantages of describing Web resources with DC metadata, the research group designed DCSE, a Web search engine system based on DC metadata, to overcome these flaws of traditional search engines. DCSE automatically crawls Web pages containing DC descriptions, stores the DC description information in a database, and makes it searchable after sorting and indexing. The search interface is designed as a multi-term logical combination search over the 15 DC elements, and retrieval results are displayed by the content of each DC element, such as title, creator, description and date. Users can raise the precision ratio through multi-term combined search and quickly judge and select the needed information from the clearly displayed results, thereby improving retrieval efficiency.
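The multi-term logical combination search over DC elements can be sketched as a simple AND filter over metadata records. The field names follow the Dublin Core element set, but the sample records and matching rule are hypothetical:

```python
def dc_search(records, **criteria):
    """Return records whose DC fields contain all given substrings (AND logic).
    Missing fields are treated as empty, so they never match a criterion."""
    return [r for r in records
            if all(v.lower() in r.get(k, "").lower() for k, v in criteria.items())]

records = [
    {"title": "Web Search Engines", "creator": "Rajashekar", "date": "1998"},
    {"title": "Deep Web Search", "creator": "Tjin-Kam-Jet", "date": "2013"},
]
# combine two DC elements in one query, as the DCSE interface allows
print(dc_search(records, title="web", date="2013"))
```

A real implementation would index each element separately instead of scanning records linearly, but the combination logic is the same.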

  20. Web Personalization Using Web Mining

    Directory of Open Access Journals (Sweden)

    Ms.Kavita D.Satokar,

    2010-03-01

    Full Text Available The information on the web is growing dramatically, and users have to spend more and more time finding the information they are interested in. Traditional search engines do not give users enough personalized help but instead return a great deal of irrelevant information. In this paper, we present a personalized Web search system that helps users find relevant web pages based on their selection from a domain list. Users thus obtain a set of interesting domains and the corresponding web pages from the system. The system is based on features extracted from hyperlinks, such as anchor terms and URL tokens. Our methodology uses an innovative weighted URL Rank algorithm based on the user's interested domains and query.

  1. Study of Active Web Information Search Service Based on Credibility

    Institute of Scientific and Technical Information of China (English)

    肖婷; 陈红英

    2011-01-01

    Existing Web information search is carried out by keyword matching, whose accuracy and reliability are limited. This paper focuses on users' needs: it collects user preferences, applies the C4.5 decision tree algorithm in background software to build file filtering rules, combines these rules with uncertainty reasoning based on the subjective Bayes method to attach credibility support to them, and describes them as fuzzy rules, thereby improving both the recall and the precision of Web information search.
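The splitting criterion at the heart of the C4.5 algorithm mentioned above is information gain (which C4.5 further normalizes into a gain ratio). A self-contained sketch, with a toy filtering example whose attribute and labels are invented for illustration:

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, labels):
    """Entropy reduction from splitting `rows` (attribute dicts) on `attr`,
    the criterion decision-tree learners such as C4.5 build on."""
    base = entropy(labels)
    n = len(rows)
    by_value = {}
    for row, lab in zip(rows, labels):
        by_value.setdefault(row[attr], []).append(lab)
    return base - sum(len(g) / n * entropy(g) for g in by_value.values())

# toy filtering data: does the page contain a flagged keyword?
rows = [{"flagged": True}, {"flagged": True}, {"flagged": False}, {"flagged": False}]
labels = ["block", "block", "pass", "pass"]
print(information_gain(rows, "flagged", labels))  # → 1.0 (perfect split)
```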

  2. Evaluating search effectiveness of some selected search engines ...

    African Journals Online (AJOL)

    Evaluating search effectiveness of some selected search engines. ... seek for information on the World Wide Web (WWW) using a variety of search engines.

  3. Research and Design of a Topical Crawler Module Based on Deep Web Search Technology

    Institute of Scientific and Technical Information of China (English)

    孟敬; 刘寿强

    2011-01-01

    As the Web grows rapidly, managing and searching massive data becomes particularly important. The heterogeneity and dynamics of mass information require Web crawlers that automatically fetch pages for further processing. At the same time, internal enterprise materials must remain confidential while being usable by different internal staff; this combination of openness and conservatism has become a bottleneck for enterprise development. To address this, the paper departs from traditional forms of resource sharing and provides enterprises with an efficient, convenient and confidential resource-sharing management platform, an Enterprise Search Engine (ESE). It proposes a design and implementation method for an ESE that crawls Deep Web pages with a topical crawler and indexes them with the open-source Java library Lucene. Deployment experiments on Deep Web sites in the telecommunications industry show that the results meet the design targets, and the system has played a useful role in telecommunications search. Finally, the studies on the search

  4. Use of Web 2.0 Technologies in K-12 and Higher Education: The Search for Evidence-Based Practice

    Science.gov (United States)

    Hew, Khe Foon; Cheung, Wing Sum

    2013-01-01

    Evidence-based practice in education entails making pedagogical decisions that are informed by relevant empirical research evidence. The main purpose of this paper is to discuss evidence-based pedagogical approaches related to the use of Web 2.0 technologies in both K-12 and higher education settings. The use of such evidence-based practice would…

  5. Retrieval of very large numbers of items in the Web of Science: an exercise to develop accurate search strategies

    NARCIS (Netherlands)

    Arencibia-Jorge, R.; Leydesdorff, L.; Chinchilla-Rodríguez, Z.; Rousseau, R.; Paris, S.W.

    2009-01-01

    The Web of Science interface counts at most 100,000 retrieved items from a single query. If the query results in a dataset containing more than 100,000 items, the number of retrieved items is indicated as >100,000. The problem studied here is how to find the exact number of items in a query that lead
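A common workaround for such display caps, and the spirit of the accurate search strategies studied here, is to split the query into disjoint sub-queries (for example, by publication year) whose individual counts stay under the cap, then sum them. A sketch, where the counting function is a hypothetical stand-in for the Web of Science interface:

```python
def exact_count(query_count, partitions):
    """Sum the counts of disjoint sub-queries when the interface caps the
    reported size of any single query. `query_count` is a stand-in for the
    search interface: it returns the true count for a sub-query, or None
    when that sub-query itself exceeds the display cap."""
    total = 0
    for part in partitions:
        c = query_count(part)
        if c is None:
            raise ValueError(f"partition {part!r} still exceeds the cap; split it further")
        total += c
    return total

# hypothetical per-year counts for a query too large to count in one go
counts = {"2005": 60_000, "2006": 70_000, "2007": 90_000}
api = lambda year: counts[year] if counts[year] <= 100_000 else None
print(exact_count(api, ["2005", "2006", "2007"]))  # → 220000
```

The correctness of the sum rests on the sub-queries being mutually exclusive and jointly covering the original query, which is exactly what partitioning by a field such as publication year provides.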

  6. The Opera del Vocabolario Italiano Database: Full-Text Searching Early Italian Vernacular Sources on the Web.

    Science.gov (United States)

    DuPont, Christian

    2001-01-01

    Introduces and describes the functions of the Opera del Vocabolario Italiano (OVI) database, a powerful Web-based, full-text, searchable electronic archive that contains early Italian vernacular texts whose composition may be dated prior to 1375. Examples are drawn from scholars in various disciplines who have employed the OVI in support of their…

  7. Self-identification of occupation in web surveys: requirements for search trees and look-up tables

    NARCIS (Netherlands)

    Tijdens, K.

    2015-01-01

    Can self-identification of occupation be applied in web surveys by using a look-up table with coded occupational titles, in contrast to other survey modes where an open format question with office-coding has to be applied? This article is among the first to explore this approach, using a random samp

  9. ASCOT: a text mining-based web-service for efficient search and assisted creation of clinical trials

    Science.gov (United States)

    2012-01-01

    Clinical trials are mandatory protocols describing medical research on humans and among the most valuable sources of medical practice evidence. Searching for trials relevant to some query is laborious due to the immense number of existing protocols. Apart from search, writing new trials includes composing detailed eligibility criteria, which might be time-consuming, especially for new researchers. In this paper we present ASCOT, an efficient search application customised for clinical trials. ASCOT uses text mining and data mining methods to enrich clinical trials with metadata, that in turn serve as effective tools to narrow down search. In addition, ASCOT integrates a component for recommending eligibility criteria based on a set of selected protocols. PMID:22595088

  10. Web Search Engines and Indexing and Ranking the Content Object Including Metadata Elements Available at the Dynamic Information Environments

    Directory of Open Access Journals (Sweden)

    Faezeh sadat Tabatabai Amiri

    2012-10-01

    Full Text Available The purpose of this research was to examine the indexing and ranking of XML content objects containing Dublin Core and MARC 21 metadata elements in dynamic online information environments by general search engines, comparing the two in a comparative-analytical approach. 100 XML content objects in two groups, one with DCXML elements and one with MARCXML elements, were published on the website http://www.marcdcmi.ir from late Mordad 1388 till Khordad 1389. The website was then introduced to the Google and Yahoo search engines. Google was able to fully retrieve all the content objects during the study period through their Dublin Core and MARC 21 metadata elements; Yahoo, however, did not respond at all. The indexing of metadata elements embedded in content objects in dynamic online information environments, and the difference between their indexing and ranking, were examined. Findings showed that all Dublin Core and MARC 21 metadata elements were indexed by Google, and no difference was observed between the indexing and ranking of DCXML and MARCXML metadata elements in dynamic online information environments by Google.

  11. Analyzing web log files of the health on the net HONmedia search engine to define typical image search tasks for image retrieval evaluation.

    Science.gov (United States)

    Müller, Henning; Boyer, Célia; Gaudinat, Arnaud; Hersh, William; Geissbuhler, Antoine

    2007-01-01

    Medical institutions produce an ever-increasing amount of diverse information. The digital form makes these data available for use on more than a single patient. Images are no exception. However, little is known about how medical professionals search for visual medical information and how they want to use it outside the context of a single patient. This article analyzes ten months of usage log files of the Health on the Net (HON) medical media search engine. Key words were extracted from all queries and the most frequent terms and subjects were identified. The dataset required substantial pre-treatment; problems included national character sets, spelling errors and the use of terms in several languages. The results show that media search, particularly for images, was frequently used. The most common queries were for general concepts (e.g., heart, lung). To define realistic information needs for the ImageCLEFmed challenge evaluation (Cross Language Evaluation Forum medical image retrieval), we used frequent queries that were still specific enough to cover at least two of the three axes of modality, anatomic region and pathology. Several research groups evaluated their image retrieval algorithms based on these defined topics.
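The keyword-extraction step of such a log analysis (minus the spelling, character-set and multilingual cleanup the article describes) reduces to tokenizing raw query strings and counting term frequencies. The sample queries below are invented:

```python
import re
from collections import Counter

def top_terms(log_lines, k=3):
    """Extract lower-cased alphabetic tokens from raw query strings and
    return the k most frequent, as a first pass over a query log."""
    counts = Counter()
    for line in log_lines:
        counts.update(re.findall(r"[a-z]+", line.lower()))
    return counts.most_common(k)

log = ["Heart anatomy", "lung x-ray", "heart", "HEART failure", "lung"]
print(top_terms(log))  # → [('heart', 3), ('lung', 2), ...]
```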

  12. PMD2HD--a web tool aligning a PubMed search results page with the local German Cancer Research Centre library collection.

    Science.gov (United States)

    Bohne-Lang, Andreas; Lang, Elke; Taube, Anke

    2005-06-27

    Web-based searching is the accepted contemporary mode of retrieving relevant literature, and retrieving as many full text articles as possible is a typical prerequisite for research success. In most cases only a proportion of references will be directly accessible as digital reprints through displayed links. A large number of references, however, have to be verified in library catalogues and, depending on their availability, are accessible as print holdings or by interlibrary loan request. The problem of verifying local print holdings from an initial retrieval set of citations can be solved using Z39.50, an ANSI protocol for interactively querying library information systems. Numerous systems include Z39.50 interfaces and therefore can process Z39.50 interactive requests. However, the programmed query interaction command structure is non-intuitive and inaccessible to the average biomedical researcher. For the typical user, it is necessary to implement the protocol within a tool that hides and handles Z39.50 syntax, presenting a comfortable user interface. PMD2HD is a web tool implementing Z39.50 to provide an appropriately functional and usable interface to integrate into the typical workflow that follows an initial PubMed literature search, providing users with an immediate asset to assist in the most tedious step in literature retrieval, checking for subscription holdings against a local online catalogue. PMD2HD can facilitate literature access considerably with respect to the time and cost of manual comparisons of search results with local catalogue holdings. The example presented in this article is related to the library system and collections of the German Cancer Research Centre. However, the PMD2HD software architecture and use of common Z39.50 protocol commands allow for transfer to a broad range of scientific libraries using Z39.50-compatible library information systems.

  13. Internet Information Search Based Approach to Enriching Textual Descriptions for Public Web Services

    Institute of Scientific and Technical Information of China (English)

    王立杰; 李萌; 蔡斯博; 李戈; 谢冰; 杨芙清

    2012-01-01

    With the development of Web service technologies, more and more public Web services have been published on the Internet. When searching for and using these public services, textual descriptions (such as introductions and user manuals), generally expressed in natural language, greatly help service consumers locate, understand and utilize proper Web services. Existing methods for service discovery usually try to obtain such descriptions only from services' WSDL files. However, according to this investigation, many Web services on the Internet lack such information in their WSDL files, or omit it entirely. This paper therefore proposes an approach to enriching the textual descriptions of public Web services using information sources outside the WSDL files. Given a Web service, the approach collects Web pages from the Internet that contain the identifying features of the target service, extracts information fragments from these pages, computes the relevance between each fragment and the target service using information retrieval techniques, and selects the fragments with the highest relevance to enrich the service's textual description. Experiments on real data from the Internet show that related Web pages can be obtained for about 51% of the Web services on the Internet, and textual descriptions can be enriched for about 88% of these services. The collected Web services and their textual descriptions have been publicly released.
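Scoring extracted information fragments against a target service, as the approach's relevance-computation step requires, can be sketched with plain TF-IDF and cosine similarity. The smoothing used and the sample texts are illustrative assumptions, not the paper's exact formula:

```python
import math
from collections import Counter

def tfidf_vectors(texts):
    """Build smoothed TF-IDF vectors (term -> weight) for a list of texts."""
    docs = [Counter(t.lower().split()) for t in texts]
    n = len(docs)
    df = Counter(term for d in docs for term in d)   # document frequency
    return [{t: c * math.log((1 + n) / (1 + df[t])) for t, c in d.items()}
            for d in docs]

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

service = "weather forecast web service returns temperature"
snippets = ["this service returns a weather forecast with temperature data",
            "stock quotes and exchange rates"]
vecs = tfidf_vectors([service] + snippets)
scores = [cosine(vecs[0], v) for v in vecs[1:]]
print(scores)  # the on-topic snippet scores higher than the unrelated one
```

Fragments whose score clears a threshold would then be attached to the service as enriched description text.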

  14. Applied Research of Web Data Mining in a Campus Network Search Engine System

    Institute of Scientific and Technical Information of China (English)

    牛凯

    2014-01-01

    Based on an elaboration of the classification, methods and concrete process of Web data mining, this paper designs the overall architecture of a campus network search engine system, discusses the design of its main functional modules, and presents the application of Web data mining technology in a campus network search engine system.

  15. Using Deep Web Technology in Patent Search and Download

    Institute of Scientific and Technical Information of China (English)

    袁小龙; 李晓霞; 郭力

    2011-01-01

    The Deep Web refers to web pages that sit behind form-based entry points to online databases and cannot be effectively indexed by general-purpose search engines; Deep Web technology extracts the data pages hidden behind such entry points. Online patent data are an important class of Deep Web resource, and extracting and mining them is of real significance. This paper presents a patent retrieval system built with Deep Web technology that supports local search, extraction and download of Chinese and American patent data, as well as legal-status queries for Chinese patents. The software supports batch download and file management of patents. Because Chinese and American patents are displayed as multiple single-page TIFF images, which are inconvenient to manage and browse locally, it also provides practical functions that merge the single-page TIFF images of a patent into one multi-page TIFF image and convert it to the common PDF format. The system adopts a user-oriented interface and feature design and is simple and easy to use.

  16. Síntesis y crítica de las evaluaciones de la efectividad de los motores de búsqueda en la Web. (Synthesis and critical review of evaluations of the effectiveness of Web search engines

    Directory of Open Access Journals (Sweden)

    Francisco Javier Martínez Méndez

    2003-01-01

    Full Text Available A considerable number of proposals for measuring the effectiveness of information retrieval systems have been made since the early days of such systems. The consolidation of the World Wide Web as the paradigmatic method for developing the Information Society, and the continuous multiplication of the number of documents published in this environment, has led to the implementation of the most advanced and extensive information retrieval systems, in the shape of web search engines. Nevertheless, there is an underlying concern about the effectiveness of these systems, especially when they usually present, in response to a question, many documents with little relevance to the users' information needs. The evaluation of these systems has been, up to now, dispersed and varied. The scattering is due to the lack of uniformity in the criteria used in evaluation, and this disparity derives from their aperiodicity and variable coverage. In this review, we identify three groups of studies: explicit evaluations, experimental evaluations and, more recently, several proposals for the establishment of a global framework to evaluate these systems.

  17. A Study on the Correlation between Web Search Data and CPI

    Institute of Scientific and Technical Information of China (English)

    张崇; 吕本富; 彭赓; 刘颖

    2012-01-01

    Web search data record the interests and concerns of more than three hundred million market participants, reflect trends and regularities in their behavior, and provide a necessary micro-level data basis for studying macroeconomic problems. This paper establishes a conceptual framework from the perspective of the commodity market, grounded in equilibrium price theory, and reveals a certain correlation and lead-lag relationship between web search data and the consumer price index (CPI). Empirical results indicate a co-integration relationship between web search data and CPI: the model fit is 0.978 and the absolute forecast error is 0.48, and the macroeconomic-climate search index and the supply-demand search index lead CPI by five months and two months, respectively. The model is also very timely: unlike the traditional CPI monitoring method, which has roughly a two-week lag, its forecast can be obtained about one month ahead of the State Statistical Bureau's data release. Compared with traditional forecasting methods, the model also has some ability to predict turning points.
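The lead-lag relationship the paper reports can be probed with a simple lagged Pearson correlation between a search-index series and CPI. The toy series below merely illustrates the mechanics, not the paper's econometric (co-integration) model:

```python
def lagged_corr(x, y, lag):
    """Pearson correlation between x[t-lag] and y[t]; a positive lag asks
    whether x leads y, as the search indices are said to lead CPI."""
    xs, ys = (x[:-lag], y[lag:]) if lag > 0 else (x, y)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy)

# toy series in which y reproduces x two periods later
x = [1, 2, 3, 4, 5, 4, 3, 2, 1, 2]
y = [0, 0] + x[:-2]
print(round(lagged_corr(x, y, 2), 3))  # → 1.0
```

Scanning `lag` over a range and picking the maximizing value is a quick way to estimate a lead period before fitting a formal co-integration model.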

  18. Evaluating web search engines

    CERN Document Server

    Lewandowski, Dirk

    2011-01-01

    Every month, more than 130 billion queries worldwide are entered into the search boxes of general-purpose web search engines (ComScore, 2010). This enormous number shows that web searching is not only a large business, but also that many people rely on the search engines' results when researching information. A goal of all search engine evaluation efforts is to generate better systems. This goal is of major importance to the search engine vendors who can directly apply evaluation results to develop better ranking algorithms.

  19. Systematizing Web Search through a Meta-Cognitive, Systems-Based, Information Structuring Model (McSIS)

    Science.gov (United States)

    Abuhamdieh, Ayman H.; Harder, Joseph T.

    2015-01-01

    This paper proposes a meta-cognitive, systems-based, information structuring model (McSIS) to systematize online information search behavior based on a literature review of information-seeking models. The General Systems Theory's (GST) propositions serve as its framework. Factors influencing information-seekers, such as the individual learning…

  20. Dynamic ranking with n + 1 dimensional vector space models: An alternative search mechanism for world wide web

    Digital Repository Service at National Institute of Oceanography (India)

    Lakshminarayana, S.

    …, & Lawrence, S. (2001). The structure of the web. Science, 294, 1849–1850. Kleinberg, J. (1998). Authoritative sources in a hyperlinked environment. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (pp. 668–677). New York: ACM-SIAM. Ravikumar, I. … are retained when using the duplicate removal algorithm. When the files are ordered using the "set A,B"-ordering, the records retained are all records from file A (including the overlap…

  1. Research and Design of a Web-Based Vertical Search Engine for DCI

    Institute of Scientific and Technical Information of China (English)

    吴洁明; 冀单单; 韩云辉

    2013-01-01

    In order to allow users to search information about digital works on the Internet quickly and accurately, a vertical search engine for works identified by the Digital Copyright Identifier (DCI) is analyzed and designed. First, based on the Heritrix web crawler, data about digital works on the Internet are collected and their body text extracted, and the extracted data are saved locally. Then, on the basis of the Lucene full-text retrieval toolkit, the collected data are processed with word segmentation, inverted indexing, index retrieval and an improved relevance-ranking algorithm, resulting in a general and extensible DCI vertical search engine. The experimental results show that this search engine greatly improves the accuracy of web page information extraction and the efficiency of data retrieval.

  2. A business intelligence approach using web search tools and online data reduction techniques to examine the value of product-enabled services

    DEFF Research Database (Denmark)

    Tanev, Stoyan; Liotta, Giacomo; Kleismantas, Andrius

    2015-01-01

    in Canada and Europe. It adopts an innovative methodology based on online textual data that could be implemented in advanced business intelligence tools aiming at the facilitation of innovation, marketing and business decision making. Combinations of keywords referring to different aspects of service value...... were designed and used in a web search resulting in the frequency of their use on companies’ websites. Principal component analysis was applied to identify distinctive groups of keyword combinations that were interpreted in terms of specific service value attributes. Finally, the firms were classified...... by means of K-means cluster analysis in order to identify the firms with a high degree of articulation of their service value attributes. The results show that the main service value attributes of the Canadian firms are: better service effectiveness, higher market share, higher service quality...

  3. A combined strategy of "in silico" transcriptome analysis and web search engine optimization allows an agile identification of reference genes suitable for normalization in gene expression studies.

    Science.gov (United States)

    Faccioli, Primetta; Ciceri, Gian Paolo; Provero, Paolo; Stanca, Antonio Michele; Morcia, Caterina; Terzi, Valeria

    2007-03-01

    Traditionally, housekeeping genes have been employed as endogenous reference (internal control) genes for normalization in gene expression studies. Since the use of single housekeepers cannot assure an unbiased result, new normalization methods involving multiple housekeeping genes, normalizing by their mean expression, have recently been proposed. Moreover, since no gold-standard gene suitable for every experimental condition exists, it is also necessary to validate the expression stability of every putative control gene under the specific requirements of the planned experiment. Consequently, finding a good set of reference genes is certainly a non-trivial problem requiring considerable lab-based experimental testing. In this work we identified novel candidate barley reference genes suitable for normalization in gene expression studies. An advanced web search approach, aimed at collecting from publicly available web resources the most relevant information on the expression profiles of candidate housekeepers on a specific experimental basis, was set up and applied, as an example, to stress conditions. A complementary lab-based analysis was carried out to verify the expression profiles of the selected genes in different tissues and during heat shock response. This combined dry/wet approach can be applied to any species and physiological condition of interest and is very helpful for shortlisting putative reference genes every time a new experimental design has to be set up.

  4. Psychophysics in a Web browser? Comparing response times collected with JavaScript and Psychophysics Toolbox in a visual search task.

    Science.gov (United States)

    de Leeuw, Joshua R; Motz, Benjamin A

    2016-03-01

    Behavioral researchers are increasingly using Web-based software such as JavaScript to conduct response time experiments. Although there has been some research on the accuracy and reliability of response time measurements collected using JavaScript, it remains unclear how well this method performs relative to standard laboratory software in psychologically relevant experimental manipulations. Here we present results from a visual search experiment in which we measured response time distributions with both Psychophysics Toolbox (PTB) and JavaScript. We developed a methodology that allowed us to simultaneously run the visual search experiment with both systems, interleaving trials between two independent computers, thus minimizing the effects of factors other than the experimental software. The response times measured by JavaScript were approximately 25 ms longer than those measured by PTB. However, we found no reliable difference in the variability of the distributions related to the software, and both software packages were equally sensitive to changes in the response times as a result of the experimental manipulations. We concluded that JavaScript is a suitable tool for measuring response times in behavioral research.

  5. Surging Seas Risk Finder: A Simple Search-Based Web Tool for Local Sea Level Rise Projections, Coastal Flood Risk Forecasts, and Inundation Exposure Analysis

    Science.gov (United States)

    Strauss, B.; Dodson, D.; Kulp, S. A.; Rizza, D. H.

    2016-12-01

    Surging Seas Risk Finder (riskfinder.org) is an online tool for accessing extensive local projections and analysis of sea level rise; coastal floods; and land, populations, contamination sources, and infrastructure and other assets that may be exposed to inundation. Risk Finder was first published in 2013 for Florida, New York and New Jersey, expanding to all states in the contiguous U.S. by 2016, when a major new version of the tool was released with a completely new interface. The revised tool was informed by hundreds of survey responses from and conversations with planners, local officials and other coastal stakeholders, plus consideration of modern best practices for responsive web design and user interfaces, and social science-based principles for science communication. Overarching design principles include simplicity and ease of navigation, leading to a landing page with Google-like sparsity and focus on search, and to an architecture based on search, so that each coastal zip code, city, county, state or other place type has its own webpage gathering all relevant analysis in modular, scrollable units. Millions of users have visited the Surging Seas suite of tools to date, and downloaded thousands of files, for stated purposes ranging from planning to business to education to personal decisions; and from institutions ranging from local to federal government agencies, to businesses, to NGOs, and to academia.

  6. Visual search for tropical web spiders: the influence of plot length, sampling effort, and phase of the day on species richness.

    Science.gov (United States)

    Pinto-Leite, C M; Rocha, P L B

    2012-12-01

    Empirical studies using visual search methods to investigate spider communities were conducted with different sampling protocols, including a variety of plot sizes, sampling efforts, and diurnal periods for sampling. We sampled 11 plots ranging in size from 5 by 10 m to 5 by 60 m. In each plot, we computed the total number of species detected every 10 min during 1 hr during the daytime and during the nighttime (0630 hours to 1100 hours, both a.m. and p.m.). We measured the influence of time effort on the measurement of species richness by comparing the curves produced by sample-based rarefaction and species richness estimation (first-order jackknife). We used a general linear model with repeated measures to assess whether the phase of the day during which sampling occurred and the differences in the plot lengths influenced the number of species observed and the number of species estimated. To measure the differences in species composition between the phases of the day, we used a multiresponse permutation procedure and a graphical representation based on nonmetric multidimensional scaling. After 50 min of sampling, we noted a decreased rate of species accumulation and a tendency of the estimated richness curves to reach an asymptote. We did not detect an effect of plot size on the number of species sampled. However, differences in observed species richness and species composition were found between phases of the day. Based on these results, we propose guidelines for visual search for tropical web spiders.

  7. Web Mining and Social Networking

    DEFF Research Database (Denmark)

    Xu, Guandong; Zhang, Yanchun; Li, Lin

    This book examines the techniques and applications involved in the Web Mining, Web Personalization and Recommendation, and Web Community Analysis domains, including a detailed presentation of the principles, developed algorithms, and systems of the research in these areas. The applications of web mining, and the issue of how to incorporate web mining into web personalization and recommendation systems, are also reviewed. Additionally, the volume explores web community mining and analysis to find the structural, organizational and temporal developments of web communities and reveal the societal sense of individuals or communities. The volume will benefit both academic and industry communities interested in the techniques and applications of web search, web data management, web mining and web knowledge discovery, as well as web community and social network analysis.

  8. Fuzzification of Web Objects: A Semantic Web Mining Approach

    Directory of Open Access Journals (Sweden)

    Tasawar Hussain

    2012-03-01

    Full Text Available Web Mining is becoming essential to support web administrators and web users in multiple ways, such as information retrieval, website performance management, web personalization, web marketing, and website design. Due to the uncontrolled exponential growth of web data, knowledge base retrieval has become a very challenging task. One viable solution to the problem is the merging of conventional web mining with semantic web technologies. This merging process is more beneficial to web users, reducing the search space and providing information that is more relevant. Key web objects play a significant role in this process, and their extraction from a website is a challenging task. In this paper, we propose a framework that extracts key web objects from a web log file and applies semantic web techniques to mine actionable intelligence. The proposed framework can also be applied to the non-semantic web for the extraction of key web objects. We define an objective function, named the key web object (KWO) function, to identify key web objects from the user's perspective. The KWO function helps fuzzify the extracted key web objects into three categories: Most Interested, Interested, and Least Interested. Fuzzification of web objects accommodates the uncertainty about how attractive each web object is to users. We validate the proposed scheme with the help of a case study.

  9. Exploring the academic invisible web

    OpenAIRE

    Lewandowski, Dirk; Mayr, Philipp

    2006-01-01

    Purpose: To provide a critical review of Bergman’s 2001 study on the Deep Web. In addition, we bring a new concept into the discussion, the Academic Invisible Web (AIW). We define the Academic Invisible Web as consisting of all databases and collections relevant to academia but not searchable by the general-purpose internet search engines. Indexing this part of the Invisible Web is central to scientific search engines. We provide an overview of approaches followed thus far. Design/methodol...

  10. Hidden Page WebCrawler Model for Secure Web Pages

    Directory of Open Access Journals (Sweden)

    K. F. Bharati

    2013-03-01

    The traditional search engines available over the internet are dynamic in searching relevant content over the web. They face constraints such as gathering the requested data from varied sources while keeping relevancy exceptional. Conventional web crawlers are designed to move only along specific paths of the web and are restricted from other paths, because those paths are secured or are at times restricted out of apprehension of threats. It is possible to design a web crawler capable of penetrating paths of the web not reachable by traditional web crawlers, in order to get a better solution in terms of data, time, and relevancy for a given search query. This paper makes use of a newer parser and indexer to arrive at a novel web crawler design and a framework to support it. The proposed web crawler is designed to visit Hyper Text Transfer Protocol Secure (HTTPS) based websites and web pages that need authentication to view and index. The user fills in a search form, and his or her credentials are used by the web crawler to authenticate against the secure web server. Once indexed, the secure web server lies inside the web crawler's accessible zone.
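    One concrete piece of such a crawler, fetching pages behind an authentication wall, can be sketched as follows. The sketch assumes plain HTTP Basic authentication (the form-based credential handling described above would instead require submitting the form and carrying a session cookie); the helper name is ours.

```python
import base64
from urllib.request import Request

def authenticated_request(url, username, password):
    """Build a request carrying HTTP Basic credentials so a crawler
    can fetch and index pages that require authentication to view."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return Request(url, headers={"Authorization": f"Basic {token}"})
```

    Passing the resulting request to `urllib.request.urlopen` would fetch the protected page; the crawler would then parse and index it as usual.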

  11. Web Community Search Engine System

    Institute of Scientific and Technical Information of China (English)

    刘务华; 罗铁坚; 王文杰

    2007-01-01

    On the basis of an analysis of the distributed nature of Web community search resources, this paper applies techniques such as a Web crawler, the vector space model, and relevance ranking to design the architecture of a Web community search engine, and implements such a system, ChinalabSearch. According to the performance evaluation, the system meets the search requirements of Web communities and improves the efficiency of finding information within a community, facilitating cooperation between organizations.

  12. CHANNELING: THE SEARCH FOR META-ANTHROPOLOGICAL DIMENSIONS OF THE BEING OF MIND IN THE UNIVERSE (based on the World Wide Web)

    Directory of Open Access Journals (Sweden)

    Anatoly T. Tshedrin

    2014-06-01

    Relevance of the study. In the context of the religious and philosophical movements of the «New Age», the phenomenon of channeling has gained currency: the «laying of a channel» for transmitting information from a consciousness that is not in human form to an individual, or to humanity as a whole. In the socio-cultural environment of postmodernity, channeling reflects the problem of the search for extraterrestrial intelligence (ETI; the «ETC-problem»; SETI) and of establishing contact with it; this problem has several projections and important philosophical-anthropological dimensions in culture. Investigating the mechanisms by which virtual superhuman personalities are constructed on the World Wide Web is of interest not only for further analysis of the ETI problem, but also for extending the subject field of the anthropology of the Internet as an important area of philosophical-anthropological studies. Purpose of the study. To analyze the phenomenon of channeling as a projection of the fundamental problem of the existence of ETI, its representation on the World Wide Web, its impact on the archaism of postmodern culture, and the meta-anthropological dimensions of the existence of reason in the universe and of contact with it posed in the doctrinal grounds of channeling. Analysis of research on the problem and its empirical base. The clustered nature of the ETI problem, with channeling as one of its elements, involves drawing widely on the radio-astronomy paradigm of work on CETI, on work in the anthropology of the Internet, and on studies of the «New Age» phenomenon. The empirical basis of the study comprises network resources, as well as representative texts created and put into circulation by channelers and their predecessors. Research methodology. Channeling as an object of research, and its network representation as its subject matter, call for analytical hermeneutics, archaeographic commentary on texts, and fractal-logic cluster analysis. The main

  13. Research on Multi-level Clustering for Web Search Results

    Institute of Scientific and Technical Information of China (English)

    庞观松; 蒋盛益; 张黎莎; 区雄发; 赖旭明

    2011-01-01

    In order to make the results returned by search engines easier to browse, this paper proposes a new TFIDF-based method for computing document similarity, together with a strategy for multi-level clustering of Web search results using a one-pass clustering algorithm with near-linear time complexity. We also propose a strategy for extracting cluster keywords from multiple texts: nouns and noun phrases are selected as candidate keywords, and a weighting function that combines term frequency, position of occurrence, term length, and text length computes each candidate's weight. The highest-weighted candidate is then automatically extracted as the label of each cluster generated by the multi-level clustering, without human intervention or the assistance of a corpus. Experimental results on collected Baidu and ODP corpora, together with a user study, show the effectiveness and acceptance of the proposed method.
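    The two core steps of this abstract, TFIDF similarity and one-pass (incremental) clustering, can be sketched as below. This is a minimal illustration under assumed conventions, not the paper's exact weighting scheme; the function names and the similarity threshold are ours.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute TF-IDF weight vectors for a list of tokenized documents."""
    n = len(docs)
    df = Counter()                       # document frequency per term
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (1 + math.log(c)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def one_pass_cluster(vectors, threshold=0.3):
    """Single-pass clustering: assign each document to the first cluster
    whose representative is similar enough, else start a new cluster."""
    clusters = []   # each cluster is a list of document indices
    reps = []       # representative vector per cluster (first member)
    for i, vec in enumerate(vectors):
        for c, rep in enumerate(reps):
            if cosine(vec, rep) >= threshold:
                clusters[c].append(i)
                break
        else:
            clusters.append([i])
            reps.append(vec)
    return clusters
```

    Each document is compared only against one representative per cluster, which is what keeps the pass close to linear time in practice.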

  14. Penerapan teknik web scraping pada mesin pencari artikel ilmiah

    OpenAIRE

    Josi, Ahmad; Abdillah, Leon Andretti; Suryayusra

    2014-01-01

    Search engines are a combination of hardware and computer software supplied by a particular company through a designated website. Search engines collect information from the web through bots or web crawlers that crawl the web periodically. The process of retrieving information from existing websites is called "web scraping." Web scraping is a technique for extracting information from websites. Web scraping is closely related to Web indexing, as for how to develop a web scra...

  15. Excavando la web

    OpenAIRE

    Ricardo, Baeza-Yates

    2004-01-01

    The web is the internet's most important phenomenon, as demonstrated by its exponential growth and diversity. Hence, due to the volume and wealth of its data, search engines have become among the web's main tools. They are useful when we know what we are looking for. However, certainly the web holds answers to questions never imagined. The process of finding relations or interesting patterns within a data set is called "data mining" and in the case of the web, "web mining". In this article...

  16. Infodemiological data of high-school drop-out related web searches in Canada correlating with real-world statistical data in the period 2004-2012.

    Science.gov (United States)

    Siri, Anna; Khabbache, Hicham; Al-Jafar, Ali; Martini, Mariano; Brigo, Francesco; Bragazzi, Nicola Luigi

    2016-12-01

    The present data article describes high-school drop-out related web activities in Canada, from 2004 to 2012, obtained by mining Google Trends (GT), using high-school drop-out as the keyword. The search volumes were processed, correlated, and cross-correlated with statistical data obtained at the national and province level and broken down by gender. Further, an autoregressive moving-average (ARMA) model was used to model the GT-generated data. From a qualitative point of view, GT-generated relative search volumes (RSVs) reflect the decrease in drop-out rate. The peak in Internet-related activities occurs in 2004 (56.35%, normalized value) and gradually declines to 40.59% (normalized value) in 2007. Afterwards, it remains substantially stable until 2012 (40.32%, normalized value). From a quantitative standpoint, the correlations between the Canadian high-school drop-out rate and GT-generated RSVs in the study period (2004-2012) were statistically significant, both using the drop-out rate for the academic year and the 3-year moving average. Examining the data broken down by gender, the correlations were higher and statistically significant in males than in females. GT-based data for drop-out were best modeled by an ARMA(1,0) model. Considering the cross-correlations for Canadian regions, all of them were statistically significant at lag 0, apart from New Brunswick, Newfoundland and Labrador, and Prince Edward Island. A number of cross-correlations were statistically significant also at lag -1 (namely, Alberta, Manitoba, New Brunswick and Saskatchewan).
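    The correlation and lagged cross-correlation analysis described above can be sketched with standard-library tools. This is an illustrative reconstruction, not the article's code; the lag convention (a negative lag pairs this year's search volume with next year's statistic) is our assumption.

```python
from statistics import mean, stdev

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

def cross_correlation(x, y, lag):
    """Correlation between x[t] and y[t + lag] over the overlapping span."""
    if lag > 0:
        return pearson(x[:-lag], y[lag:])
    if lag < 0:
        return pearson(x[-lag:], y[:lag])
    return pearson(x, y)
```

    Scanning `lag` over a small window (e.g. -2 to 2) shows whether search activity leads or trails the official statistic.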

  17. Search 3.0: Present, Personal, Precise

    Science.gov (United States)

    Spivack, Nova

    The next generation of Web search is already beginning to emerge. With it we will see several shifts in the way people search, and the way major search engines provide search functionality to consumers.

  18. APLIKASI WEB CRAWLER UNTUK WEB CONTENT PADA MOBILE PHONE

    Directory of Open Access Journals (Sweden)

    Sarwosri Sarwosri

    2009-01-01

    Crawling is the process behind a search engine: traversing the World Wide Web in a structured way and according to certain ethics. Applications that run the crawling process are called Web Crawlers, also known as web spiders or web robots. The growth of mobile search service providers has been followed by the growth of web crawlers that can browse web pages of the mobile content type. This Web Crawler application can be accessed by mobile devices, and only web pages of the Mobile Content type are explored by the Web Crawler. The Web Crawler's duty is to collect Mobile Content. A mobile application functions as a search application that uses the results from the Web Crawler. The Web Crawler server consists of a Servlet, a Mobile Content Filter, and a datastore. The Servlet is the gateway connecting the client with the server. The datastore is the storage medium for crawling results. The Mobile Content Filter selects web pages, so that only web pages appropriate for mobile devices, or with mobile content, are forwarded.
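    The Mobile Content Filter and link-collection steps described above can be sketched as below. The set of content types treated as "mobile" is a placeholder assumption, and the class and function names are ours, not the paper's.

```python
from html.parser import HTMLParser

# Assumed markers of mobile-oriented pages; a real filter would use the
# site's actual content-type conventions.
MOBILE_TYPES = {"application/xhtml+xml", "text/vnd.wap.wml"}

def is_mobile_content(content_type):
    """Filter step: keep only pages served with a mobile content type."""
    return content_type.split(";")[0].strip().lower() in MOBILE_TYPES

class LinkExtractor(HTMLParser):
    """Collect href targets so the crawler can queue them for the next hop."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links
```

    A crawl loop would fetch a page, check `is_mobile_content` on its Content-Type header, store it if it passes, and enqueue `extract_links` of its body.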

  19. On Building a Search Interface Discovery System

    Science.gov (United States)

    Shestakov, Denis

    A huge portion of the Web known as the deep Web is accessible via search interfaces to myriads of databases on the Web. While relatively good approaches for querying the contents of web databases have been recently proposed, one cannot fully utilize them having most search interfaces unlocated. Thus, the automatic recognition of search interfaces to online databases is crucial for any application accessing the deep Web. This paper describes the architecture of the I-Crawler, a system for finding and classifying search interfaces. The I-Crawler is intentionally designed to be used in the deep web characterization surveys and for constructing directories of deep web resources.

  20. Novel method for searching ontologies on the semantic web

    Institute of Scientific and Technical Information of China (English)

    虞为; 曹加恒; 陈俊鹏

    2006-01-01

    In order to solve the problem of information retrieval on the semantic web, a new semantic information retrieval (SIR) model for searching ontologies on the semantic web is proposed. First, SIR transforms domain ontologies into global ontologies. Semantic index terms are then extracted from these global ontologies. Based on the semantic index terms, logical inferences can be performed and the logical views of a concept can be obtained. These logical views represent the expanded meaning of the concept. Using logical views, SIR can perform information retrieval and inference based on the semantic relationships in documents, not only on their syntactic analysis. Through semantic inference, SIR can significantly enhance the recall and precision of information retrieval. Finally, the practicability of the SIR model is analyzed.
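    A toy version of query expansion through semantic index terms might look like this. The mini-ontology and function names are illustrative stand-ins for the paper's global ontologies and logical views, not its actual data structures.

```python
# Hypothetical mini-ontology mapping a concept to semantically related terms.
ONTOLOGY = {
    "car": {"automobile", "vehicle"},
    "search": {"retrieval", "query"},
}

def expand_query(terms, ontology=ONTOLOGY):
    """Expand each query term with related concepts drawn from the ontology,
    so retrieval can match on semantic relationships rather than only on
    the exact keywords the user typed."""
    expanded = list(terms)
    for term in terms:
        expanded.extend(sorted(ontology.get(term, ())))
    return expanded
```

    The expanded term list would then be scored in a vector space model, which is how such expansion raises recall without abandoning keyword matching.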

  1. Search Engine Optimization

    CERN Document Server

    Davis, Harold

    2006-01-01

    SEO--short for Search Engine Optimization--is the art, craft, and science of driving web traffic to web sites. Web traffic is food, drink, and oxygen--in short, life itself--to any web-based business. Whether your web site depends on broad, general traffic, or high-quality, targeted traffic, this PDF has the tools and information you need to draw more traffic to your site. You'll learn how to effectively use PageRank (and Google itself); how to get listed, get links, and get syndicated; and much more. The field of SEO is expanding into all the possible ways of promoting web traffic. This

  2. Evaluation Method of Web Site Based on Web Structure Mining

    Institute of Scientific and Technical Information of China (English)

    Li, Jun-e; Zhou, Dong-ru

    2003-01-01

    The structure of Web sites has become more complex than before. During the design period of a Web site, the lack of models and methods results in improper Web structures that depend on the designer's experience. From the point of view of software engineering, every period in the software life cycle must be evaluated before starting the next period's work, so it is important and essential to find methods for evaluating a Web structure before the site is completed. In this work, after studying related work on Web structure mining and analyzing the major structure mining methods (PageRank and Hub/Authority), a method based on PageRank for Web structure evaluation at the design stage is proposed. A Web structure modeling language, WSML, is designed, and implementation strategies for a system that evaluates Web site structure are given. Web structure mining had previously been used mainly in search engines; this is the first time the technology is employed to evaluate a Web structure during the design period of a Web site. It contributes to the formalization of design documents for Web sites and to improving software engineering for large-scale Web sites, and the evaluation system is a practical tool for Web site construction.
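    For reference, the PageRank computation that the proposed evaluation method builds on can be sketched as a standard power iteration. This is the textbook algorithm, not the paper's WSML tooling.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iterative PageRank over a link graph {page: [outgoing links]}."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank
```

    Run on a site's internal link graph at design time, pages with unexpectedly low rank are candidates for restructuring before the site ships.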

  3. Digging Deeper: The Deep Web.

    Science.gov (United States)

    Turner, Laura

    2001-01-01

    Focuses on the Deep Web, defined as Web content in searchable databases of the type that can be found only by direct query. Discusses the problems of indexing; inability to find information not indexed in the search engine's database; and metasearch engines. Describes 10 sites created to access online databases or directly search them. Lists ways…

  5. Choosing meaningful structure data for improving web search

    Institute of Scientific and Technical Information of China (English)

    郭茜; 杨晓春; 于戈; 李广翱

    2008-01-01

    In order to improve the quality of web search, a new query expansion method that chooses meaningful structured data from a domain database is proposed. It categorizes attributes into three different classes, named concept attributes, context attributes, and meaningless attributes, according to their semantic features, namely document frequency features and distinguishing capability features. It also defines the semantic relevance between two attributes when they have correlations in the database. It then proposes a trie-bitmap structure and pair pointer tables to implement efficient algorithms for discovering attribute semantic features and detecting their semantic relevance. By using semantic attributes and their semantic relevance, expansion words can be generated and embedded into a vector space model with interpolation parameters. The experiments use an IMDB movie database and real text collections to evaluate the proposed method by comparing its performance with a classical vector space model. The results show that the proposed method can improve text search efficiently, and that both the attribute semantic features and the attribute semantic relevance have good classification capability.
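    The document-frequency-based split into concept, context, and meaningless attributes might be sketched as below. The thresholds and the mapping from frequency band to class are illustrative guesses, since the paper derives its features from the data rather than from fixed cut-offs.

```python
def classify_attributes(doc_freq, total_docs, low=0.05, high=0.6):
    """Split attributes into concept / context / meaningless buckets by
    their document-frequency ratio. Thresholds are illustrative only."""
    buckets = {"concept": [], "context": [], "meaningless": []}
    for attr, df in doc_freq.items():
        ratio = df / total_docs
        if ratio < low:
            buckets["meaningless"].append(attr)   # too rare to help
        elif ratio < high:
            buckets["concept"].append(attr)       # selective, distinguishing
        else:
            buckets["context"].append(attr)       # broad background terms
    return buckets
```

    Only concept and context attributes would then feed the query-expansion step; meaningless attributes are discarded.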

  6. Design and Testing of BACRA, a Web-Based Tool for Middle Managers at Health Care Facilities to Lead the Search for Solutions to Patient Safety Incidents.

    Science.gov (United States)

    Carrillo, Irene; Mira, José Joaquín; Vicente, Maria Asuncion; Fernandez, Cesar; Guilabert, Mercedes; Ferrús, Lena; Zavala, Elena; Silvestre, Carmen; Pérez-Pérez, Pastora

    2016-09-27

    Lack of time, lack of familiarity with root cause analysis, or suspicion that reporting may result in negative consequences hinder involvement in the analysis of safety incidents and the search for preventive actions that can improve patient safety. The aim was to develop a tool that enables hospital and primary care professionals to immediately analyze the causes of incidents and to propose and implement measures intended to prevent their recurrence. The design of the Web-based tool (BACRA) considered research on the barriers to reporting, a review of incident analysis tools, and the experience of eight managers from the field of patient safety. BACRA's design was improved in successive versions (BACRA v1.1 and BACRA v1.2) based on feedback from 86 middle managers. BACRA v1.1 was used by 13 frontline professionals to analyze safety incidents; 59 professionals used BACRA v1.2 and assessed the respective usefulness and ease of use of both versions. BACRA contains seven tabs that guide the user through the process of analyzing a safety incident and proposing preventive actions for similar future incidents. BACRA does not identify the person completing each analysis, since the password introduced to hide said analysis is linked only to the information concerning the incident and not to any personal data. The tool was used by 72 professionals from hospitals and primary care centers. BACRA v1.2 was assessed more favorably than BACRA v1.1, both in terms of its usefulness (z=2.2, P=.03) and its ease of use (z=3.0, P=.003). BACRA helps to analyze safety incidents and to propose preventive actions, guarantees the anonymity of the analysis, and reduces the reluctance of professionals to carry out this task. BACRA is useful and easy to use.

  7. Yahoo! Cataloging the Web.

    Science.gov (United States)

    Callery, Anne

    The Internet has the potential to be the ultimate information resource, but it needs to be organized in order to be useful. This paper discusses how the subject guide, "Yahoo!" is different from most web search engines, and how best to search for information on Yahoo! The strength in Yahoo! lies in the subject hierarchy. Advantages to…

  8. An Efficient Cluster Based Web Object Filters From Web Pre-Fetching And Web Caching On Web User Navigation

    Directory of Open Access Journals (Sweden)

    A. K. Santra

    2012-05-01

    The World Wide Web is a distributed internet system providing dynamic and interactive services, including online tutoring, video/audio conferencing, e-commerce, and so on, which generate heavy demand on network resources and web servers. This demand has increased very rapidly over the past few years, so the amount of traffic over the internet is increasing and, as a result, network performance has become slow. Web pre-fetching and caching is one of the effective solutions for reducing web access latency and improving quality of service. An existing model presented a cluster-based pre-fetching scheme that identified clusters of correlated Web pages based on users' access patterns; such pre-fetching and caching cause significant improvements in the performance of Web infrastructure. In this paper, we present an efficient cluster-based Web Object Filter scheme, built on web pre-fetching and web caching, to evaluate web users' navigation patterns and user preferences in product search. Clusters of web page objects are obtained from pre-fetched and cached contents. User navigation is evaluated from the web cluster objects by similarity retrieval in subsequent user sessions. Web Object Filters are built by interpreting the cluster web pages related to unique users and discarding redundant pages. Ranking is done on each individual user's web page product preferences over multiple sessions. Performance is measured in terms of the objective function, the number of clusters, and cluster accuracy.
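    A minimal cache-plus-prefetch loop of the kind this abstract describes could look like the following: an LRU cache that, on each request, pre-loads the pages clustered with the one just served. The class and the static cluster map are our illustrative assumptions, not the paper's system.

```python
from collections import OrderedDict

class WebCache:
    """LRU web cache with cluster-based pre-fetching: serving a page also
    loads the pages correlated with it, ahead of the next request."""
    def __init__(self, fetch, clusters, capacity=4):
        self.fetch = fetch          # function: url -> page content
        self.clusters = clusters    # url -> list of correlated urls
        self.capacity = capacity
        self.cache = OrderedDict()

    def _store(self, url, content):
        self.cache[url] = content
        self.cache.move_to_end(url)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict least recently used

    def get(self, url):
        if url in self.cache:
            self.cache.move_to_end(url)      # cache hit
        else:
            self._store(url, self.fetch(url))
        content = self.cache[url]
        # Pre-fetch pages clustered with the one just served.
        for nxt in self.clusters.get(url, []):
            if nxt not in self.cache:
                self._store(nxt, self.fetch(nxt))
        return content
```

    If the cluster map reflects real navigation patterns, subsequent requests hit the cache and the user never sees the fetch latency.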

  9. Evaluative Measures of Search Engines

    OpenAIRE

    Jitendra Nath Singh; Dr. S.K. Dwivedi

    2012-01-01

    The ability to search and retrieve information from the web efficiently and effectively is a great challenge for search engines. Information retrieval on the Web is very different from retrieval in traditional indexed databases because of its hyper-linked character and the heterogeneity of document types and authoring styles. Thus, since Web retrieval is substantially different from information retrieval, new or revised evaluative measures are required to assess retrieval performance using search engi...

  10. SearchResultFinder: federated search made easy

    OpenAIRE

    Trieschnigg, Rudolf Berend; Tjin-Kam-Jet, Kien; Hiemstra, Djoerd

    2013-01-01

    Building a federated search engine based on a large number of existing web search engines is a challenge: implementing the programming interface (API) for each search engine is an exacting and time-consuming job. In this demonstration we present SearchResultFinder, a browser plugin which speeds up determining reusable XPaths for extracting search result items from HTML search result pages. Based on a single search result page, the tool presents a ranked list of candidate extraction XPaths and al...

  11. Indexing and Retrieval for the Web.

    Science.gov (United States)

    Rasmussen, Edie M.

    2003-01-01

    Explores current research on indexing and ranking as retrieval functions of search engines on the Web. Highlights include measuring search engine stability; evaluation of Web indexing and retrieval; Web crawlers; hyperlinks for indexing and ranking; ranking for metasearch; document structure; citation indexing; relevance; query evaluation;…

  12. An Efficient Web Page Ranking for Semantic Web

    Science.gov (United States)

    Chahal, P.; Singh, M.; Kumar, S.

    2014-01-01

    With the enormous amount of information presented on the web, the retrieval of relevant information has become a serious problem and has been a topic of research for the last few years. The most common tools to retrieve information from the web are search engines like Google. Search engines are usually based on keyword searching and the indexing of web pages. This approach is not very efficient, as the result-set of web pages obtained includes many irrelevant pages; sometimes even the entire result-set may contain a lot of irrelevant pages for the user. The next generation of search engines must address this problem. Recently, many semantic web search engines have been developed, like Ontolook and Swoogle, which help in searching meaningful documents presented on the semantic web. In this process the ranking of the retrieved web pages is crucial. Some attempts have been made at ranking semantic web pages, but the ranking of these semantic web documents is still neither satisfactory nor up to users' expectations. In this paper we propose a semantic web based document ranking scheme that relies not only on the keywords but also on the conceptual instances present between the keywords. As a result, only relevant pages appear at the top of the result-set of searched web pages. We explore all relevant relations between the keywords, exploring the user's intention, and then calculate the fraction of these relations present on each web page to determine its relevance. We have found that this ranking technique gives better results than the prevailing methods.
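    The final ranking idea above, scoring each page by the fraction of the query's concept relations it contains, can be reduced to a short sketch. The relation representation (subject, predicate, object triples) and the function name are our illustrative choices.

```python
def semantic_rank(pages, relations):
    """Rank pages by the fraction of the query's concept relations they
    contain. `pages` maps page id -> set of (subj, pred, obj) triples;
    `relations` is the set of triples relevant to the query."""
    def score(page):
        if not relations:
            return 0.0
        return len(relations & pages[page]) / len(relations)
    return sorted(pages, key=score, reverse=True)
```

    Pages holding every query relation rank first; keyword-only pages with no relations fall to the bottom.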

  13. Working with Data: Discovering Knowledge through Mining and Analysis; Systematic Knowledge Management and Knowledge Discovery; Text Mining; Methodological Approach in Discovering User Search Patterns through Web Log Analysis; Knowledge Discovery in Databases Using Formal Concept Analysis; Knowledge Discovery with a Little Perspective.

    Science.gov (United States)

    Qin, Jian; Jurisica, Igor; Liddy, Elizabeth D.; Jansen, Bernard J; Spink, Amanda; Priss, Uta; Norton, Melanie J.

    2000-01-01

    These six articles discuss knowledge discovery in databases (KDD). Topics include data mining; knowledge management systems; applications of knowledge discovery; text and Web mining; text mining and information retrieval; user search patterns through Web log analysis; concept analysis; data collection; and data structure inconsistency. (LRW)

  15. Library Catalogue Users Are Influenced by Trends in Web Searching Search Strategies. A review of: Novotny, Eric. “I Don’t Think I Click: A Protocol Analysis Study of Use of a Library Online Catalog in the Internet Age.” College & Research Libraries, 65.6 (Nov. 2004): 525-37.

    Directory of Open Access Journals (Sweden)

    Susan Haigh

    2006-09-01

    Objective – To explore how Web-savvy users think about and search an online catalogue. Design – Protocol analysis study. Setting – Academic library (Pennsylvania State University Libraries). Subjects – Eighteen users (17 students, 1 faculty member) of an online public access catalog, divided into two groups of nine first-time and nine experienced users. Method – The study team developed five tasks that represented a range of activities commonly performed by library users, such as searching for a specific item, identifying a library location, and requesting a copy. Seventeen students and one faculty member, divided evenly between novice and experienced searchers, were recruited to “think aloud” through the performance of the tasks. Data were gathered through audio recordings, screen capture software, and investigator notes. The time taken for each task was recorded, and investigators rated task completion as “successful,” “partially successful,” “fail,” or “search aborted.” After the searching session, participants were interviewed to clarify their actions and provide further commentary on the catalogue search. Main results – Participants in both test groups were relatively unsophisticated subject searchers. They made minimal use of Boolean operators, and tended not to repair failed searches by rethinking the search vocabulary and using synonyms. Participants did not have a strong understanding of library catalogue contents or structure and showed little curiosity in developing an understanding of how to utilize the catalogue. Novice users were impatient both in choosing search options and in evaluating their search results. They assumed search results were sorted by relevance, and thus would not typically browse past the initial screen. They quickly followed links, fearlessly tried different searches and options, and rapidly abandoned false trails. Experienced users were more effective and efficient searchers than

  16. Millennial Generation Students Search the Web Erratically, with Minimal Evaluation of Information Quality. A Review of: Taylor, A. (2012). A study of the information search behaviour of the millennial generation. Information Research, 17(1), paper 508. Retrieved from http://informationr.net/ir/17-1/paper508.html

    Directory of Open Access Journals (Sweden)

    Dominique Daniel

    2013-03-01

    Full Text Available Objective – To identify how millennial generation students proceed through the information search process and select resources on the web; to determine whether students evaluate the quality of web resources and how they use general information websites. Design – Longitudinal study. Setting – University in the United States. Subjects – 80 undergraduate students of the millennial generation enrolled in a business course. Methods – The students were required to complete a research report with a bibliography in five weeks. They also had to turn in interim assignments during that period (including an abstract, an outline, and a rough draft). Their search behaviour was monitored using a modified Yahoo search engine that allowed subjects to search and then fill out surveys integrated directly below their search results. The students were asked to indicate the relevance of the resources they found on the open web, to identify the criteria they used to evaluate relevance, and to specify the stage they were at in the search process. They could choose from five stages defined by the author, based on Wilson (1999): initiation, exploration, differentiation, extracting, and verifying. Data were collected using anonymous user IDs and included URLs for sources selected along with subject answers until completion of all assignments. The students provided 758 distinct web page evaluations. Main Results – Students did not progress in an orderly fashion through the search process, but rather proceeded erratically. A substantial number reported being in fewer than four of the five search stages. Only a small percentage ever declared being in the final stage of verifying previously gathered information, and during preparation of the final report a majority still declared being in the extracting stage. In fact, participants selected documents (extracting stage) throughout the process. In addition, students were not much concerned with the quality, validity, or…

  17. Extracting Macroscopic Information from Web Links.

    Science.gov (United States)

    Thelwall, Mike

    2001-01-01

    Discussion of Web-based link analysis focuses on an evaluation of Ingwersen's proposed external Web Impact Factor for the original use of the Web, namely the interlinking of academic research. Studies relationships between academic hyperlinks and research activities for British universities and discusses the use of search engines for Web link…

  18. Search engines that learn from their users

    NARCIS (Netherlands)

    Schuth, A.G.

    2016-01-01

    More than half the world’s population uses web search engines, resulting in over half a billion search queries every single day. For many people web search engines are among the first resources they go to when a question arises. Moreover, search engines have for many become the most trusted route to

  19. Harvesting and Organizing Knowledge from the Web

    OpenAIRE

    Weikum, Gerhard

    2007-01-01

    Information organization and search on the Web is gaining structure and context awareness and more semantic flavor, for example, in the forms of faceted search, vertical search, entity search, and Deep-Web search. I envision another big leap forward by automatically harvesting and organizing knowledge from the Web, represented in terms of explicit entities and relations as well as ontological concepts. This will be made possible by the confluence of three stron...

  20. Establishment of a Satisfaction Model and Evaluation Criteria System for Web Search Engines

    Institute of Scientific and Technical Information of China (English)

    叶凤云; 汪传雷

    2011-01-01

    Based on the ACSI (American Customer Satisfaction Index) model, this paper establishes a Web search engine satisfaction (WSES) model, and then builds a corresponding evaluation criteria system that combines existing search engine evaluation criteria with the model. This provides a reference for evaluating Web search engine satisfaction and lays the groundwork for fully verifying the model with structural equation modeling.

  1. Citation Analysis using the Medline Database at the Web of Knowledge: Searching "Times Cited" with Medical Subject Headings (MeSH)

    CERN Document Server

    Leydesdorff, Loet

    2012-01-01

    Citation analysis of documents retrieved from the Medline database (at the Web of Knowledge) has been possible only on a case-by-case basis. A technique is here developed for citation analysis in batch mode using both Medical Subject Headings (MeSH) at the Web of Knowledge and the Science Citation Index at the Web of Science. This freeware routine is applied to the case of "Brugada Syndrome," a specific disease and field of research (since 1992). The journals containing these publications are attributed to Web-of-Science Categories other than "Cardiac and Cardiovascular Systems," perhaps because of the possibility of genetic testing for this syndrome in the clinic. With this routine, all the instruments available for citation analysis can be used on the basis of MeSH terms.

  2. Exploring Visual Search and Browsing Strategies on Web Pages Using Eye Tracking

    Institute of Scientific and Technical Information of China (English)

    栗觅; 钟宁; 吕胜富

    2011-01-01

    This study investigates the characteristics of visual search and browsing on Web pages using eye tracking, and analyzes the differences between the two corresponding strategies. When participants searched, fixation duration and fixation count were significantly higher in the peripheral area than in the central area; when participants browsed, there was no significant difference between the two areas. Average pupil diameter was also significantly larger during search than during browsing, indicating a significantly greater mental load during search. The results show that visual search on Web pages follows a periphery-oriented strategy, whereas browsing distributes attention freely and randomly across the peripheral and central areas alike. These differences in strategy arise mainly from goal-driven behaviour and differences in mental load.

  3. Study on Web Usage Mining in the Search Engine of a University Library

    Institute of Scientific and Technical Information of China (English)

    赵静

    2013-01-01

    Because the hit rate of university information resource retrieval is low, a university library search engine applying Web usage mining is proposed. Web access log records of a university library website are mined using Web usage mining technology and Clementine. Within the Web usage mining process, a new algorithm is proposed for identifying individual users based on user IP, login time, site topology, referring page, and user agent, which measurably improves user recognition. Path analysis is then used to mine usage patterns and optimize the website structure, raising the hit rate of the university library search engine.
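The user-identification step this record describes (distinguishing visitors by IP, login time, and user agent before mining) can be sketched as a log sessionization pass. The 30-minute inactivity gap, the field layout, and the data are illustrative assumptions, not the paper's actual algorithm:

```python
from datetime import datetime, timedelta

def sessionize(log_entries, gap_minutes=30):
    """log_entries: (ip, user_agent, timestamp, url) tuples.
    Groups hits by (ip, agent) and splits each group on inactivity
    gaps, approximating one 'user session' per resulting group."""
    by_user = {}
    for ip, agent, ts, url in sorted(log_entries, key=lambda e: e[2]):
        by_user.setdefault((ip, agent), []).append((ts, url))
    sessions = []
    gap = timedelta(minutes=gap_minutes)
    for (ip, agent), hits in by_user.items():
        current = [hits[0]]
        for prev, cur in zip(hits, hits[1:]):
            if cur[0] - prev[0] > gap:
                # long pause: close the running session, start a new one
                sessions.append((ip, agent, current))
                current = []
            current.append(cur)
        sessions.append((ip, agent, current))
    return sessions

t0 = datetime(2013, 1, 1, 10, 0)
logs = [("1.2.3.4", "UA1", t0, "/a"),
        ("1.2.3.4", "UA1", t0 + timedelta(minutes=5), "/b"),
        ("1.2.3.4", "UA1", t0 + timedelta(hours=2), "/c"),
        ("5.6.7.8", "UA2", t0, "/a")]
```

The paper additionally uses site topology and referring pages to split users sharing one IP; that refinement is omitted here.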

  4. Performance of information retrieval systems on the Web: an evaluation of search services (search engines).

    Directory of Open Access Journals (Sweden)

    Olvera Lobo, María Dolores

    2000-09-01

    Full Text Available Ten search engines, Altavista, Excite, Hotbot, Infoseek, Lycos, Magellan, OpenText, WebCrawler, WWWWorm, and Yahoo, were evaluated by means of a questionnaire with 20 items (adding up to a total of 200 questions). The 20 first results for each question were analysed in terms of relevance, and values of precision and recall were computed for the resulting 4,000 references. The results are also analyzed in terms of the type of question (boolean or natural language) and topic (specialized vs. general interest). The results showed that Excite, Infoseek and AltaVista performed generally better. The conclusion of this methodological trial was that the method used allows the evaluation of the performance of Information Retrieval Systems in the Web. As for the results, web search engines are not very precise but extremely exhaustive.
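The evaluation procedure described (judging the top 20 results per query, then computing precision and recall) reduces to two small functions. This is a generic sketch with made-up relevance judgments, not the study's data; "recall" for a web engine is usually relative recall against a pooled set of relevant items found by any engine:

```python
def precision_at_k(results, relevant, k=20):
    """Fraction of the top-k returned results judged relevant."""
    top = results[:k]
    if not top:
        return 0.0
    return sum(1 for r in top if r in relevant) / len(top)

def relative_recall(results, pooled_relevant, k=20):
    """Relevant items found in the top-k, relative to the pool of
    relevant items found across all engines for this query."""
    if not pooled_relevant:
        return 0.0
    return len(set(results[:k]) & set(pooled_relevant)) / len(pooled_relevant)

# hypothetical judgments for one query and one engine
engine_results = ["d1", "d2", "d3", "d4"]
relevant_pool = {"d1", "d3", "d9"}
```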


  5. Searching for American Indian Resources on the Internet.

    Science.gov (United States)

    Pollack, Ira; Derby, Amy

    This paper provides basic information on searching the Internet and lists World Wide Web sites containing resources for American Indian education. Comprehensive and topical Web directories, search engines, and meta-search engines are briefly described. Search strategies are discussed, and seven Web sites are listed that provide more advanced…

  6. Searching Databases with Keywords

    Institute of Scientific and Technical Information of China (English)

    Shan Wang; Kun-Long Zhang

    2005-01-01

    Traditionally, the SQL query language is used to search the data in databases. However, it is inappropriate for end users, since it is complex and hard to learn. End users need to search databases with keywords, as in web search engines. This paper presents a survey of work on keyword search in databases, including a brief introduction to the SEEKER system, which has been developed.

  7. Accurate And Efficient Crawling The Deep Web: Surfacing Hidden Value

    OpenAIRE

    Suneet Kumar; Anuj Kumar Yadav; Rakesh Bharti; Rani Choudhary

    2011-01-01

    Focused web crawlers have recently emerged as an alternative to the well-established web search engines. While the well-known focused crawlers retrieve relevant web pages, there are various applications which target whole websites instead of single web pages. For example, companies are represented by websites, not by individual web pages. To answer queries targeted at websites, web directories are an established solution. In this paper, we introduce a novel focused website crawler t...

  8. SearchResultFinder: federated search made easy

    NARCIS (Netherlands)

    Trieschnigg, Rudolf Berend; Tjin-Kam-Jet, Kien; Hiemstra, Djoerd

    2013-01-01

    Building a federated search engine based on a large number of existing web search engines is a challenge: implementing the programming interface (API) for each search engine is an exacting and time-consuming job. In this demonstration we present SearchResultFinder, a browser plugin which speeds up…


  10. Research on a Focused Search Engine Based on Hyperlink Induction and Web Link-Graph Analysis

    Institute of Scientific and Technical Information of China (English)

    唐苏; 刘循

    2011-01-01

    A focused search engine is a tool designed to retrieve information on a particular subject or theme. Weighing the advantages and disadvantages of current focused search technologies, this paper proposes the IPageRank-IND algorithm, which combines hyperlink induction based on textual cues with the PageRank algorithm based on Web link-structure analysis, to improve the accuracy of link-relevance judgments and the coverage of topical resources. Pages are also judged for content relevance and automatically classified by sub-topic using the VSM algorithm, improving retrieval efficiency. Finally, a search engine was built for experiments; comparing the algorithm with several others shows that the advantage of IPageRank-IND is obvious.
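The IPageRank-IND algorithm itself is not specified in this abstract; for orientation, the PageRank component it builds on can be sketched as follows (plain PageRank with damping factor d on a toy graph; biasing edge weights by text relevance, as hyperlink induction suggests, would be layered on top):

```python
def pagerank(links, d=0.85, iters=100):
    """links: dict page -> list of pages it links to.
    Plain iterative PageRank; assumes every page has out-links
    (dangling-node handling omitted for brevity)."""
    nodes = set(links)
    for targets in links.values():
        nodes |= set(targets)
    n = len(nodes)
    pr = {p: 1.0 / n for p in nodes}
    for _ in range(iters):
        # teleport term plus redistributed rank from in-links
        new = {p: (1 - d) / n for p in nodes}
        for q, targets in links.items():
            if targets:
                share = d * pr[q] / len(targets)
                for p in targets:
                    new[p] += share
        pr = new
    return pr

# toy link graph: two pages endorse "c", which endorses "a"
toy = {"a": ["c"], "b": ["c"], "c": ["a"]}
```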

  11. Discovering Authorities and Hubs in Different Topological Web Graph Structures.

    Science.gov (United States)

    Meghabghab, George

    2002-01-01

    Discussion of citation analysis on the Web considers Web hyperlinks as a source to analyze citations. Topics include basic graph theory applied to Web pages, including matrices, linear algebra, and Web topology; and hubs and authorities, including a search technique called HITS (Hyperlink Induced Topic Search). (Author/LRW)
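The HITS technique named in this record alternates two mutually recursive scores: a page's authority is the sum of the hub scores of pages linking to it, and its hub score is the sum of the authority scores of pages it links to, with normalization each round. A minimal sketch on an invented toy graph:

```python
import math

def hits(graph, iters=50):
    """graph: dict node -> set of nodes it links to.
    Returns (authority, hub) score dicts, L2-normalised."""
    nodes = set(graph) | {v for vs in graph.values() for v in vs}
    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}
    for _ in range(iters):
        # authority: sum of hub scores of pages linking in
        auth = {n: sum(hub[u] for u in nodes if n in graph.get(u, ()))
                for n in nodes}
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {n: a / norm for n, a in auth.items()}
        # hub: sum of authority scores of pages linked out to
        hub = {n: sum(auth[v] for v in graph.get(n, ())) for n in nodes}
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {n: h / norm for n, h in hub.items()}
    return auth, hub

# p1 and p2 are hub-like pages; "a" is endorsed by both
g = {"p1": {"a"}, "p2": {"a", "b"}, "a": set(), "b": set()}
```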

  12. Characteristics of scientific web publications

    DEFF Research Database (Denmark)

    Thorlund Jepsen, Erik; Seiden, Piet; Ingwersen, Peter Emil Rerup

    2004-01-01

    Because of the increasing presence of scientific publications on the Web, combined with the existing difficulties in easily verifying and retrieving these publications, research on techniques and methods for retrieval of scientific Web publications is called for. In this article, we report on the… AltaVista and AllTheWeb retrieved a higher degree of accessible scientific content than Google. Because of the search engine cutoffs of accessible URLs, the feasibility of using search engine output for Web content analysis is also discussed.

  13. Differences Between Traditional Search Engines and Semantic Search Engines in Web 2.0 Search

    Institute of Scientific and Technical Information of China (English)

    赵夷平

    2010-01-01

    By comparing traditional search engines and semantic search engines on three dimensions, user-generated content search, social network search, and personalized search, this article shows the differences between the two types of search engines in providing search services in the Web 2.0 environment, and offers a reference for the future development of search engines.

  14. An Improved Approach to perform Crawling and avoid Duplicate Web Pages

    Directory of Open Access Journals (Sweden)

    Dhiraj Khurana

    2012-06-01

    Full Text Available When a web search is performed, the results often include many duplicate web pages or websites; that is, a number of similar pages can be found on different web servers. We propose a web crawling approach to detect and avoid duplicate or near-duplicate web pages. This work presents a keyword-prioritization-based approach to identifying such pages on the web; as these pages are identified, the web search is optimized.
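The record does not give the detection step in detail; a common way to flag near-duplicate pages, shown here purely as an illustration rather than the authors' keyword-prioritization method, is word shingling with a Jaccard similarity threshold:

```python
def shingles(text, k=3):
    """Set of k-word shingles of a page's text (lower-cased)."""
    words = text.lower().split()
    return {" ".join(words[i:i + k])
            for i in range(max(1, len(words) - k + 1))}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def is_near_duplicate(page1, page2, threshold=0.8):
    """Treat two pages as near-duplicates above the threshold."""
    return jaccard(shingles(page1), shingles(page2)) >= threshold
```

In a crawler, the shingle sets would typically be hashed (e.g. MinHash) so pages can be compared without keeping full text.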

  15. Design and Implementation of a Web Data Mining Search Engine Based on Concept Clustering

    Institute of Scientific and Technical Information of China (English)

    刘典型; 刘完芳; 钟钢

    2015-01-01

    In web data mining search, accuracy depends greatly on the number of keywords the user inputs, and on how well the search engine's semantic analysis of those keywords matches the user's intent; this analysis includes link-based and concept-based clustering methods. To overcome the defects of link-based clustering, this paper adopts concept-based clustering and, starting from the concept of the bipartite graph and its storage method, designs and implements a personalized web data mining search engine whose superiority is verified.

  16. Semantic Web Mining: Benefits, Challenges and Opportunities

    Directory of Open Access Journals (Sweden)

    Syeda Farha Shazmeen, Etyala Ramyasree

    2012-12-01

    Full Text Available Semantic Web Mining aims at combining the two areas Semantic Web and Web Mining, by using semantics to improve mining and using mining to create semantics. Web Mining aims at discovering insights about the meaning of Web resources and their usage. In the Semantic Web, semantic information is represented through a resource's relations with others and is recorded in RDF, a semantic web technology that can be used to build efficient and scalable systems for the Cloud. The Semantic Web enriches the World Wide Web with machine-processable information that supports users in their tasks and helps them obtain exact search results. In this paper we discuss the interplay of the Semantic Web with Web Mining, and list the benefits, challenges, and opportunities of the Semantic Web.
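The RDF representation mentioned (resources described by their relations to others) can be illustrated by pattern matching over subject-predicate-object triples, the operation a SPARQL basic graph pattern performs; the tiny knowledge base below is invented:

```python
def match(triples, s=None, p=None, o=None):
    """Return triples matching a (subject, predicate, object) pattern;
    None acts as a wildcard, like a variable in SPARQL."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# hypothetical triples; real RDF would use IRIs and a library such as rdflib
kb = [
    ("alice", "worksAt", "LabX"),
    ("LabX", "locatedIn", "Berlin"),
    ("alice", "knows", "bob"),
]
```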

  17. Next-Gen Search Engines

    Science.gov (United States)

    Gupta, Amardeep

    2005-01-01

    Current search engines--even the constantly surprising Google--seem unable to leap the next big barrier in search: the trillions of bytes of dynamically generated data created by individual web sites around the world, or what some researchers call the "deep web." The challenge now is not information overload, but information overlook.…

  18. Evaluating aggregated search using interleaving

    NARCIS (Netherlands)

    A. Chuklin; A. Schuth; K. Hofmann; P. Serdyukov; M. de Rijke

    2013-01-01

    A result page of a modern web search engine is often much more complicated than a simple list of "ten blue links." In particular, a search engine may combine results from different sources (e.g., Web, News, and Images), and display these as grouped results to provide a better user experience. Such a


  20. A Search Engine Features Comparison.

    Science.gov (United States)

    Vorndran, Gerald

    Until recently, the World Wide Web (WWW) public access search engines have not included many of the advanced commands, options, and features commonly available with the for-profit online database user interfaces, such as DIALOG. This study evaluates the features and characteristics common to both types of search interfaces, examines the Web search…

  1. Secondary Retrieval of Web Academic Information Search Results Based on Concept Lattices

    Institute of Scientific and Technical Information of China (English)

    宋绍成; 高俊峰

    2012-01-01

    Given the disorder and clutter of Web academic information search results, this article presents a concept-lattice-based retrieval algorithm that traverses the search results. The academic information retrieved by a user's first search is organized and clustered into a Hasse diagram, on the basis of which a secondary retrieval is performed. When the number of retrieval results is excessively large, this helps users narrow the scope of retrieval and find the required information more accurately.
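Building the Hasse diagram starts from the formal concepts of a document-term context. A brute-force sketch (exponential in the number of documents, so only plausible for the small result sets a user would re-search, and not the paper's actual algorithm):

```python
from itertools import combinations

def concepts(context):
    """context: dict document -> set of index terms.
    Returns all formal concepts as (extent, intent) frozenset pairs,
    by closing every subset of documents."""
    objects = list(context)
    found = set()
    for r in range(len(objects) + 1):
        for objs in combinations(objects, r):
            if objs:
                # intent: terms shared by every chosen document
                intent = set.intersection(*(context[o] for o in objs))
            else:
                intent = set.union(*context.values()) if objects else set()
            # extent: all documents carrying every term of the intent
            extent = frozenset(o for o in objects if intent <= context[o])
            found.add((extent, frozenset(intent)))
    return found

# hypothetical first-round search results and their terms
ctx = {"d1": {"web", "search"},
       "d2": {"web", "mining"},
       "d3": {"web", "search", "mining"}}
```

The Hasse diagram then orders these concepts by extent inclusion; clicking down a node corresponds to the narrowing secondary retrieval the paper describes.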

  2. Myanmar Language Search Engine

    Directory of Open Access Journals (Sweden)

    Pann Yu Mon

    2011-03-01

    Full Text Available With the enormous growth of the World Wide Web, search engines play a critical role in retrieving information from the borderless Web. Although many search engines are available for the major languages, they are not as proficient for less computerized languages, including Myanmar, mainly because they do not consider the specific features of those languages. A search engine capable of searching Web documents written in these languages is highly needed, especially as more and more Web sites offer localized content in multiple languages. This study proposes the design and architecture of a language-specific search engine for the Myanmar language. The main features of the system are: (1) it can search Myanmar Web pages in multiple encodings, and (2) it is designed to comply with the specific features of the Myanmar language. Finally, experiments were conducted to verify whether the system meets the design requirements.

  3. Federated search in the wild: the combined power of over a hundred search engines

    NARCIS (Netherlands)

    Nguyen, Dong-Phuong; Demeester, Thomas; Trieschnigg, Dolf; Hiemstra, Djoerd

    2012-01-01

    Federated search has the potential of improving web search: the user becomes less dependent on a single search provider and parts of the deep web become available through a unified interface, leading to a wider variety in the retrieved search results. However, a publicly available dataset for federated…


  5. Study on Web Document Classification for a Uyghur, Kazak, Kirgiz Multilingual Search Engine

    Institute of Scientific and Technical Information of China (English)

    海丽且木·艾沙; 维尼拉·木沙江

    2011-01-01

    This paper studies Web document classification in a Uyghur, Kazak, Kirgiz multilingual search engine. Exploiting the structural information characteristic of Uyghur, Kazak, and Kirgiz Web text, a classification system framework is proposed, and a Web document classification method based on an improved KNN algorithm is implemented on preprocessed data. Experiments indicate that the improved KNN method achieves good classification results on Uyghur Web text.
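The paper's improvement to KNN is not detailed in the abstract; the baseline KNN text classifier it starts from can be sketched with bag-of-words vectors and cosine similarity (the training data is invented for illustration):

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity of two term-frequency Counters."""
    num = sum(a[t] * b.get(t, 0) for t in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def knn_classify(doc, training, k=3):
    """training: list of (text, label) pairs. Majority vote among
    the k nearest neighbours under cosine similarity."""
    vec = Counter(doc.lower().split())
    neighbours = sorted(
        training,
        key=lambda ex: cosine(vec, Counter(ex[0].lower().split())),
        reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# toy labelled corpus (hypothetical)
train = [
    ("web search engine ranking", "search"),
    ("query index retrieval engine", "search"),
    ("football match goal score", "sport"),
    ("tennis match player score", "sport"),
]
```

Real multilingual classification would add tokenization and weighting (e.g. tf-idf) appropriate to each script, which is where the structural improvements the paper mentions would come in.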

  6. Application and Development Trends of Personalized Search Engines Based on Web Data Mining

    Institute of Scientific and Technical Information of China (English)

    王丽; 曹家琏

    2009-01-01

    Web data mining is an emerging research field that applies data mining theory and techniques to WWW resources. This paper describes the current state of development, the trends, and possible future research directions of Web data mining, briefly introduces personalized search engines, and discusses the application of Web data mining in personalized search engines.

  7. A Novel Method for Bilingual Web Page Mining via Search Engines

    Institute of Scientific and Technical Information of China (English)

    冯艳卉; 洪宇; 颜振祥; 姚建民; 朱巧明

    2011-01-01

    A new approach has been developed for acquiring bilingual web pages from the result pages of search engines, which is composed of two challenging tasks. The first task is to detect web records embedded in the result pages automatically, via clustering on a sample page. Identifying these useful records allows the generation of highly effective features for the second task, high-quality bilingual web page acquisition, which is treated as a classification problem. One advantage of our approach is that it is independent of the search engine and the domain. The test is based on 2,516 records extracted automatically from six search engines and annotated manually; the approach achieves a high precision of 81.3% and a recall of 94.93%, indicating that it is very effective.

  8. Design and Implementation of Domain based Semantic Hidden Web Crawler

    OpenAIRE

    Manvi; Bhatia, Komal Kumar; Dixit, Ashutosh

    2015-01-01

    Web is a wide term which mainly consists of surface web and hidden web. One can easily access the surface web using traditional web crawlers, but they are not able to crawl the hidden portion of the web. These traditional crawlers retrieve contents from web pages, which are linked by hyperlinks ignoring the information hidden behind form pages, which cannot be extracted using simple hyperlink structure. Thus, they ignore large amount of data hidden behind search forms. This paper emphasizes o...

  9. Evaluative Measures of Search Engines

    Directory of Open Access Journals (Sweden)

    Jitendra Nath Singh

    2012-03-01

    Full Text Available The ability to search and retrieve information from the web efficiently and effectively is a great challenge for search engines. Information retrieval on the Web is very different from retrieval in traditional indexed databases, because of the Web's hyperlinked character and the heterogeneity of document types and authoring styles. Since Web retrieval is substantially different from traditional information retrieval, new or revised evaluative measures are required to assess retrieval performance of search engines. In this paper we suggest a number of evaluative measures for the effectiveness of search engines; the motivation behind each measure is presented, along with its description and definition.

  10. Factsheets Web Application

    Energy Technology Data Exchange (ETDEWEB)

    VIGIL,FRANK; REEDER,ROXANA G.

    2000-10-30

    The Factsheets web application was conceived out of the requirement to create, update, publish, and maintain a web site with dynamic research and development (R and D) content. Before creating the site, a requirements discovery process was done in order to accurately capture the purpose and functionality of the site. One of the high priority requirements for the site would be that no specialized training in web page authoring would be necessary. All functions of uploading, creation, and editing of factsheets needed to be accomplished by entering data directly into web form screens generated by the application. Another important requirement of the site was to allow for access to the factsheet web pages and data via the internal Sandia Restricted Network and Sandia Open Network based on the status of the input data. Important to the owners of the web site would be to allow the published factsheets to be accessible to all personnel within the department whether or not the sheets had completed the formal Review and Approval (R and A) process. Once the factsheets had gone through the formal review and approval process, they could then be published both internally and externally based on their individual publication status. An extended requirement and feature of the site would be to provide a keyword search capability to search through the factsheets. Also, since the site currently resides on both the internal and external networks, it would need to be registered with the Sandia search engines in order to allow access to the content of the site by the search engines. To date, all of the above requirements and features have been created and implemented in the Factsheet web application. These have been accomplished by the use of flat text databases, which are discussed in greater detail later in this paper.

  11. Ramakrishnan: Semantics on the Web

    Data.gov (United States)

    National Aeronautics and Space Administration — It is becoming increasingly clear that the next generation of web search and advertising will rely on a deeper understanding of user intent and task modeling, and a...

  12. How to Search the Internet Archive Without Indexing It

    DEFF Research Database (Denmark)

    Kanhabua, Nattiya; Kemkes, Philipp; Nejdl, Wolfgang

    2016-01-01

    Significant parts of our cultural heritage have been produced on the Web in recent years. While easy accessibility to the current Web is a good baseline, optimal access to the past of the Web faces several challenges, including dealing with large-scale web archive collections and the lack of usage logs, which contain the implicit human feedback most relevant for today's web search. In this paper, we propose an entity-oriented search system to support retrieval and analysis processes on web archives. We use Bing, searching the current Web, to retrieve a ranked list of results, and we link our search results to the WayBack Machine, thus allowing keyword search on the Internet Archive without processing and indexing its raw content. Our system complements existing web archive search tools through a user interface which comes close to the functionalities of modern web search engines (e…
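Linking a live-web result into the WayBack Machine only requires rewriting its URL with the public `https://web.archive.org/web/<timestamp>/<url>` scheme; a partial timestamp (such as a year) is resolved by the archive to the nearest snapshot. A minimal sketch (the paper's system does considerably more, such as entity linking):

```python
def wayback_url(url, timestamp="2016"):
    """Rewrite a live-web result URL into a WayBack Machine lookup URL.
    timestamp may be partial (YYYY, YYYYMM, ...); the archive picks
    the closest snapshot to it."""
    return f"https://web.archive.org/web/{timestamp}/{url}"

def link_results(result_urls, timestamp="2016"):
    """Pair each search result with its archive lookup URL."""
    return [(u, wayback_url(u, timestamp)) for u in result_urls]
```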

  13. With News Search Engines

    Science.gov (United States)

    Gunn, Holly

    2005-01-01

    Although there are many news search engines on the Web, finding the news items one wants can be challenging. Choosing appropriate search terms is one of the biggest challenges. Unless one has seen the article that one is seeking, it is often difficult to select words that were used in the headline or text of the article. The limited archives of…

  14. ElasticSearch cookbook

    CERN Document Server

    Paro, Alberto

    2015-01-01

    If you are a developer who implements ElasticSearch in your web applications and want to sharpen your understanding of the core elements and applications, this is the book for you. It is assumed that you've got working knowledge of JSON and, if you want to extend ElasticSearch, of Java and related technologies.

  15. Exploring the academic invisible web

    OpenAIRE

    Lewandowski, Dirk

    2006-01-01

    The Invisible Web is often discussed in the academic context, where its contents (mainly in the form of databases) are of great importance. But this discussion is mainly based on some seminal research done by Sherman and Price (2001) and Bergman (2001), respectively. We focus on the types of Invisible Web content relevant for academics and the improvements made by search engines to deal with these content types. In addition, we question the volume of the Invisible Web as stated by Bergman. Ou...

  16. Ontology-Based Semantic Web Expanded Search Method for Knowledge Bases

    Institute of Scientific and Technical Information of China (English)

    袁辉; 李延香

    2013-01-01

    As the foundation of knowledge management, the knowledge base plays a very important role. Search methods combining reasoning with keyword matching are currently the common approach to searching knowledge bases. However, affected by unclear user expression, a scarcity of query terms, and similar factors, retrieval efficiency is not very good, and users' varied information retrieval needs on knowledge bases cannot all be met. Introducing Semantic Web ontology technology and query expansion technology can greatly improve retrieval efficiency and satisfy these information retrieval needs. This paper discusses and studies ontology-based Semantic Web expanded search methods for knowledge bases.
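The query-expansion idea in this record can be illustrated with a toy sketch: a tiny hand-built "ontology" maps each term to synonyms and related concepts, and user query terms are expanded before matching against the knowledge base. This is an illustrative simplification under made-up data, not the paper's actual method.

```python
# Toy ontology: term -> related terms (synonyms / narrower concepts).
# Both the mapping and its entries are hypothetical illustration data.
ONTOLOGY = {
    "car": ["automobile", "vehicle", "sedan"],
    "search": ["retrieval", "query", "lookup"],
}


def expand_query(terms):
    """Expand each query term with its ontology neighbours, deduplicated."""
    expanded = []
    for t in terms:
        for candidate in [t] + ONTOLOGY.get(t, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(expand_query(["car", "search"]))
# ['car', 'automobile', 'vehicle', 'sedan', 'search', 'retrieval', 'query', 'lookup']
```

A real system would draw the related terms from an OWL/RDF ontology and weight them, but the expansion step itself has this shape.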

  17. Curriculum Reform of Web Design and Production Based on Search Engine Optimization

    Institute of Scientific and Technical Information of China (English)

    关晓惠; 周志敏

    2013-01-01

    As more and more companies pay attention to the economic benefits of their websites, demand grows for professionals skilled in web design and production. Web Design and Production is a core professional course of the computer science major: it must not only teach students the methods and techniques of designing and producing web pages, but also have them design and produce pages according to search engine optimization (SEO) principles, so as to improve the ranking of a website or page in search engines such as Google or Baidu. This paper analyzes existing problems in the teaching of the course, optimizes the course content in light of SEO techniques, and adopts a three-stage progressive teaching mode, in order to train web design and production professionals who meet the needs of society.

  18. Collaborative web hosting challenges and research directions

    CERN Document Server

    Ahmed, Reaz

    2014-01-01

    This brief presents a peer-to-peer (P2P) web-hosting infrastructure (named pWeb) that can transform networked, home-entertainment devices into lightweight collaborating Web servers for persistently storing and serving multimedia and web content. The issues addressed include ensuring content availability, Plexus routing and indexing, naming schemes, web ID, collaborative web search, network architecture and content indexing. In pWeb, user-generated voluminous multimedia content is proactively uploaded to a nearby network location (preferably within the same LAN or at least, within the same ISP)

  19. Supporting complex search tasks

    DEFF Research Database (Denmark)

    Gäde, Maria; Hall, Mark; Huurdeman, Hugo

    2015-01-01

    There is broad consensus in the field of IR that search is complex in many use cases and applications, both on the Web and in domain-specific collections, and both professionally and in our daily life. Yet our understanding of complex search tasks, in comparison to simple look-up tasks, is fragmented … and recommendations, and supporting exploratory search to sensemaking and analytics, UI and UX design pose an overconstrained challenge. How do we know that our approach is any good? Supporting complex search tasks requires new collaborations across the whole field of IR, and the proposed workshop will bring together …

  20. Mechanism Design of Position Auctions with Advertiser Reputation for Web Search

    Institute of Scientific and Technical Information of China (English)

    汪定伟

    2011-01-01

    Position auctions are the main source of profit for general-purpose search engine operators. Because untrue advertisements exploit the system, the excessively profit-driven position auction mechanism has been widely questioned by the public in China. To regulate search engine operation, we propose a new position auction mechanism that takes advertiser reputation into account. Based on the classical VCG auction, a winner determination method is proposed, and the price computation formulas for all advertising slots are derived and proved. A revenue comparison with the original position auction that ignores reputation shows that, at the same bidding levels, considering advertiser reputation brings the search engine operator a small loss of profit. In the long run, however, the improved reputation of the search engine increases daily browsing volume, which will certainly bring the operator higher profits.
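The pricing idea behind a VCG position auction can be sketched as follows: bidders are ranked by a reputation-weighted score (bid × reputation, a common quality-weighting assumption — the paper's exact formulas are not reproduced here), slot k has click-through rate `ctrs[k]`, and each winner pays for the value lost by the bidders it pushes down. This is a textbook-style sketch, not the mechanism from the paper.

```python
def vcg_position_auction(bids, reputations, ctrs):
    """Allocate slots by reputation-weighted score and compute VCG payments.

    bids[i], reputations[i]: bid and reputation weight of advertiser i.
    ctrs[k]: click-through rate of slot k, non-increasing in k.
    Returns a list of (advertiser_index, total_payment), one per slot.

    Payment for slot k is the standard VCG "ladder": the sum over lower
    slots of the click-rate drop times the displaced bidder's score,
    scaled back by the winner's own reputation weight.
    """
    scores = [b * r for b, r in zip(bids, reputations)]
    order = sorted(range(len(bids)), key=lambda i: scores[i], reverse=True)
    n_slots = min(len(ctrs), len(bids))
    x = list(ctrs[:n_slots]) + [0.0]  # x[n_slots] = 0 closes the ladder
    results = []
    for k in range(n_slots):
        ladder = sum(
            (x[j] - x[j + 1]) * scores[order[j + 1]]
            for j in range(k, n_slots)
            if j + 1 < len(order)
        )
        results.append((order[k], ladder / reputations[order[k]]))
    return results


# With equal reputations this reduces to the textbook VCG position auction:
print(vcg_position_auction([10, 8, 5], [1, 1, 1], [1.0, 0.5]))
# [(0, 6.5), (1, 2.5)]
```

Setting all reputations to 1 recovers the classical case; lowering one advertiser's reputation both demotes it in the ranking and changes what the others pay, which is the trade-off the abstract describes.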