Rajashekar, TB
1998-01-01
The World Wide Web is emerging as an all-in-one information source. Tools for searching Web-based information include search engines, subject directories and meta search tools. We take a look at key features of these tools and suggest practical hints for effective Web searching.
The Evolution of Web Searching.
Green, David
2000-01-01
Explores the interrelation between Web publishing and information retrieval technologies and lists new approaches to Web indexing and searching. Highlights include Web directories; search engines; portalisation; Internet service providers; browser providers; meta search engines; popularity based analysis; natural language searching; links-based…
Dore, Kelly L; Reiter, Harold I; Kreuger, Sharyn; Norman, Geoffrey R
2017-12-01
In re-examining the paper "CASPer, an online pre-interview screen for personal/professional characteristics: prediction of national licensure scores" published in AHSE (22(2), 327-336), we recognized two errors of interpretation.
Process-oriented semantic web search
Tran, DT
2011-01-01
The book is composed of two main parts. The first part is a general study of Semantic Web Search. The second part specifically focuses on the use of semantics throughout the search process, compiling a big picture of Process-oriented Semantic Web Search from different pieces of work that target specific aspects of the process. In particular, this book provides a rigorous account of the concepts and technologies proposed for searching resources and semantic data on the Semantic Web. To collate the various approaches and to better understand what the notion of Semantic Web Search entails, this bo
Web-page Prediction for Domain Specific Web-search using Boolean Bit Mask
Sinha, Sukanta; Duttagupta, Rana; Mukhopadhyay, Debajyoti
2012-01-01
A search engine is a Web-page retrieval tool. Nowadays Web searchers rely on efficient search engines to make good use of their time. To improve search engine performance, we introduce a mechanism that gives Web searchers more prominent search results. In this paper, we discuss a domain-specific Web search prototype that generates a predicted Web-page list for a user-given search string using a Boolean bit mask.
Jackson, Joe; Gilstrap, Donald L.
1999-01-01
Addresses the implications of the new Web metalanguage XML for searching on the World Wide Web and considers the future of XML on the Web. Compared to HTML, XML is more concerned with structure of data than documents, and these data structures should prove conducive to precise, context rich searching. (Author/LRW)
Personalizing Web Search based on User Profile
Utage, Sharyu; Ahire, Vijaya
2016-01-01
Web search engines are the most widely used tools for information retrieval from the World Wide Web. They help users find the most useful information. When different users search for the same information, a search engine returns the same results without considering who submitted the query. Personalized web search is a search technique for providing useful results. This paper models user preferences as hierarchical user profiles. A framework called UPS is proposed. It generalizes profile and m...
Measuring Personalization of Web Search
DEFF Research Database (Denmark)
Hannak, Aniko; Sapiezynski, Piotr; Kakhki, Arash Molavi
2013-01-01
are simply unable to access information that the search engines’ algorithm decides is irrelevant. Despite these concerns, there has been little quantification of the extent of personalization in Web search today, or the user attributes that cause it. In light of this situation, we make three contributions...... as a result of searching with a logged in account and the IP address of the searching user. Our results are a first step towards understanding the extent and effects of personalization on Web search engines today....
Sexual information seeking on web search engines.
Spink, Amanda; Koricich, Andrew; Jansen, B J; Cole, Charles
2004-02-01
Sexual information seeking is an important element within human information behavior. Seeking sexually related information on the Internet takes many forms and channels, including chat rooms discussions, accessing Websites or searching Web search engines for sexual materials. The study of sexual Web queries provides insight into sexually-related information-seeking behavior, of value to Web users and providers alike. We qualitatively analyzed queries from logs of 1,025,910 Alta Vista and AlltheWeb.com Web user queries from 2001. We compared the differences in sexually-related Web searching between Alta Vista and AlltheWeb.com users. Differences were found in session duration, query outcomes, and search term choices. Implications of the findings for sexual information seeking are discussed.
U.S. Environmental Protection Agency — The Chemical Search Web Utility is an intuitive web application that allows the public to easily find the chemical that they are interested in using, and which...
Tales from the Field: Search Strategies Applied in Web Searching
Directory of Open Access Journals (Sweden)
Soohyung Joo
2010-08-01
In their web search processes users apply multiple types of search strategies, which consist of different search tactics. This paper identifies eight types of information search strategies with associated cases based on sequences of search tactics during the information search process. Thirty-one participants representing the general public were recruited for this study. Search logs and verbal protocols offered rich data for the identification of different types of search strategies. Based on the findings, the authors further discuss how to enhance web-based information retrieval (IR) systems to support each type of search strategy.
Multitasking Web Searching and Implications for Design.
Ozmutlu, Seda; Ozmutlu, H. C.; Spink, Amanda
2003-01-01
Findings from a study of users' multitasking searches on Web search engines include: multitasking searches are a noticeable user behavior; multitasking search sessions are longer than regular search sessions in terms of queries per session and duration; both Excite and AlltheWeb.com users search for about three topics per multitasking session and…
A Novel Personalized Web Search Model
Institute of Scientific and Technical Information of China (English)
ZHU Zhengyu; XU Jingqiu; TIAN Yunyan; REN Xiang
2007-01-01
A novel personalized Web search model is proposed. The new system, as a middleware between a user and a Web search engine, is set up on the client machine. It can learn a user's preference implicitly and then generate the user profile automatically. When the user inputs query keywords, the system can automatically generate a few personalized expansion words by computing the term-term associations according to the current user profile, and then these words together with the query keywords are submitted to a popular search engine such as Yahoo or Google. These expansion words help to express accurately the user's search intention. The new Web search model can make a common search engine personalized, that is, the search engine can return different search results to different users who input the same keywords. The experimental results show the feasibility and applicability of the presented work.
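The query-expansion idea in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the user profile is a list of short text snippets and that term-term association is measured by simple co-occurrence counts; the abstract does not say how the authors compute associations, and the names `build_associations` and `expand_query` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_associations(profile_docs):
    """Count how often two terms co-occur in the same profile document."""
    assoc = Counter()
    for doc in profile_docs:
        terms = sorted(set(doc.lower().split()))
        for a, b in combinations(terms, 2):
            # Store both directions so lookups by either term work.
            assoc[(a, b)] += 1
            assoc[(b, a)] += 1
    return assoc


def expand_query(query, assoc, k=2):
    """Append the k terms most strongly associated with the query keywords."""
    keywords = query.lower().split()
    scores = Counter()
    for (a, b), count in assoc.items():
        if a in keywords and b not in keywords:
            scores[b] += count
    # Highest score first; ties broken alphabetically for determinism.
    best = sorted(scores, key=lambda t: (-scores[t], t))[:k]
    return " ".join(keywords + best)


# Hypothetical profile of a user interested in cars rather than animals.
profile = ["jaguar car engine", "jaguar car dealer", "python snake"]
associations = build_associations(profile)
expanded = expand_query("jaguar", associations, k=1)
```

With this toy profile, the keyword "jaguar" is expanded with "car", so the same keyword would yield a different expanded query for a user whose profile is about wildlife.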
Needle Custom Search: Recall-oriented search on the Web using semantic annotations
Kaptein, Rianne; Koot, Gijs; Huis in 't Veld, Mirjam A.A.; van den Broek, Egon; de Rijke, Maarten; Kenter, Tom; de Vries, A.P.; Zhai, Chen Xiang; de Jong, Franciska M.G.; Radinsky, Kira; Hofmann, Katja
Web search engines are optimized for early precision, which makes it difficult to perform recall-oriented tasks using these search engines. In this article, we present our tool Needle Custom Search. This tool exploits semantic annotations of Web search results and, thereby, increases the efficiency
Needle Custom Search : Recall-oriented search on the web using semantic annotations
Kaptein, Rianne; Koot, Gijs; Huis in 't Veld, Mirjam A.A.; van den Broek, Egon L.
2014-01-01
Web search engines are optimized for early precision, which makes it difficult to perform recall-oriented tasks using these search engines. In this article, we present our tool Needle Custom Search. This tool exploits semantic annotations of Web search results and, thereby, increases the efficiency
Nuclear expert web search and crawler algorithm
International Nuclear Information System (INIS)
Reis, Thiago; Barroso, Antonio C.O.; Baptista, Benedito Filho D.
2013-01-01
In this paper we present preliminary research on a web search and crawling algorithm applied specifically to nuclear-related web information. We designed a web-based nuclear-oriented expert system guided by a web crawler algorithm and a neural network able to search and retrieve nuclear-related hypertextual web information in an autonomous and massive fashion. Preliminary experimental results show a retrieval precision of 80% for web pages related to any nuclear theme and a retrieval precision of 72% for web pages related only to the nuclear power theme. (author)
Nuclear expert web search and crawler algorithm
Energy Technology Data Exchange (ETDEWEB)
Reis, Thiago; Barroso, Antonio C.O.; Baptista, Benedito Filho D., E-mail: thiagoreis@usp.br, E-mail: barroso@ipen.br, E-mail: bdbfilho@ipen.br [Instituto de Pesquisas Energeticas e Nucleares (IPEN/CNEN-SP), Sao Paulo, SP (Brazil)
2013-07-01
In this paper we present preliminary research on a web search and crawling algorithm applied specifically to nuclear-related web information. We designed a web-based nuclear-oriented expert system guided by a web crawler algorithm and a neural network able to search and retrieve nuclear-related hypertextual web information in an autonomous and massive fashion. Preliminary experimental results show a retrieval precision of 80% for web pages related to any nuclear theme and a retrieval precision of 72% for web pages related only to the nuclear power theme. (author)
Web-based information search and retrieval: effects of strategy use and age on search success.
Stronge, Aideen J; Rogers, Wendy A; Fisk, Arthur D
2006-01-01
The purpose of this study was to investigate the relationship between strategy use and search success on the World Wide Web (i.e., the Web) for experienced Web users. An additional goal was to extend understanding of how the age of the searcher may influence strategy use. Current investigations of information search and retrieval on the Web have provided an incomplete picture of Web strategy use because participants have not been given the opportunity to demonstrate their knowledge of Web strategies while also searching for information on the Web. Using both behavioral and knowledge-engineering methods, we investigated searching behavior and system knowledge for 16 younger adults (M = 20.88 years of age) and 16 older adults (M = 67.88 years). Older adults were less successful than younger adults in finding correct answers to the search tasks. Knowledge engineering revealed that the age-related effect resulted from ineffective search strategies and amount of Web experience rather than age per se. Our analysis led to the development of a decision-action diagram representing search behavior for both age groups. Older adults had more difficulty than younger adults when searching for information on the Web. However, this difficulty was related to the selection of inefficient search strategies, which may have been attributable to a lack of knowledge about available Web search strategies. Actual or potential applications of this research include training Web users to search more effectively and suggestions to improve the design of search engines.
FedWeb Greatest Hits: Presenting the New Test Collection for Federated Web Search
Demeester, Thomas; Trieschnigg, Rudolf Berend; Zhou, Ke; Nguyen, Dong-Phuong; Hiemstra, Djoerd
This paper presents 'FedWeb Greatest Hits', a large new test collection for research in web information retrieval. As a combination and extension of the datasets used in the TREC Federated Web Search Track, this collection opens up new research possibilities on federated web search challenges, as
Information Diversity in Web Search
Liu, Jiahui
2009-01-01
The web is a rich and diverse information source with incredible amounts of information about all kinds of subjects in various forms. This information source affords great opportunity to build systems that support users in their work and everyday lives. To help users explore information on the web, web search systems should find information that…
IMPROVING PERSONALIZED WEB SEARCH USING BOOKSHELF DATA STRUCTURE
Directory of Open Access Journals (Sweden)
S.K. Jayanthi
2012-10-01
Search engines play a vital role in retrieving relevant information for the web user. In this research work, a user-profile-based web search is proposed, so that web users from different domains may receive different sets of results. The main challenge is to provide relevant results at the right level of reading difficulty. Estimating user expertise and re-ranking the results are the main aspects of this paper. The retrieved results are arranged in a Bookshelf Data Structure for easy access. Better presentation of search results in visual mode thus significantly increases the usability of web search engines.
Tjin-Kam-Jet, Kien
2013-01-01
The World Wide Web contains billions of documents (and counting); hence, it is likely that some document will contain the answer or content you are searching for. While major search engines like Bing and Google often manage to return relevant results to your query, there are plenty of situations in
The Use of Web Search Engines in Information Science Research.
Bar-Ilan, Judit
2004-01-01
Reviews the literature on the use of Web search engines in information science research, including: ways users interact with Web search engines; social aspects of searching; structure and dynamic nature of the Web; link analysis; other bibliometric applications; characterizing information on the Web; search engine evaluation and improvement; and…
Overview of the TREC 2014 Federated Web Search Track
Demeester, Thomas; Trieschnigg, Rudolf Berend; Nguyen, Dong-Phuong; Zhou, Ke; Hiemstra, Djoerd
2014-01-01
The TREC Federated Web Search track facilitates research in topics related to federated web search, by providing a large realistic data collection sampled from a multitude of online search engines. The FedWeb 2013 challenges of Resource Selection and Results Merging are again included in
Resource Selection for Federated Search on the Web
Nguyen, Dong-Phuong; Demeester, Thomas; Trieschnigg, Rudolf Berend; Hiemstra, Djoerd
A publicly available dataset for federated search reflecting a real web environment has long been absent, making it difficult for researchers to test the validity of their federated search algorithms for the web setting. We present several experiments and analyses on resource selection on the web
Adding a visualization feature to web search engines: it's time.
Wong, Pak Chung
2008-01-01
It's widely recognized that all Web search engines today are almost identical in presentation layout and behavior. In fact, the same presentation approach has been applied to depicting search engine results pages (SERPs) since the first Web search engine launched in 1993. In this Visualization Viewpoints article, I propose to add a visualization feature to Web search engines and suggest that the new addition can improve search engines' performance and capabilities, which in turn lead to better Web search technology.
Overview of the TREC 2013 Federated Web Search Track
Demeester, Thomas; Trieschnigg, Rudolf Berend; Nguyen, Dong-Phuong; Hiemstra, Djoerd
2014-01-01
The TREC Federated Web Search track is intended to promote research related to federated search in a realistic web setting, and hereto provides a large data collection gathered from a series of online search engines. This overview paper discusses the results of the first edition of the track, FedWeb
Meta-Search Utilizing Evolutionary Recommendation: A Web Search Architecture Proposal
Czech Academy of Sciences Publication Activity Database
Húsek, Dušan; Keyhanipour, A.; Krömer, P.; Moshiri, B.; Owais, S.; Snášel, V.
2008-01-01
Vol. 33 (2008), pp. 189-200. ISSN 1870-4069. Institutional research plan: CEZ:AV0Z10300504. Keywords: web search; meta-search engine; intelligent re-ranking; ordered weighted averaging; Boolean search queries optimizing. Subject RIV: IN - Informatics, Computer Science
Deep web search: an overview and roadmap
Tjin-Kam-Jet, Kien; Trieschnigg, Rudolf Berend; Hiemstra, Djoerd
2011-01-01
We review the state-of-the-art in deep web search and propose a novel classification scheme to better compare deep web search systems. The current binary classification (surfacing versus virtual integration) hides a number of implicit decisions that must be made by a developer. We make these
Schlitzkus, Lisa L; Schenarts, Paul J; Schenarts, Kimberly D
2013-01-01
Hosting a reception for prospective interns the evening before the interview has become a well-established expectation. It is thought that these initial impressions significantly influence the ranking process. Despite these well-held beliefs, there has been a paucity of studies exploring the preinterview reception. A survey tool was created and piloted to ensure validity. The survey was then administered to a fourth-year class of allopathic medical students immediately after interviews but before Match Day. A university, teaching hospital. Fourth-year allopathic medical students. The response rate was 100% (n = 69). Ninety-six percent of programs hosted an event. Although these events were minimally stressful (86%), the same percent felt that not attending would limit their knowledge of the program, and 66% felt that it would negatively affect their application. Forty percent believe this event to be extremely important to residency programs in selecting interns. Ninety-five percent are attended by residents only, and approximately half were at a casual restaurant. Most applicants (97%) never paid for their own meal, and 69% felt that if they did, it would leave a negative impression of the program. Candidates believe the preinterview reception is important in the selection process, that failing to attend would negatively affect their application, and provides insight about the program. Alcohol is often provided but rarely has a negative effect. Applicants prefer an informal setting with unfettered interactions with the residents. © 2013 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved.
World Wide Web Metaphors for Search Mission Data
Norris, Jeffrey S.; Wallick, Michael N.; Joswig, Joseph C.; Powell, Mark W.; Torres, Recaredo J.; Mittman, David S.; Abramyan, Lucy; Crockett, Thomas M.; Shams, Khawaja S.; Fox, Jason M.;
2010-01-01
A software program that searches and browses mission data emulates a Web browser, containing standard metaphors for Web browsing. By taking advantage of back-end URLs, users may save and share search states. Also, since a Web interface is familiar to users, training time is reduced. Familiar back and forward buttons move through a local search history. A refresh/reload button regenerates a query, and loads in any new data. URLs can be constructed to save search results. Adding context to the current search is also handled through a familiar Web metaphor. The query is constructed by clicking on hyperlinks that represent new components to the search query. The selection of a link appears to the user as a page change; the choice of links changes to represent the updated search and the results are filtered by the new criteria. Selecting a navigation link changes the current query and also the URL that is associated with it. The back button can be used to return to the previous search state. This software is part of the MSLICE release, which was written in Java. It will run on any current Windows, Macintosh, or Linux system.
Overview of the TREC 2014 Federated Web Search Track
Demeester, Thomas; Trieschnigg, Rudolf Berend; Nguyen, Dong-Phuong; Zhou, Ke; Hiemstra, Djoerd
2014-01-01
The TREC Federated Web Search track facilitates research in topics related to federated web search, by providing a large realistic data collection sampled from a multitude of online search engines. The FedWeb 2013 challenges of Resource Selection and Results Merging are again included in FedWeb 2014, and we additionally introduced the task of vertical selection. Other new aspects are the required link between the Resource Selection and Results Merging, and the importance of diversi...
Research Proposal for Distributed Deep Web Search
Tjin-Kam-Jet, Kien
2010-01-01
This proposal identifies two main problems related to deep web search, and proposes a step by step solution for each of them. The first problem is about searching deep web content by means of a simple free-text interface (with just one input field, instead of a complex interface with many input
Uncovering Web search strategies in South African higher education
Directory of Open Access Journals (Sweden)
Surika Civilcharran
2016-11-01
Background: In spite of the enormous amount of information available on the Web and the fact that search engines are continuously evolving to enhance the search experience, students are nevertheless faced with the difficulty of effectively retrieving information. It is, therefore, imperative for the interaction between students and search tools to be understood and search strategies to be identified, in order to promote successful information retrieval. Objectives: This study identifies the Web search strategies used by postgraduate students and forms part of a wider study into information retrieval strategies used by postgraduate students at the University of KwaZulu-Natal (UKZN), Pietermaritzburg campus, South Africa. Method: Largely underpinned by Thatcher’s cognitive search strategies, the mixed-methods approach was utilised for this study, in which questionnaires were employed in Phase 1 and structured interviews in Phase 2. This article reports and reflects on the findings of Phase 2, which focus on identifying the Web search strategies employed by postgraduate students. The Phase 1 results were reported in Civilcharran, Hughes and Maharaj (2015). Results: Findings reveal the Web search strategies used for academic information retrieval. In spite of easy access to the invisible Web and the advent of meta-search engines, the use of Web search engines still remains the preferred search tool. The UKZN online library databases and especially the UKZN online library, Online Public Access Catalogue system, are being underutilised. Conclusion: Being ranked in the top three percent of the world’s universities, UKZN is investing in search tools that are not being used to their full potential. This evidence suggests an urgent need for students to be trained in Web searching and to have a greater exposure to a variety of search tools. This article is intended to further contribute to the design of undergraduate training programmes in order to deal
Overview of the TREC 2013 federated web search track
Demeester, Thomas; Trieschnigg, D; Nguyen, D; Hiemstra, D
2013-01-01
The TREC Federated Web Search track is intended to promote research related to federated search in a realistic web setting, and hereto provides a large data collection gathered from a series of online search engines. This overview paper discusses the results of the first edition of the track, FedWeb 2013. The focus was on basic challenges in federated search: (1) resource selection, and (2) results merging. After an overview of the provided data collection and the relevance judgments for the ...
Changes in users' Web search performance after ten years ...
African Journals Online (AJOL)
The changes in users' Web search performance using search engines over ten years were investigated in this study. Matched data obtained from samples in 2000 and 2010 were used for the comparative analysis. The patterns of Web search engine use suggested a dominance in using a particular search engine. Statistical ...
A grammar checker based on web searching
Directory of Open Access Journals (Sweden)
Joaquim Moré
2006-05-01
This paper presents an English grammar and style checker for non-native English speakers. The main characteristic of this checker is the use of an Internet search engine. As the number of web pages written in English is immense, the system hypothesises that a piece of text not found on the Web is probably badly written. The system also hypothesises that the Web will provide examples of how the content of the text segment can be expressed in a grammatically correct and idiomatic way. Thus, when the checker warns the user about the odd nature of a text segment, the Internet engine searches for contexts that can help the user decide whether he/she should correct the segment or not. By means of a search engine, the checker also suggests use of other expressions that appear on the Web more often than the expression he/she actually wrote.
Semantic Search of Web Services
Hao, Ke
2013-01-01
This dissertation addresses semantic search of Web services using natural language processing. We first survey various existing approaches, focusing on the fact that the expensive costs of current semantic annotation frameworks result in limited use of semantic search for large scale applications. We then propose a vector space model based service…
Use of Web Search Engines and Personalisation in Information Searching for Educational Purposes
Salehi, Sara; Du, Jia Tina; Ashman, Helen
2018-01-01
Introduction: Students increasingly depend on Web search for educational purposes. This causes concerns among education providers as some evidence indicates that in higher education, the disadvantages of Web search and personalised information are not justified by the benefits. Method: One hundred and twenty university students were surveyed about…
Quantifying retrieval bias in Web archive search
Samar, Thaer; Traub, Myriam C.; van Ossenbruggen, Jacco; Hardman, Lynda; de Vries, Arjen P.
2018-01-01
A Web archive usually contains multiple versions of documents crawled from the Web at different points in time. One possible way for users to access a Web archive is through full-text search systems. However, previous studies have shown that these systems can induce a bias, known as the
The Impact of User Knowledge on Web Search Satisfaction
Fadhilah M. Yamin; T. Ramayah
2011-01-01
Problem statement: Searching on the web is a tedious process, as it requires knowledge and skills on what and how to search. What to search is basically the core of the searching activity, as it represents the need of the searcher. How to search relates to knowledge of how the facilities available on the web can be utilized in order to achieve the needs. Search satisfaction is the level of measurement that describes the achievement of the searcher towards his/her information needs. Appr...
More Effective Web Search Using Bigrams and Trigrams
Peter Vamplew; Vishv Malhotra; David Johnson
2006-01-01
This paper investigates the effectiveness of quoted bigrams and trigrams as query terms to target web search. Prior research in this area has largely focused on static corpora each containing only a few million documents, and has reported mixed (usually negative) results. We investigate the bigram/trigram extraction problem and present an extraction algorithm that shows promising results when applied to real-time web search. We also present a prototype augmented search software package that c...
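As a rough illustration of what quoted bigram and trigram query terms look like, the sketch below turns a piece of text into quoted word n-grams. It is not the extraction algorithm the paper presents (the abstract does not describe it); it simply enumerates every run of n consecutive words, and the function names are hypothetical.

```python
import re


def quoted_ngrams(text, n):
    """Return every run of n consecutive words in text as a quoted phrase."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return ['"' + " ".join(words[i:i + n]) + '"'
            for i in range(len(words) - n + 1)]


def build_query(text):
    """Combine all quoted bigrams and trigrams into one query string."""
    return " ".join(quoted_ngrams(text, 2) + quoted_ngrams(text, 3))


bigrams = quoted_ngrams("real time web search", 2)
trigrams = quoted_ngrams("real time web search", 3)
```

A practical system would select only a few discriminating n-grams rather than submit all of them, since each quoted phrase narrows the result set.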
Raising Reliability of Web Search Tool Research through Replication and Chaos Theory
Nicholson, Scott
1999-01-01
Because the World Wide Web is a dynamic collection of information, the Web search tools (or "search engines") that index the Web are dynamic. Traditional information retrieval evaluation techniques may not provide reliable results when applied to the Web search tools. This study is the result of ten replications of the classic 1996 Ding and Marchionini Web search tool research. It explores the effects that replication can have on transforming unreliable results from one iteration into replica...
The effect of query complexity on Web searching results
Directory of Open Access Journals (Sweden)
B.J. Jansen
2000-01-01
This paper presents findings from a study of the effects of query structure on retrieval by Web search services. Fifteen queries were selected from the transaction log of a major Web search service in simple query form with no advanced operators (e.g., Boolean operators, phrase operators, etc.) and submitted to 5 major search engines - Alta Vista, Excite, FAST Search, Infoseek, and Northern Light. The results from these queries became the baseline data. The original 15 queries were then modified using the various search operators supported by each of the 5 search engines for a total of 210 queries. Each of these 210 queries was also submitted to the applicable search service. The results obtained were then compared to the baseline results. A total of 2,768 search results were returned by the set of all queries. In general, increasing the complexity of the queries had little effect on the results with a greater than 70% overlap in results, on average. Implications for the design of Web search services and directions for future research are discussed.
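The comparison against baseline results can be sketched as a simple overlap measure. The abstract does not define how overlap was computed, so the version below is an assumption: the percentage of baseline result URLs that reappear in the results of the modified query. The function name and the URLs are hypothetical.

```python
def overlap_percent(baseline, modified):
    """Percentage of baseline result URLs also present in the modified-query results."""
    base = set(baseline)
    if not base:
        # No baseline results: define overlap as zero to avoid dividing by zero.
        return 0.0
    return 100.0 * len(base & set(modified)) / len(base)


# Hypothetical result lists for a simple query and its Boolean variant.
simple_results = ["a.example", "b.example", "c.example", "d.example"]
boolean_results = ["b.example", "c.example", "d.example", "e.example"]
score = overlap_percent(simple_results, boolean_results)
```

Here three of the four baseline URLs reappear, giving an overlap of 75%.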
The invisible Web: uncovering information sources search engines can't see
Sherman, Chris
2001-01-01
Enormous expanses of the Internet are unreachable with standard web search engines. This book provides the key to finding these hidden resources by identifying how to uncover and use invisible web resources. Mapping the invisible Web, when and how to use it, assessing the validity of the information, and the future of Web searching are topics covered in detail. Only 16 percent of Net-based information can be located using a general search engine. The other 84 percent is what is referred to as the invisible Web, made up of information stored in databases. Unlike pages on the visible Web, informa
AN OVERVIEW OF SEARCHING AND DISCOVERING WEB BASED INFORMATION RESOURCES
Directory of Open Access Journals (Sweden)
Cezar VASILESCU
2010-01-01
The Internet has become, for most of us, an instrument used daily for professional or personal reasons. We do not even remember the times when a computer and a broadband connection were luxury items. More and more people rely on the complicated web network to find the information they need. This paper presents an overview of Internet search-related issues and search engines, and describes the parties and the basic mechanism involved in a search for web-based information resources. It also presents ways to increase the efficiency of web searches through a better understanding of what search engines ignore in website content.
She, Hsiao-Ching; Cheng, Meng-Tzu; Li, Ta-Wei; Wang, Chia-Yu; Chiu, Hsin-Tien; Lee, Pei-Zon; Chou, Wen-Chi; Chuang, Ming-Hua
2012-01-01
This study investigates the effect of Web-based Chemistry Problem-Solving, with the attributes of Web-searching and problem-solving scaffolds, on undergraduate students' problem-solving task performance. In addition, the nature and extent of Web-searching strategies students used and its correlation with task performance and domain knowledge also…
A study of medical and health queries to web search engines.
Spink, Amanda; Yang, Yin; Jansen, Jim; Nykanen, Pirrko; Lorence, Daniel P; Ozmutlu, Seda; Ozmutlu, H Cenk
2004-03-01
This paper reports findings from an analysis of medical or health queries to different web search engines. We report results: (i) comparing samples of 10000 web queries taken randomly from 1.2 million query logs from the AlltheWeb.com and Excite.com commercial web search engines in 2001 for medical or health queries, (ii) comparing the 2001 findings from Excite and AlltheWeb.com users with results from a previous analysis of medical and health related queries from the Excite Web search engine for 1997 and 1999, and (iii) medical or health advice-seeking queries beginning with the word 'should'. Findings suggest: (i) a small percentage of web queries are medical or health related, (ii) the top five categories of medical or health queries were: general health, weight issues, reproductive health and puberty, pregnancy/obstetrics, and human relationships, and (iii) over time, the medical and health queries may have declined as a proportion of all web queries, as the use of specialized medical/health websites and e-commerce-related queries has increased. Findings provide insights into medical and health-related web querying and suggest some implications for the use of the general web search engines when seeking medical/health information.
Generating crop calendars with Web search data
International Nuclear Information System (INIS)
Van der Velde, Marijn; See, Linda; Fritz, Steffen; Khabarov, Nikolay; Obersteiner, Michael; Verheijen, Frank G A
2012-01-01
This paper demonstrates the potential of using Web search volumes for generating crop specific planting and harvesting dates in the USA integrating climatic, social and technological factors affecting crop calendars. Using Google Insights for Search, clear peaks in volume occur at times of planting and harvest at the national level, which were used to derive corn specific planting and harvesting dates at a weekly resolution. Disaggregated to state level, search volumes for corn planting generally are in agreement with planting dates from a global crop calendar dataset. However, harvest dates were less discriminatory at the state level, indicating that peaks in search volume may be blurred by broader searches on harvest as a time of cultural events. The timing of other agricultural activities such as purchase of seed and response to weed and pest infestation was also investigated. These results highlight the future potential of using Web search data to derive planting dates in countries where the data are sparse or unreliable, once sufficient search volumes are realized, as well as the potential for monitoring in real time the response of farmers to climate change over the coming decades. Other potential applications of search volume data of relevance to agronomy are also discussed. (letter)
Seasonal Web Search Query Selection for Influenza-Like Illness (ILI) Estimation
DEFF Research Database (Denmark)
Hansen, Niels Dalum; Mølbak, Kåre; Cox, Ingemar Johansson
2017-01-01
Influenza-like illness (ILI) estimation from web search data is an important web analytics task. The basic idea is to use the frequencies of queries in web search logs that are correlated with past ILI activity as features when estimating current ILI activity. It has been noted that since influenza...
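The basic idea in the abstract above, keeping the queries whose frequencies track past ILI activity, can be sketched as follows. This is only a minimal illustration under stated assumptions: the function names, the use of plain Pearson correlation as the selection score, and the weekly numbers are all hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no signal
    return cov / sqrt(var_x * var_y)


def select_queries(query_freqs, ili_rates, top_k=2):
    """Return the top_k queries whose weekly frequencies correlate best
    with past ILI rates (hypothetical selection rule)."""
    scored = [(pearson(freqs, ili_rates), query)
              for query, freqs in query_freqs.items()]
    scored.sort(reverse=True)
    return [query for _, query in scored[:top_k]]


# Made-up weekly data, for illustration only.
ili_rates = [1.0, 2.0, 4.0, 3.0, 1.5]
query_freqs = {
    "flu symptoms": [10, 21, 39, 30, 16],
    "fever remedy": [5, 9, 20, 16, 8],
    "football scores": [50, 48, 51, 49, 50],
}
selected = select_queries(query_freqs, ili_rates)
```

The selected query frequencies would then serve as features for a separate estimation model, which is outside the scope of this sketch.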
Drexel at TREC 2014 Federated Web Search Track
2014-11-01
of its input RS results. 1. INTRODUCTION Federated Web Search is the task of searching multiple search engines simultaneously and combining their...or distributed properly[5]. The goal of RS is then, for a given query, to select only the most promising search engines from all those available. Most...result pages of 149 search engines. 4000 queries are used in building the sample set. As a part of the Vertical Selection task, search engines are
Social Search: A Taxonomy of, and a User-Centred Approach to, Social Web Search
McDonnell, Michael; Shiri, Ali
2011-01-01
Purpose: The purpose of this paper is to introduce the notion of social search as a new concept, drawing upon the patterns of web search behaviour. It aims to: define social search; present a taxonomy of social search; and propose a user-centred social search method. Design/methodology/approach: A mixed method approach was adopted to investigate…
Web Spam, Social Propaganda and the Evolution of Search Engine Rankings
Metaxas, Panagiotis Takis
Search Engines have greatly influenced the way we experience the web. Since the early days of the web, users have been relying on them to get informed and make decisions. When the web was relatively small, web directories were built and maintained using human experts to screen and categorize pages according to their characteristics. By the mid 1990's, however, it was apparent that the human expert model of categorizing web pages does not scale. The first search engines appeared and they have been evolving ever since, taking over the role that web directories used to play.
Research on Web Search Behavior: How Online Query Data Inform Social Psychology.
Lai, Kaisheng; Lee, Yan Xin; Chen, Hao; Yu, Rongjun
2017-10-01
The widespread use of web searches in daily life has allowed researchers to study people's online social and psychological behavior. Using web search data has advantages in terms of data objectivity, ecological validity, temporal resolution, and unique application value. This review integrates existing studies on web search data that have explored topics including sexual behavior, suicidal behavior, mental health, social prejudice, social inequality, public responses to policies, and other psychosocial issues. These studies are categorized as descriptive, correlational, inferential, predictive, and policy evaluation research. The integration of theory-based hypothesis testing in future web search research will result in even stronger contributions to social psychology.
Utility of Web search query data in testing theoretical assumptions about mephedrone.
Kapitány-Fövény, Máté; Demetrovics, Zsolt
2017-05-01
With growing access to the Internet, people who use drugs and traffickers started to obtain information about novel psychoactive substances (NPS) via online platforms. This paper aims to analyze whether a decreasing Web interest in formerly banned substances (cocaine, heroin, and MDMA) and the legislative status of mephedrone predict Web interest in this NPS. Google Trends was used to measure changes of Web interest in cocaine, heroin, MDMA, and mephedrone. Google search results for mephedrone within the same time frame were analyzed and categorized. Web interest in classic drugs was found to be more persistent. Regarding geographical distribution, the location of Web searches for heroin and cocaine was less centralized. Illicit status of mephedrone was a negative predictor of its Web search query rates. The connection between mephedrone-related Web search rates and legislative status of this substance was significantly mediated by ecstasy-related Web search queries, the number of documentaries, and forum/blog entries about mephedrone. The results might provide support for the hypothesis that mephedrone's popularity was highly correlated with its legal status and that it functioned as a potential substitute for MDMA. Google Trends was found to be a useful tool for testing theoretical assumptions about NPS. Copyright © 2017 John Wiley & Sons, Ltd.
Directory of Open Access Journals (Sweden)
Filistea Naude
2010-08-01
This study provided insight into the significance of the open Web as an information resource and Web search engines as research tools amongst academics. The academic staff establishment of the University of South Africa (Unisa) was invited to participate in a questionnaire survey and included 1188 staff members from five colleges. This study culminated in a PhD dissertation in 2008. One hundred and eighty-seven respondents participated in the survey, which gave a response rate of 15.7%. The results of this study show that academics have indeed accepted the open Web as a useful information resource and Web search engines as retrieval tools when seeking information for academic and research work. The majority of respondents used the open Web and Web search engines on a daily or weekly basis to source academic and research information. The main obstacles presented by using the open Web and Web search engines included lack of time to search and browse the Web, information overload, poor network speed and the slow downloading speed of webpages.
Directory of Open Access Journals (Sweden)
Filistea Naude
2010-12-01
This study provided insight into the significance of the open Web as an information resource and Web search engines as research tools amongst academics. The academic staff establishment of the University of South Africa (Unisa) was invited to participate in a questionnaire survey and included 1188 staff members from five colleges. This study culminated in a PhD dissertation in 2008. One hundred and eighty-seven respondents participated in the survey, which gave a response rate of 15.7%. The results of this study show that academics have indeed accepted the open Web as a useful information resource and Web search engines as retrieval tools when seeking information for academic and research work. The majority of respondents used the open Web and Web search engines on a daily or weekly basis to source academic and research information. The main obstacles presented by using the open Web and Web search engines included lack of time to search and browse the Web, information overload, poor network speed and the slow downloading speed of webpages.
Adding a Visualization Feature to Web Search Engines: It’s Time
Energy Technology Data Exchange (ETDEWEB)
Wong, Pak C.
2008-11-11
Since the first world wide web (WWW) search engine quietly entered our lives in 1994, the “information need” behind web searching has rapidly grown into a multi-billion dollar business that dominates the internet landscape, drives e-commerce traffic, propels global economy, and affects the lives of the whole human race. Today’s search engines are faster, smarter, and more powerful than those released just a few years ago. With the vast investment pouring into research and development by leading web technology providers and the intense emotion behind corporate slogans such as “win the web” or “take back the web,” I can’t help but ask why are we still using the very same “text-only” interface that was used 13 years ago to browse our search engine results pages (SERPs)? Why has the SERP interface technology lagged so far behind in the web evolution when the corresponding search technology has advanced so rapidly? In this article I explore some current SERP interface issues, suggest a simple but practical visual-based interface design approach, and argue why a visual approach can be a strong candidate for tomorrow’s SERP interface.
A review of the reporting of web searching to identify studies for Cochrane systematic reviews.
Briscoe, Simon
2018-03-01
The literature searches that are used to identify studies for inclusion in a systematic review should be comprehensively reported. This ensures that the literature searches are transparent and reproducible, which is important for assessing the strengths and weaknesses of a systematic review and re-running the literature searches when conducting an update review. Web searching using search engines and the websites of topically relevant organisations is sometimes used as a supplementary literature search method. Previous research has shown that the reporting of web searching in systematic reviews often lacks important details and is thus not transparent or reproducible. Useful details to report about web searching include the name of the search engine or website, the URL, the date searched, the search strategy, and the number of results. This study reviews the reporting of web searching to identify studies for Cochrane systematic reviews published in the 6-month period August 2016 to January 2017 (n = 423). Of these reviews, 61 reviews reported using web searching using a search engine or website as a literature search method. In the majority of reviews, the reporting of web searching was found to lack essential detail for ensuring transparency and reproducibility, such as the search terms. Recommendations are made on how to improve the reporting of web searching in Cochrane systematic reviews. Copyright © 2017 John Wiley & Sons, Ltd.
Lazonder, Adrianus W.
2005-01-01
This study compared Pairs of students with Single students in web search tasks. The underlying hypothesis was that peer-to-peer collaboration encourages students to articulate their thoughts, which in turn has a facilitative effect on the regulation of the search process as well as search outcomes.
Discovering How Students Search a Library Web Site: A Usability Case Study.
Augustine, Susan; Greene, Courtney
2002-01-01
Discusses results of a usability study at the University of Illinois Chicago that investigated whether Internet search engines have influenced the way students search library Web sites. Results show students use the Web site's internal search engine rather than navigating through the pages; have difficulty interpreting library terminology; and…
Collaborative Web Search Who, What, Where, When, and Why
Morris, Meredith Ringel
2009-01-01
Today, Web search is treated as a solitary experience. Web browsers and search engines are typically designed to support a single user, working alone. However, collaboration on information-seeking tasks is actually commonplace. Students work together to complete homework assignments, friends seek information about joint entertainment opportunities, family members jointly plan vacation travel, and colleagues jointly conduct research for their projects. As improved networking technologies and the rise of social media simplify the process of remote collaboration, and large, novel display form-fac
Categorization of web pages - Performance enhancement to search engine
Digital Repository Service at National Institute of Oceanography (India)
Lakshminarayana, S.
of Artificial Intelligence, Volume III. Los Altos, CA.: William Kaufmann. pp 1-74. 18. Brin, S. & Page, L. (1998). The anatomy of a large scale hyper-textual web search engine. In Proceedings of the seventh World Wide Web conference, Brisbane, Australia. 19...
Web Feet Guide to Search Engines: Finding It on the Net.
Web Feet, 2001
2001-01-01
This guide to search engines for the World Wide Web discusses selecting the right search engine; interpreting search results; major search engines; online tutorials and guides; search engines for kids; specialized search tools for various subjects; and other specialized engines and gateways. (LRW)
Improving Web Page Retrieval using Search Context from Clicked Domain Names
Li, R.
Search context is a crucial factor that helps to understand a user’s information need in ad-hoc Web page retrieval. A query log of a search engine contains rich information on issued queries and their corresponding clicked Web pages. The clicked data implies its relevance to the query and can be
ONTOLOGY BASED MEANINGFUL SEARCH USING SEMANTIC WEB AND NATURAL LANGUAGE PROCESSING TECHNIQUES
Directory of Open Access Journals (Sweden)
K. Palaniammal
2013-10-01
The semantic web extends the current World Wide Web by adding facilities for machine-understandable descriptions of meaning. The ontology-based search model is used to enhance the efficiency and accuracy of information retrieval. Ontology is the core technology of the semantic web and a mechanism for representing formal and shared domain descriptions. In this paper, we propose ontology-based meaningful search using semantic web and Natural Language Processing (NLP) techniques in the educational domain. First we build the educational ontology, then we present the semantic search system. The search model consists of three parts: embedded spell-check, finding synonyms using the WordNet API, and querying the ontology using the SPARQL language. The results are sensitive to both spell check and synonymous context. This paper provides more accurate results and the complete details for the selected field in a single page.
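The three-part search model named in this abstract (spell-check, synonym lookup, SPARQL query) might be wired together roughly as below. This is a hedged sketch, not the authors' system: Python's `difflib` stands in for the spell-checker, a small hand-written dictionary stands in for WordNet, the vocabulary and the use of `rdfs:label` are assumptions, and the query is only built as a string, not executed against any ontology.

```python
import difflib

# Hypothetical vocabulary and synonym table; stand-ins for a real
# spell-checker dictionary and for WordNet.
VOCABULARY = ["course", "lecturer", "student", "department"]
SYNONYMS = {"lecturer": ["teacher", "instructor"], "course": ["class"]}


def correct_spelling(term):
    """Map a possibly misspelled term to the closest vocabulary word."""
    matches = difflib.get_close_matches(term.lower(), VOCABULARY, n=1, cutoff=0.6)
    return matches[0] if matches else term.lower()


def expand_with_synonyms(term):
    """Return the term followed by any known synonyms."""
    return [term] + SYNONYMS.get(term, [])


def build_sparql(terms):
    """Build a SPARQL 1.1 query matching rdfs:label against any of the terms.

    Terms come from the fixed tables above, so no quote escaping is done;
    real user input would need escaping.
    """
    values = " ".join('"%s"' % term for term in terms)
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?resource WHERE {\n"
        "  VALUES ?label { %s }\n"
        "  ?resource rdfs:label ?label .\n"
        "}" % values
    )


def search_query(user_input):
    """Spell-check the input, expand it with synonyms, and build the query."""
    term = correct_spelling(user_input)
    return build_sparql(expand_with_synonyms(term))


query = search_query("lecterer")
```

Here the misspelled input is corrected before expansion, so the generated query asks for resources labelled with the corrected term or either of its listed synonyms.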
Curating the Web: Building a Google Custom Search Engine for the Arts
Hennesy, Cody; Bowman, John
2008-01-01
Google's first foray onto the web made search simple and results relevant. With its Co-op platform, Google has taken another step toward dramatically increasing the relevancy of search results, further adapting the World Wide Web to local needs. Google Custom Search Engine, a tool on the Co-op platform, puts one in control of his or her own search…
Dynamics of a macroscopic model characterizing mutualism of search engines and web sites
Wang, Yuanshi; Wu, Hong
2006-05-01
We present a model to describe the mutualism relationship between search engines and web sites. In the model, search engines and web sites benefit from each other while the search engines are derived products of the web sites and cannot survive independently. Our goal is to show strategies for the search engines to survive in the internet market. From mathematical analysis of the model, we show that mutualism does not always result in survival. We show various conditions under which the search engines would tend to extinction, persist or grow explosively. Then by the conditions, we deduce a series of strategies for the search engines to survive in the internet market. We present conditions under which the initial number of consumers of the search engines has little contribution to their persistence, which is in agreement with the results in previous works. Furthermore, we show novel conditions under which the initial value plays an important role in the persistence of the search engines and deduce new strategies. We also give suggestions for the web sites to cooperate with the search engines in order to form a win-win situation.
A World Wide Web Region-Based Image Search Engine
DEFF Research Database (Denmark)
Kompatsiaris, Ioannis; Triantafyllou, Evangelia; Strintzis, Michael G.
2001-01-01
In this paper the development of an intelligent image content-based search engine for the World Wide Web is presented. This system will offer a new form of media representation and access of content available in WWW. Information Web Crawlers continuously traverse the Internet and collect images...
Virtual Reference Services through Web Search Engines: Study of Academic Libraries in Pakistan
Directory of Open Access Journals (Sweden)
Rubia Khan
2017-03-01
Web search engines (WSE) are powerful and popular tools in the field of information service management. This study is an attempt to examine the impact and usefulness of web search engines in providing virtual reference services (VRS) within academic libraries in Pakistan. The study also attempts to investigate the relevant expertise and skills of library professionals in providing digital reference services (DRS) efficiently using web search engines. The methodology used in this study is quantitative in nature. The data was collected from fifty public and private sector universities in Pakistan using a structured questionnaire. Microsoft Excel and SPSS were used for data analysis. The study concludes that web search engines are commonly used by librarians to help users (especially research scholars) by providing digital reference services. The study also finds a positive correlation between use of web search engines and quality of digital reference services provided to library users. It is concluded that although search engines have increased the expectations of users and are really big competitors to a library’s reference desk, they are however not an alternative to reference service. Findings reveal that search engines pose numerous challenges for librarians and the study also attempts to bring together possible remedial measures. This study is useful for library professionals to understand the importance of search engines in providing VRS. The study also provides an intellectual comparison among different search engines, their capabilities, limitations, challenges and opportunities to provide VRS effectively in libraries.
A web search on environmental topics: what is the role of ranking?
Covolo, Loredana; Filisetti, Barbara; Mascaretti, Silvia; Limina, Rosa Maria; Gelatti, Umberto
2013-12-01
Although the Internet is easy to use, the mechanisms and logic behind a Web search are often unknown. Reliable information can be obtained, but it may not be visible as the Web site is not located in the first positions of search results. The possible risks of adverse health effects arising from environmental hazards are issues of increasing public interest, and therefore the information about these risks, particularly on topics for which there is no scientific evidence, is very crucial. The aim of this study was to investigate whether the presentation of information on some environmental health topics differed among various search engines, assuming that the most reliable information should come from institutional Web sites. Five search engines were used: Google, Yahoo!, Bing, Ask, and AOL. The following topics were searched in combination with the word "health": "nuclear energy," "electromagnetic waves," "air pollution," "waste," and "radon." For each topic three key words were used. The first 30 search results for each query were considered. The ranking variability among the search engines and the type of search results were analyzed for each topic and for each key word. The ranking of institutional Web sites was given particular consideration. Variable results were obtained when surfing the Internet on different environmental health topics. Multivariate logistic regression analysis showed that, when searching for radon and air pollution topics, it is more likely to find institutional Web sites in the first 10 positions compared with nuclear power (odds ratio=3.4, 95% confidence interval 2.1-5.4 and odds ratio=2.9, 95% confidence interval 1.8-4.7, respectively) and also when using Google compared with Bing (odds ratio=3.1, 95% confidence interval 1.9-5.1). The increasing use of online information could play an important role in forming opinions. Web users should become more aware of the importance of finding reliable information, and health institutions should be
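The odds ratios and confidence intervals reported in this abstract come from a logistic regression, where an odds ratio is the exponential of a model coefficient. The sketch below shows that relationship, assuming the intervals are Wald intervals that are symmetric on the log-odds scale; the abstract does not state this, and the function names are hypothetical.

```python
from math import exp, log

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def odds_ratio_ci(beta, se):
    """Turn a logistic regression coefficient and its standard error into
    an odds ratio with a 95% Wald interval."""
    return exp(beta), exp(beta - Z_95 * se), exp(beta + Z_95 * se)


def coefficient_from_ci(lower, upper):
    """Recover (beta, se) on the log-odds scale from a reported 95%
    interval, assuming the interval is symmetric in log-odds."""
    beta = (log(lower) + log(upper)) / 2
    se = (log(upper) - log(lower)) / (2 * Z_95)
    return beta, se


# Interval reported in the abstract for radon versus nuclear power.
beta, se = coefficient_from_ci(2.1, 5.4)
odds_ratio, lower, upper = odds_ratio_ci(beta, se)
```

Under that assumption the recovered point estimate falls close to the reported odds ratio of 3.4, which suggests the reported figures are at least consistent with a Wald interval.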
How Google Web Search copes with very similar documents
W. Mettrop (Wouter); P. Nieuwenhuysen; H. Smulders
2006-01-01
A significant portion of the computer files that carry documents, multimedia, programs etc. on the Web are identical or very similar to other files on the Web. How do search engines cope with this? Do they perform some kind of “deduplication”? How should users take into account that
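One generic way to detect "very similar" documents is to compare their sets of overlapping word windows (shingles) with Jaccard similarity. The sketch below illustrates that general technique only; it is not a description of how Google Web Search handles duplicates, and the threshold, shingle size, and sample texts are arbitrary assumptions.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(docs, threshold=0.5, k=3):
    """Return pairs of document ids whose shingle sets overlap at or
    above the threshold."""
    ids = sorted(docs)
    sets = {doc_id: shingles(docs[doc_id], k) for doc_id in ids}
    pairs = []
    for i, first in enumerate(ids):
        for second in ids[i + 1:]:
            if jaccard(sets[first], sets[second]) >= threshold:
                pairs.append((first, second))
    return pairs


# Made-up documents, for illustration only.
docs = {
    "a": "search engines index a large portion of the web",
    "b": "search engines index a large portion of the public web",
    "c": "crop calendars can be derived from search volumes",
}
pairs = near_duplicates(docs)
```

The pairwise comparison is quadratic in the number of documents, so at Web scale some form of hashing or sketching would be needed instead.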
Predicting consumer behavior with Web search.
Goel, Sharad; Hofman, Jake M; Lahaie, Sébastien; Pennock, David M; Watts, Duncan J
2010-10-12
Recent work has demonstrated that Web search volume can "predict the present," meaning that it can be used to accurately track outcomes such as unemployment levels, auto and home sales, and disease prevalence in near real time. Here we show that what consumers are searching for online can also predict their collective future behavior days or even weeks in advance. Specifically we use search query volume to forecast the opening weekend box-office revenue for feature films, first-month sales of video games, and the rank of songs on the Billboard Hot 100 chart, finding in all cases that search counts are highly predictive of future outcomes. We also find that search counts generally boost the performance of baseline models fit on other publicly available data, where the boost varies from modest to dramatic, depending on the application in question. Finally, we reexamine previous work on tracking flu trends and show that, perhaps surprisingly, the utility of search data relative to a simple autoregressive model is modest. We conclude that in the absence of other data sources, or where small improvements in predictive performance are material, search queries provide a useful guide to the near future.
The HMMER Web Server for Protein Sequence Similarity Search.
Prakash, Ananth; Jeffryes, Matt; Bateman, Alex; Finn, Robert D
2017-12-08
Protein sequence similarity search is one of the most commonly used bioinformatics methods for identifying evolutionarily related proteins. In general, sequences that are evolutionarily related share some degree of similarity, and sequence-search algorithms use this principle to identify homologs. The requirement for a fast and sensitive sequence search method led to the development of the HMMER software, which in the latest version (v3.1) uses a combination of sophisticated acceleration heuristics and mathematical and computational optimizations to enable the use of profile hidden Markov models (HMMs) for sequence analysis. The HMMER Web server provides a common platform by linking the HMMER algorithms to databases, thereby enabling the search for homologs, as well as providing sequence and functional annotation by linking external databases. This unit describes three basic protocols and two alternate protocols that explain how to use the HMMER Web server using various input formats and user defined parameters. © 2017 by John Wiley & Sons, Inc.
What Snippets Say About Pages in Federated Web Search
Demeester, Thomas; Nguyen, Dong-Phuong; Trieschnigg, Rudolf Berend; Develder, Chris; Hiemstra, Djoerd; Hou, Yuexian; Nie, Jian-Yun; Sun, Le; Wang, Bo; Zhang, Peng
2012-01-01
What is the likelihood that a Web page is considered relevant to a query, given the relevance assessment of the corresponding snippet? Using a new federated IR test collection that contains search results from over a hundred search engines on the internet, we are able to investigate such research
A Web Search on Environmental Topics: What Is the Role of Ranking?
Filisetti, Barbara; Mascaretti, Silvia; Limina, Rosa Maria; Gelatti, Umberto
2013-01-01
Abstract Background: Although the Internet is easy to use, the mechanisms and logic behind a Web search are often unknown. Reliable information can be obtained, but it may not be visible as the Web site is not located in the first positions of search results. The possible risks of adverse health effects arising from environmental hazards are issues of increasing public interest, and therefore the information about these risks, particularly on topics for which there is no scientific evidence, is very crucial. The aim of this study was to investigate whether the presentation of information on some environmental health topics differed among various search engines, assuming that the most reliable information should come from institutional Web sites. Materials and Methods: Five search engines were used: Google, Yahoo!, Bing, Ask, and AOL. The following topics were searched in combination with the word “health”: “nuclear energy,” “electromagnetic waves,” “air pollution,” “waste,” and “radon.” For each topic three key words were used. The first 30 search results for each query were considered. The ranking variability among the search engines and the type of search results were analyzed for each topic and for each key word. The ranking of institutional Web sites was given particular consideration. Results: Variable results were obtained when surfing the Internet on different environmental health topics. Multivariate logistic regression analysis showed that, when searching for radon and air pollution topics, it is more likely to find institutional Web sites in the first 10 positions compared with nuclear power (odds ratio=3.4, 95% confidence interval 2.1–5.4 and odds ratio=2.9, 95% confidence interval 1.8–4.7, respectively) and also when using Google compared with Bing (odds ratio=3.1, 95% confidence interval 1.9–5.1). Conclusions: The increasing use of online information could play an important role in forming opinions. Web users should become
Search Techniques for the Web of Things: A Taxonomy and Survey
Zhou, Yuchao; De, Suparna; Wang, Wei; Moessner, Klaus
2016-01-01
The Web of Things aims to make physical world objects and their data accessible through standard Web technologies to enable intelligent applications and sophisticated data analytics. Due to the amount and heterogeneity of the data, it is challenging to perform data analysis directly; especially when the data is captured from a large number of distributed sources. However, the size and scope of the data can be reduced and narrowed down with search techniques, so that only the most relevant and useful data items are selected according to the application requirements. Search is fundamental to the Web of Things while challenging by nature in this context, e.g., mobility of the objects, opportunistic presence and sensing, continuous data streams with changing spatial and temporal properties, efficient indexing for historical and real time data. The research community has developed numerous techniques and methods to tackle these problems as reported by a large body of literature in the last few years. A comprehensive investigation of the current and past studies is necessary to gain a clear view of the research landscape and to identify promising future directions. This survey reviews the state-of-the-art search methods for the Web of Things, which are classified according to three different viewpoints: basic principles, data/knowledge representation, and contents being searched. Experiences and lessons learned from the existing work and some EU research projects related to Web of Things are discussed, and an outlook to the future research is presented. PMID:27128918
From people to entities new semantic search paradigms for the web
Demartini, G
2014-01-01
The exponential growth of digital information available in companies and on the Web creates the need for search tools that can respond to the most sophisticated information needs. Many user tasks would be simplified if Search Engines would support typed search, and return entities instead of just Web documents. For example, an executive who tries to solve a problem needs to find people in the company who are knowledgeable about a certain topic. In the first part of the book, we propose a model for expert finding based on the well-consolidated vector space model for Information Retrieval and inv
The “I’m Feeling Lucky Syndrome”: Teacher-Candidates’ Knowledge of Web Searching Strategies
Directory of Open Access Journals (Sweden)
Corinne Laverty
2008-06-01
The need for web literacy has become increasingly important with the exponential growth of learning materials on the web that are freely accessible to educators. Teachers need the skills to locate these tools and also the ability to teach their students web search strategies and evaluation of websites so they can effectively explore the web by themselves. This study examined the web searching strategies of 253 teachers-in-training using both a survey (247 participants) and live screen capture with think aloud audio recording (6 participants). The results present a picture of the strategic, syntactic, and evaluative search abilities of these students that librarians and faculty can use to plan how instruction can target information skill deficits in university student populations.
Search Techniques for the Web of Things: A Taxonomy and Survey
Directory of Open Access Journals (Sweden)
Yuchao Zhou
2016-04-01
The Web of Things aims to make physical world objects and their data accessible through standard Web technologies to enable intelligent applications and sophisticated data analytics. Due to the amount and heterogeneity of the data, it is challenging to perform data analysis directly; especially when the data is captured from a large number of distributed sources. However, the size and scope of the data can be reduced and narrowed down with search techniques, so that only the most relevant and useful data items are selected according to the application requirements. Search is fundamental to the Web of Things while challenging by nature in this context, e.g., mobility of the objects, opportunistic presence and sensing, continuous data streams with changing spatial and temporal properties, efficient indexing for historical and real time data. The research community has developed numerous techniques and methods to tackle these problems as reported by a large body of literature in the last few years. A comprehensive investigation of the current and past studies is necessary to gain a clear view of the research landscape and to identify promising future directions. This survey reviews the state-of-the-art search methods for the Web of Things, which are classified according to three different viewpoints: basic principles, data/knowledge representation, and contents being searched. Experiences and lessons learned from the existing work and some EU research projects related to Web of Things are discussed, and an outlook to the future research is presented.
Minimalist instruction for learning to search the World Wide Web
Lazonder, Adrianus W.
2001-01-01
This study examined the efficacy of minimalist instruction to develop self-regulatory skills involved in Web searching. Two versions of minimalist self-regulatory skill instruction were compared to a control group that was merely taught procedural skills to operate the search engine. Acquired skills
Williams, Sarah C.
2010-01-01
The purpose of this study was to investigate how federated search engines are incorporated into the Web sites of libraries in the Association of Research Libraries. In 2009, information was gathered for each library in the Association of Research Libraries with a federated search engine. This included the name of the federated search service and…
Key word placing in Web page body text to increase visibility to search engines
Directory of Open Access Journals (Sweden)
W. T. Kritzinger
2007-11-01
The growth of the World Wide Web has spawned a wide variety of new information sources, which has also left users with the daunting task of determining which sources are valid. Many users rely on the Web as an information source because of the low cost of information retrieval. It is also claimed that the Web has evolved into a powerful business tool. Examples include highly popular business services such as Amazon.com and Kalahari.net. It is estimated that around 80% of users utilize search engines to locate information on the Internet. This, by implication, places emphasis on the underlying importance of Web pages being listed in search engine indices. Empirical evidence that the placement of key words in certain areas of the body text will have an influence on a Web site's visibility to search engines could not be found in the literature. The results of two experiments indicated that key words should be concentrated towards the top, and diluted towards the bottom, of a Web page to increase visibility. However, care should be taken in terms of key word density, to prevent search engine algorithms from raising the spam alarm.
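The idea of concentrating key words towards the top of the body text while watching overall density can be illustrated with a minimal Python sketch. It is not the authors' experimental procedure: the function names, the simple word tokenizer, and the split of the body text into two halves are all assumptions made for illustration.

```python
import re
from typing import Tuple


def keyword_density(text: str, keyword: str) -> float:
    """Share of words in `text` that equal `keyword` (case-insensitive)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


def density_by_half(body_text: str, keyword: str) -> Tuple[float, float]:
    """Keyword density in the top half and the bottom half of the body text.

    Splitting into halves is an assumption; any finer partition
    (e.g. paragraphs) could be used in the same way.
    """
    words = re.findall(r"[a-z0-9']+", body_text.lower())
    mid = len(words) // 2
    top = " ".join(words[:mid])
    bottom = " ".join(words[mid:])
    return keyword_density(top, keyword), keyword_density(bottom, keyword)
```

A page that follows the abstract's recommendation would show a higher value for the top half than for the bottom half, while the density over the whole text stays moderate.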
Manually Classifying User Search Queries on an Academic Library Web Site
Chapman, Suzanne; Desai, Shevon; Hagedorn, Kat; Varnum, Ken; Mishra, Sonali; Piacentine, Julie
2013-01-01
The University of Michigan Library wanted to learn more about the kinds of searches its users were conducting through the "one search" search box on the Library Web site. Library staff conducted two investigations. A preliminary investigation in 2011 involved the manual review of the 100 most frequently occurring queries conducted…
Directory of Open Access Journals (Sweden)
Alireza Isfandiyari Moghadam
2010-03-01
The present investigation concerns the evaluation, comparison and analysis of search options existing within web-based meta-search engines. 64 meta-search engines were identified. 19 meta-search engines that were free, accessible and compatible with the objectives of the present study were selected. An author-constructed checklist was used for data collection. Findings indicated that all meta-search engines studied offered the AND operator, phrase search, a setting for the number of results displayed, previous search query storage and help tutorials. Nevertheless, none of them offered any search options for hypertext searching or displayed the size of the pages searched. 94.7% supported features such as truncation, keywords in title and URL search, and text summary display. The checklist used in the study could serve as a model for investigating search options in search engines, digital libraries and other internet search tools.
Search, Read and Write: An Inquiry into Web Accessibility for People with Dyslexia.
Berget, Gerd; Herstad, Jo; Sandnes, Frode Eika
2016-01-01
Universal design in context of digitalisation has become an integrated part of international conventions and national legislations. A goal is to make the Web accessible for people of different genders, ages, backgrounds, cultures and physical, sensory and cognitive abilities. Political demands for universally designed solutions have raised questions about how it is achieved in practice. Developers, designers and legislators have looked towards the Web Content Accessibility Guidelines (WCAG) for answers. WCAG 2.0 has become the de facto standard for universal design on the Web. Some of the guidelines are directed at the general population, while others are targeted at more specific user groups, such as the visually impaired or hearing impaired. Issues related to cognitive impairments such as dyslexia receive less attention, although dyslexia is prevalent in at least 5-10% of the population. Navigation and search are two common ways of using the Web. However, while navigation has received a fair amount of attention, search systems are not explicitly included, although search has become an important part of people's daily routines. This paper discusses WCAG in the context of dyslexia for the Web in general and search user interfaces specifically. Although certain guidelines address topics that affect dyslexia, WCAG does not seem to fully accommodate users with dyslexia.
Deep Web Search Interface Identification: A Semi-Supervised Ensemble Approach
Directory of Open Access Journals (Sweden)
Hong Wang
2014-12-01
To surface the Deep Web, one crucial task is to predict whether a given web page has a search interface (a searchable HyperText Markup Language (HTML) form) or not. Previous studies have focused on supervised classification with labeled examples. However, labeled data are scarce, hard to get and require tedious manual work, while unlabeled HTML forms are abundant and easy to obtain. In this research, we consider the plausibility of using both labeled and unlabeled data to train better models to identify search interfaces more effectively. We present a semi-supervised co-training ensemble learning approach using both neural networks and decision trees to deal with the search interface identification problem. We show that the proposed model outperforms previous methods using only labeled data. We also show that adding unlabeled data improves the effectiveness of the proposed model.
CYCLOSA: Decentralizing Private Web Search Through SGX-Based Browser Extensions
Pires, Rafael; Goltzsche, David; Mokhtar, Sonia Ben; Bouchenak, Sara; Boutet, Antoine; Felber, Pascal; Kapitza, Rüdiger; Pasin, Marcelo; Schiavoni, Valerio
2018-01-01
By regularly querying Web search engines, users (unconsciously) disclose large amounts of their personal data as part of their search queries, some of which might reveal sensitive information (e.g. health issues, sexual, political or religious preferences). Several solutions exist to allow users to query search engines while improving privacy protection. However, these solutions suffer from a number of limitations: some are subject to user re-identification attacks, while others lack scala...
A Systematic Understanding of Successful Web Searches in Information-Based Tasks
Zhou, Mingming
2013-01-01
The purpose of this study is to research how Chinese university students solve information-based problems. With the Search Performance Index as the measure of search success, participants were divided into high, medium and low-performing groups. Based on their web search logs, these three groups were compared along five dimensions of the search…
Teaching AI Search Algorithms in a Web-Based Educational System
Grivokostopoulou, Foteini; Hatzilygeroudis, Ioannis
2013-01-01
In this paper, we present a way of teaching AI search algorithms in a web-based adaptive educational system. Teaching is based on interactive examples and exercises. Interactive examples, which use visualized animations to present AI search algorithms in a step-by-step way with explanations, are used to make learning more attractive. Practice…
Efficient Top-k Locality Search for Co-located Spatial Web Objects
DEFF Research Database (Denmark)
Qu, Qiang; Liu, Siyuan; Yang, Bin
2014-01-01
In step with the web being used widely by mobile users, user location is becoming an essential signal in services, including local intent search. Given a large set of spatial web objects consisting of a geographical location and a textual description (e.g., online business directory entries of re...
Sreenivasulu, V.
2000-01-01
Internet Granthalaya appeals to advocates worldwide and targets the task of creating a new search engine and dedicated browser. Internet Granthalaya may be the ultimate search engine, exclusively dedicated to library use, for searching and organizing the library resources of the World Wide Web.
Snippet-based relevance predictions for federated web search
Demeester, Thomas; Nguyen, Dong-Phuong; Trieschnigg, Rudolf Berend; Develder, Chris; Hiemstra, Djoerd
How well can the relevance of a page be predicted, purely based on snippets? This would be highly useful in a Federated Web Search setting where caching large amounts of result snippets is more feasible than caching entire pages. The experiments reported in this paper make use of result snippets and
A Web Search on Environmental Topics: What Is the Role of Ranking?
Covolo, Loredana; Filisetti, Barbara; Mascaretti, Silvia; Limina, Rosa Maria; Gelatti, Umberto
2013-01-01
Background: Although the Internet is easy to use, the mechanisms and logic behind a Web search are often unknown. Reliable information can be obtained, but it may not be visible as the Web site is not located in the first positions of search results. The possible risks of adverse health effects arising from environmental hazards are issues of increasing public interest, and therefore the information about these risks, particularly on topics for which there is no scientific evidence, is ver...
Burden of neurological diseases in the US revealed by web searches.
Directory of Open Access Journals (Sweden)
Ricardo Baeza-Yates
Analyzing the disease-related web searches of Internet users provides insight into the interests of the general population as well as the healthcare industry, which can be used to shape health care policies. We analyzed the searches related to neurological diseases and drugs used in neurology using the most popular search engines in the US, Google and Bing/Yahoo. We found that the most frequently searched diseases were common diseases such as dementia or Attention Deficit/Hyperactivity Disorder (ADHD), as well as medium frequency diseases with high social impact such as Parkinson's disease, MS and ALS. The most frequently searched CNS drugs were generic drugs used for pain, followed by sleep disorders, dementia, ADHD, stroke and Parkinson's disease. Regarding the interests of the healthcare industry, ADHD, Alzheimer's disease, MS, ALS, meningitis, and hypersomnia received the highest advertising bids for neurological diseases, while painkillers and drugs for neuropathic pain, drugs for dementia or insomnia, and triptans had the highest advertising bidding prices. Web searches reflect the interest of people and the healthcare industry, and are based either on the frequency or social impact of the disease.
Dropout Rates and Response Times of an Occupation Search Tree in a Web Survey
Directory of Open Access Journals (Sweden)
Tijdens Kea
2014-03-01
Occupation is key in socioeconomic research. As in other survey modes, most web surveys use an open-ended question for occupation, though the absence of interviewers elicits unidentifiable or aggregated responses. Unlike other modes, web surveys can use a search tree with an occupation database. Such search trees are hardly ever used, but this may change due to technical advancements. This article evaluates a three-step search tree with 1,700 occupational titles, used in the 2010 multilingual WageIndicator web survey for the UK, Belgium and the Netherlands (22,990 observations). Dropout rates are high: in Step 1 due to unemployed respondents judging the question not to be adequate, and in Step 3 due to search tree item length. Median response times are substantial due to search tree item length, dropout in the next step and invalid occupations ticked. Overall, the validity of the occupation data is rather good: 1.7-7.5% of the respondents completing the search tree ticked an invalid occupation.
The Little Engines That Could: Modeling the Performance of World Wide Web Search Engines
Eric T. Bradlow; David C. Schmittlein
2000-01-01
This research examines the ability of six popular Web search engines, individually and collectively, to locate Web pages containing common marketing/management phrases. We propose and validate a model for search engine performance that is able to represent key patterns of coverage and overlap among the engines. The model enables us to estimate the typical additional benefit of using multiple search engines, depending on the particular set of engines being considered. It also provides an estim...
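The notions of coverage, overlap, and the additional benefit of consulting more engines can be made concrete with plain set arithmetic. The Python sketch below is only an illustration of those quantities on observed result sets; it is not the statistical model proposed in the paper, and the function names and the dictionary layout (engine name mapped to a set of page identifiers) are assumptions.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def union_size(results: Dict[str, Set[str]], engines: Iterable[str]) -> int:
    """Number of distinct pages found by at least one of the given engines."""
    found: Set[str] = set()
    for name in engines:
        found |= results[name]
    return len(found)


def marginal_gain(results: Dict[str, Set[str]], base: Iterable[str], extra: str) -> int:
    """Pages contributed by `extra` that none of the engines in `base` found."""
    base = list(base)
    return union_size(results, base + [extra]) - union_size(results, base)


def pairwise_overlap(results: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """Number of pages shared by each pair of engines."""
    return {
        (a, b): len(results[a] & results[b])
        for a, b in combinations(sorted(results), 2)
    }
```

A small marginal gain for an engine signals that it mostly duplicates what the engines already consulted return.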
The effects of link format and screen location on visual search of web pages.
Ling, Jonathan; Van Schaik, Paul
2004-06-22
Navigation of web pages is of critical importance to the usability of web-based systems such as the World Wide Web and intranets. The primary means of navigation is through the use of hyperlinks. However, few studies have examined the impact of the presentation format of these links on visual search. The present study used a two-factor mixed measures design to investigate whether there was an effect of link format (plain text, underlined, bold, or bold and underlined) upon speed and accuracy of visual search and subjective measures in both the navigation and content areas of web pages. An effect of link format on speed of visual search for both hits and correct rejections was found. This effect was observed in the navigation and the content areas. Link format did not influence accuracy in either screen location. Participants showed highest preference for links that were in bold and underlined, regardless of screen area. These results are discussed in the context of visual search processes and design recommendations are given.
Spiders and Worms and Crawlers, Oh My: Searching on the World Wide Web.
Eagan, Ann; Bender, Laura
Searching on the world wide web can be confusing. A myriad of search engines exist, often with little or no documentation, and many of these search engines work differently from the standard search engines people are accustomed to using. Intended for librarians, this paper defines search engines, directories, spiders, and robots, and covers basics…
State-of-the-Art Review on Relevance of Genetic Algorithm to Internet Web Search
Directory of Open Access Journals (Sweden)
Kehinde Agbele
2012-01-01
People use search engines to find information they desire with the aim that their information needs will be met. Information retrieval (IR) is a field that is concerned primarily with the searching and retrieving of information in documents and also with searching search engines, online databases, and the Internet. Genetic algorithms (GAs) are robust, efficient, and optimized methods for a wide area of search problems, motivated by Darwin’s principles of natural selection and survival of the fittest. This paper describes the components of information retrieval systems (IRS). It looks at how GAs can be applied in the field of IR and specifically at the relevance of genetic algorithms to internet web search. Finally, the proposals surveyed show that GAs are applied to diverse problem fields of internet web search.
WEB-server for search of a periodicity in amino acid and nucleotide sequences
Frenkel, F. E.; Skryabin, K. G.; Korotkov, E. V.
2017-12-01
A new web server (http://victoria.biengi.ac.ru/splinter/login.php) was designed and developed to search for periodicity in nucleotide and amino acid sequences. The web server operation is based upon a new mathematical method of searching for multiple alignments, which is founded on the position weight matrices optimization, as well as on implementation of the two-dimensional dynamic programming. This approach allows the construction of multiple alignments of the indistinctly similar amino acid and nucleotide sequences that accumulated more than 1.5 substitutions per a single amino acid or a nucleotide without performing the sequences paired comparisons. The article examines the principles of the web server operation and two examples of studying amino acid and nucleotide sequences, as well as information that could be obtained using the web server.
LigSearch: a knowledge-based web server to identify likely ligands for a protein target
Energy Technology Data Exchange (ETDEWEB)
Beer, Tjaart A. P. de; Laskowski, Roman A. [European Bioinformatics Institute (EMBL–EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD (United Kingdom); Duban, Mark-Eugene [Northwestern University Feinberg School of Medicine, Chicago, Illinois (United States); Chan, A. W. Edith [University College London, London WC1E 6BT (United Kingdom); Anderson, Wayne F. [Northwestern University Feinberg School of Medicine, Chicago, Illinois (United States); Thornton, Janet M., E-mail: thornton@ebi.ac.uk [European Bioinformatics Institute (EMBL–EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD (United Kingdom)
2013-12-01
LigSearch is a web server for identifying ligands likely to bind to a given protein. Identifying which ligands might bind to a protein before crystallization trials could provide a significant saving in time and resources. LigSearch, a web server aimed at predicting ligands that might bind to and stabilize a given protein, has been developed. Using a protein sequence and/or structure, the system searches against a variety of databases, combining available knowledge, and provides a clustered and ranked output of possible ligands. LigSearch can be accessed at http://www.ebi.ac.uk/thornton-srv/databases/LigSearch.
Directory of Open Access Journals (Sweden)
Dan Wu
2016-03-01
Purpose: This study explores how search motivation and context influence mobile Web search behaviors. Design/methodology/approach: We studied 30 experienced mobile Web users via questionnaires, semi-structured interviews, and an online diary tool that participants used to record their daily search activities. SQLite Developer was used to extract data from the users' phone logs for correlation analysis in Statistical Product and Service Solutions (SPSS). Findings: One quarter of mobile search sessions were driven by two or more search motivations. It was especially difficult to distinguish curiosity from time killing in particular user reporting. Multi-dimensional contexts and motivations influenced mobile search behaviors, and among the context dimensions, gender, place, activities they engaged in while searching, task importance, portal, and interpersonal relations (whether accompanied or alone when searching) correlated with each other. Research limitations: The sample consisted entirely of college students, so our findings may not generalize to other populations. More participants and a longer experimental duration will improve the accuracy and objectivity of the research. Practical implications: Motivation analysis and search context recognition can help mobile service providers design applications and services for particular mobile contexts and usages. Originality/value: Most current research focuses on specific contexts, such as studies on place, or other contextual influences on mobile search, and lacks a systematic analysis of mobile search context. Based on analysis of the impact of mobile search motivations and search context on search behaviors, we built a multi-dimensional model of mobile search behaviors.
The Effectiveness of Web Search Engines to Index New Sites from Different Countries
Pirkola, Ari
2009-01-01
Introduction: Investigates how effectively Web search engines index new sites from different countries. The primary interest is whether new sites are indexed equally or whether search engines are biased towards certain countries. If major search engines show biased coverage it can be considered a significant economic and political problem because…
Deep Web Search Interface Identification: A Semi-Supervised Ensemble Approach
Hong Wang; Qingsong Xu; Lifeng Zhou
2014-01-01
To surface the Deep Web, one crucial task is to predict whether a given web page has a search interface (a searchable HyperText Markup Language (HTML) form) or not. Previous studies have focused on supervised classification with labeled examples. However, labeled data are scarce, hard to get and require tedious manual work, while unlabeled HTML forms are abundant and easy to obtain. In this research, we consider the plausibility of using both labeled and unlabeled data to train better models to...
A semantics-based method for clustering of Chinese web search results
Zhang, Hui; Wang, Deqing; Wang, Li; Bi, Zhuming; Chen, Yong
2014-01-01
Information explosion is a critical challenge to the development of modern information systems. In particular, when an information system is applied over the Internet, it faces an amount of information on the web that has been increasing exponentially. Search engines, such as Google and Baidu, are essential tools for people to find information on the Internet. Valuable information, however, is still likely to be submerged in the ocean of search results from those tools. By automatically clustering the results into different groups based on subjects, a search engine with a clustering feature allows users to select the most relevant results quickly. In this paper, we propose an online semantics-based method to cluster Chinese web search results. First, we employ the generalised suffix tree to extract the longest common substrings (LCSs) from search snippets. Second, we use HowNet to calculate the similarities of the words derived from the LCSs, and extract the most representative features by constructing the vocabulary chain. Third, we construct a vector of text features and calculate snippets' semantic similarities. Finally, we improve the Chameleon algorithm to cluster snippets. Extensive experimental results have shown that the proposed algorithm outperforms the suffix tree clustering method and other traditional clustering methods.
Index Compression and Efficient Query Processing in Large Web Search Engines
Ding, Shuai
2013-01-01
The inverted index is the main data structure used by all the major search engines. Search engines build an inverted index on their collection to speed up query processing. As the size of the web grows, the length of the inverted list structures, which can easily grow to hundreds of MBs or even GBs for common terms (roughly linear in the size of…
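As a rough illustration of the data structure named in this abstract, the following Python sketch builds a toy inverted index, answers a two-term conjunctive query by merging sorted postings lists, and converts a postings list to gaps between document IDs. Gap encoding is a commonly used first step in index compression; whether the thesis uses it is not stated in the abstract, so treat it, along with all function names and the whitespace tokenizer, as assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, List[int]]:
    """Map each term to the sorted list of IDs of documents that contain it."""
    index: Dict[str, List[int]] = defaultdict(list)
    for doc_id in sorted(docs):
        # Each term is recorded once per document, so postings stay sorted
        # and free of duplicates.
        for term in set(docs[doc_id].lower().split()):
            index[term].append(doc_id)
    return dict(index)


def intersect(a: List[int], b: List[int]) -> List[int]:
    """Merge two sorted postings lists and keep the IDs present in both."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def to_gaps(postings: List[int]) -> List[int]:
    """Replace each ID by its distance from the previous ID (first ID kept)."""
    return [cur - prev for prev, cur in zip([0] + postings[:-1], postings)]
```

Long postings lists for common terms turn into sequences of small gaps, which is what makes them cheaper to store with a variable-length integer code.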
What Can Pictures Tell Us About Web Pages? Improving Document Search Using Images.
Rodriguez-Vaamonde, Sergio; Torresani, Lorenzo; Fitzgibbon, Andrew W
2015-06-01
Traditional Web search engines do not use the images in the HTML pages to find relevant documents for a given query. Instead, they typically operate by computing a measure of agreement between the keywords provided by the user and only the text portion of each page. In this paper we study whether the content of the pictures appearing in a Web page can be used to enrich the semantic description of an HTML document and consequently boost the performance of a keyword-based search engine. We present a Web-scalable system that exploits a pure text-based search engine to find an initial set of candidate documents for a given query. Then, the candidate set is reranked using visual information extracted from the images contained in the pages. The resulting system retains the computational efficiency of traditional text-based search engines with only a small additional storage cost needed to encode the visual information. We test our approach on one of the TREC Million Query Track benchmarks where we show that the exploitation of visual content yields improvement in accuracies for two distinct text-based search engines, including the system with the best reported performance on this benchmark. We further validate our approach by collecting document relevance judgements on our search results using Amazon Mechanical Turk. The results of this experiment confirm the improvement in accuracy produced by our image-based reranker over a pure text-based system.
Research on the optimization strategy of web search engine based on data mining
Chen, Ronghua
2018-04-01
With the wide application of search engines, web sites have become an important way for people to obtain information. However, the volume of web site information is growing explosively, which makes it very difficult for people to find the information they need, and current search engines cannot meet this need. There is therefore an urgent need for personalized website information services, and data mining technology offers a possible breakthrough for this new challenge. In order to improve the accuracy with which people find information on websites, a website search engine optimization strategy based on data mining is proposed and verified by a website search engine optimization experiment. The results show that the proposed strategy improves the accuracy with which people find information and reduces the time they need to find it, which gives it practical value.
CWI and TU Delft at TREC 2013: Contextual Suggestion, Federated Web Search, KBA, and Web Tracks
A. Bellogín Kouki (Alejandro); G.G. Gebremeskel (Gebre); J. He (Jiyin); J.J.P. Lin (Jimmy); A. Said (Alan); T. Samar (Thaer); A.P. de Vries (Arjen); J.B.P. Vuurens (Jeroen)
2014-01-01
This paper provides an overview of the work done at the Centrum Wiskunde & Informatica (CWI) and Delft University of Technology (TU Delft) for different tracks of TREC 2013. We participated in the Contextual Suggestion Track, the Federated Web Search Track, the Knowledge Base
Ramu, Chenna
2003-07-01
SIRW (http://sirw.embl.de/) is a World Wide Web interface to the Simple Indexing and Retrieval System (SIR) that is capable of parsing and indexing various flat file databases. In addition it provides a framework for doing sequence analysis (e.g. motif pattern searches) for selected biological sequences through keyword search. SIRW is an ideal tool for the bioinformatics community for searching as well as analyzing biological sequences of interest.
Spatial Search Techniques for Mobile 3D Queries in Sensor Web Environments
Directory of Open Access Journals (Sweden)
James D. Carswell
2013-03-01
Developing mobile geo-information systems for sensor web applications involves technologies that can access linked geographical and semantically related Internet information. Additionally, in tomorrow’s Web 4.0 world, it is envisioned that trillions of inexpensive micro-sensors placed throughout the environment will also become available for discovery based on their unique geo-referenced IP address. Exploring these enormous volumes of disparate heterogeneous data on today’s location and orientation aware smartphones requires context-aware smart applications and services that can deal with “information overload”. 3DQ (Three Dimensional Query) is our novel mobile spatial interaction (MSI) prototype that acts as a next-generation base for human interaction within such geospatial sensor web environments/urban landscapes. It filters information using “Hidden Query Removal” functionality that intelligently refines the search space by calculating the geometry of a three dimensional visibility shape (Vista space) at a user’s current location. This 3D shape then becomes the query “window” in a spatial database for retrieving information on only those objects visible within a user’s actual 3D field-of-view. 3DQ reduces information overload and serves to heighten situation awareness on constrained commercial off-the-shelf devices by providing visibility space searching as a mobile web service. The effects of variations in mobile spatial search techniques in terms of query speed vs. accuracy are evaluated and presented in this paper.
Web search queries can predict stock market volumes.
Bordino, Ilaria; Battiston, Stefano; Caldarelli, Guido; Cristelli, Matthieu; Ukkonen, Antti; Weber, Ingmar
2012-01-01
We live in a computerized and networked society where many of our actions leave a digital trace and affect other people's actions. This has led to the emergence of a new data-driven research field: mathematical methods of computer science, statistical physics and sociometry provide insights on a wide range of disciplines ranging from social science to human mobility. A recent important discovery is that search engine traffic (i.e., the number of requests submitted by users to search engines on the www) can be used to track and, in some cases, to anticipate the dynamics of social phenomena. Successful examples include unemployment levels, car and home sales, and the spreading of epidemics. A few recent works applied this approach to stock prices and market sentiment. However, it remains unclear if trends in financial markets can be anticipated by the collective wisdom of on-line users on the web. Here we show that daily trading volumes of stocks traded in NASDAQ-100 are correlated with daily volumes of queries related to the same stocks. In particular, query volumes anticipate in many cases peaks of trading by one day or more. Our analysis is carried out on a unique dataset of queries, submitted to an important web search engine, which also enables us to investigate user behavior. We show that the query volume dynamics emerges from the collective but seemingly uncoordinated activity of many users. These findings contribute to the debate on the identification of early warnings of financial systemic risk, based on the activity of users of the www.
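One simple way to check whether one daily series anticipates another is a time-lagged Pearson correlation, sketched below in Python. The abstract does not spell out the exact statistic used, so this is an assumed, generic technique rather than the authors' analysis; the function names and the convention that a positive lag means queries lead trading are also assumptions.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def lagged_correlation(
    queries: Sequence[float], volumes: Sequence[float], lag: int
) -> float:
    """Correlate query volume on day t with trading volume on day t + lag."""
    if len(queries) != len(volumes):
        raise ValueError("series must have the same length")
    if lag < 0 or lag >= len(queries) - 1:
        raise ValueError("lag must be non-negative and leave at least two points")
    if lag == 0:
        return pearson(queries, volumes)
    # Drop the last `lag` query days and the first `lag` trading days
    # so that each query day is paired with the trading day `lag` days later.
    return pearson(queries[:-lag], volumes[lag:])
```

If the correlation at a lag of one day exceeds the correlation at lag zero, the query series tends to move before the trading series, which is the kind of pattern the abstract reports.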
Source evaluation of domain experts and novices during Web search
Brand-Gruwel, Saskia; Kammerer, Yvonne; Van Meeuwen, Ludo; van Gog, T.
2017-01-01
Nowadays, almost everyone uses the World Wide Web (WWW) to search for information of any kind. In education, students frequently use the WWW for selecting information to accomplish assignments such as writing an essay or preparing a presentation. The evaluation of sources and information is an
Fu, Linda Y; Zook, Kathleen; Spoehr-Labutta, Zachary; Hu, Pamela; Joseph, Jill G
2016-01-01
Online information can influence attitudes toward vaccination. The aim of the present study was to provide a systematic evaluation of the search engine ranking, quality, and content of Web pages that are critical versus noncritical of human papillomavirus (HPV) vaccination. We identified HPV vaccine-related Web pages with the Google search engine by entering 20 terms. We then assessed each Web page for critical versus noncritical bias and for the following quality indicators: authorship disclosure, source disclosure, attribution of at least one reference, currency, exclusion of testimonial accounts, and readability level less than ninth grade. We also determined Web page comprehensiveness in terms of mention of 14 HPV vaccine-relevant topics. Twenty searches yielded 116 unique Web pages. HPV vaccine-critical Web pages comprised roughly a third of the top, top 5- and top 10-ranking Web pages. The prevalence of HPV vaccine-critical Web pages was higher for queries that included term modifiers in addition to root terms. Compared with noncritical Web pages, Web pages critical of HPV vaccine overall had a lower quality score than those with a noncritical bias (p engine queries despite being of lower quality and less comprehensive than noncritical Web pages. Copyright © 2016 Society for Adolescent Health and Medicine. Published by Elsevier Inc. All rights reserved.
Dommes, Aurelie; Chevalier, Aline; Rossetti, Marilyne
2010-04-01
This pilot study investigated the age-related differences in searching for information on the World Wide Web with a search engine. 11 older adults (6 men, 5 women; M age=59 yr., SD=2.76, range=55-65 yr.) and 12 younger adults (2 men, 10 women; M=23.7 yr., SD=1.07, range=22-25 yr.) had to conduct six searches differing in complexity, and for which a search method was or was not induced. The results showed that the younger and older participants provided with an induced search method were less flexible than the others and produced fewer new keywords. Moreover, older participants took longer than the younger adults, especially in the complex searches. The younger participants were flexible in the first request and spontaneously produced new keywords (spontaneous flexibility), whereas the older participants only produced new keywords when confronted by impasses (reactive flexibility). Aging may influence web searches, especially the nature of keywords used.
Tillotson, Joy
2003-01-01
Describes a survey that was conducted involving participants in the library instruction program at two Canadian universities in order to describe the characteristics of students receiving instruction in Web searching. Examines criteria for evaluating Web sites, search strategies, use of search engines, and frequency of use. Questionnaire is…
A Webometric Analysis of ISI Medical Journals Using Yahoo, AltaVista, and All the Web Search Engines
Directory of Open Access Journals (Sweden)
Zohreh Zahedi
2010-12-01
The World Wide Web is an important information source for scholarly communication. Examining inlinks through webometric studies has attracted particular interest among information researchers. In this study, the number of inlinks to 69 ISI medical journals retrieved by the Yahoo, AltaVista, and All The Web search engines was examined in a comparative webometric study. SPSS software was employed for data analysis. Findings revealed that the British Medical Journal website attracted the most links in all three search engines. There is a significant correlation between the number of external links and the ISI impact factor. The strongest correlation in the three search engines exists between the external links of Yahoo and AltaVista (100%), and the weakest correlation is found between the external links of All The Web and the number of pages of AltaVista (0.51). There is no significant difference between the internal links and the number of pages found by the three search engines. However, in the case of impact factors, significant differences are found between the three search engines. The study thus shows that journals with a higher impact factor attract more links to their websites. It also indicates that the three search engines differ significantly in terms of total links, outlinks and web impact factors.
Exploration of Web Users' Search Interests through Automatic Subject Categorization of Query Terms.
Pu, Hsiao-tieh; Yang, Chyan; Chuang, Shui-Lung
2001-01-01
Proposes a mechanism that carefully integrates human and machine efforts to explore Web users' search interests. The approach consists of a four-step process: extraction of core terms; construction of subject taxonomy; automatic subject categorization of query terms; and observation of users' search interests. Research findings are proved valuable…
Search Engine Optimization for Flash Best Practices for Using Flash on the Web
Perkins, Todd
2009-01-01
Search Engine Optimization for Flash dispels the myth that Flash-based websites won't show up in a web search by demonstrating exactly what you can do to make your site fully searchable -- no matter how much Flash it contains. You'll learn best practices for using HTML, CSS and JavaScript, as well as SWFObject, for building sites with Flash that will stand tall in search rankings.
Kao, Chia-Pin; Chien, Hui-Min
2017-01-01
This study was conducted to explore the relationships between pre-school educators' conceptions of and approaches to learning by web-searching through Internet Self-efficacy. Based on data from 242 pre-school educators who had prior experience of participating in web-searching in Taiwan for path analyses, it was found in this study that…
Is Internet search better than structured instruction for web-based health education?
Finkelstein, Joseph; Bedra, McKenzie
2013-01-01
The Internet provides access to vast amounts of comprehensive information regarding any health-related subject. Patients increasingly use this information for health education, using a search engine to identify education materials. An alternative approach to health education via the Internet is based on a verified web site which provides structured interactive education guided by adult learning theories. These two approaches have not been systematically compared in older patients. The aim of this study was to compare the efficacy of a web-based computer-assisted education (CO-ED) system versus searching the Internet for learning about hypertension. Sixty hypertensive older adults (age 45+) were randomized into control or intervention groups. The control patients spent 30 to 40 minutes searching the Internet using a search engine for information about hypertension. The intervention patients spent 30 to 40 minutes using the CO-ED system, which provided computer-assisted instruction about major hypertension topics. Analysis of pre- and post-knowledge scores indicated a significant improvement among CO-ED users (14.6%) as opposed to Internet users (2%). Additionally, patients using the CO-ED program rated their learning experience more positively than those using the Internet.
An architecture for diversity-aware search for medical web content.
Denecke, K
2012-01-01
The Web provides a huge source of information, including on medical and health-related issues. In particular, the content of medical social media data can be diverse due to the background of an author, the source or the topic. Diversity in this context means that a document covers different aspects of a topic or a topic is described in different ways. In this paper, we introduce an approach that allows the diverse aspects of a search query to be considered when providing retrieval results to a user. We introduce a system architecture for a diversity-aware search engine that allows retrieving medical information from the web. The diversity of retrieval results is assessed by calculating diversity measures that rely upon semantic information derived from a mapping to concepts of a medical terminology. Considering these measures, the result set is diversified by ranking more diverse texts higher. The methods and system architecture are implemented in a retrieval engine for medical web content. The diversity measures reflect the diversity of aspects considered in a text and its type of information content. They are used for result presentation, filtering and ranking. In a user evaluation we assess the user satisfaction with an ordering of retrieval results that considers the diversity measures. The evaluation shows that diversity-aware retrieval considering diversity measures in ranking could increase user satisfaction with retrieval results.
PubMed and beyond: a survey of web tools for searching biomedical literature
Lu, Zhiyong
2011-01-01
The past decade has witnessed the modern advances of high-throughput technology and rapid growth of research capacity in producing large-scale biological data, both of which were concomitant with an exponential growth of biomedical literature. This wealth of scholarly knowledge is of significant importance for researchers in making scientific discoveries and healthcare professionals in managing health-related matters. However, the acquisition of such information is becoming increasingly difficult due to its large volume and rapid growth. In response, the National Center for Biotechnology Information (NCBI) is continuously making changes to its PubMed Web service for improvement. Meanwhile, different entities have devoted themselves to developing Web tools for helping users quickly and efficiently search and retrieve relevant publications. These practices, together with maturity in the field of text mining, have led to an increase in the number and quality of various Web tools that provide comparable literature search service to PubMed. In this study, we review 28 such tools, highlight their respective innovations, compare them to the PubMed system and one another, and discuss directions for future development. Furthermore, we have built a website dedicated to tracking existing systems and future advances in the field of biomedical literature search. Taken together, our work serves information seekers in choosing tools for their needs and service providers and developers in keeping current in the field. Database URL: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/search PMID:21245076
REPTREE CLASSIFIER FOR IDENTIFYING LINK SPAM IN WEB SEARCH ENGINES
Directory of Open Access Journals (Sweden)
S.K. Jayanthi
2013-01-01
Search engines are used for retrieving information from the web. Most of the time, importance is placed on the top 10 results, and sometimes only the top 5, because of time constraints and reliance on the search engines. Users believe that the top 10 or 5 results are the most relevant. This is where the problem of spamdexing arises: it is a method of degrading search result quality. Falsified metrics, such as inserting an enormous number of keywords or links in a website, may take that website to the top 10 or 5 positions. This paper proposes a classifier based on Reptree (Regression tree representative). As an initial step, link-based features such as neighbors, PageRank, truncated PageRank, TrustRank and assortativity-related attributes are inferred. Based on these features, a tree is constructed. The tree uses the feature inference to differentiate spam sites from legitimate sites. The WEBSPAM-UK-2007 dataset is taken as a base. It is preprocessed and converted into five datasets: FEATA, FEATB, FEATC, FEATD and FEATE. Only link-based features are used in the experiments. This paper focuses on link spam alone. Finally, a representative tree is created that classifies the web spam entries more precisely. Results are given. Regression tree classification seems to perform well, as shown through experiments.
Web-Based Search and Plot System for Nuclear Reaction Data
International Nuclear Information System (INIS)
Otuka, N.; Nakagawa, T.; Fukahori, T.; Katakura, J.; Aikawa, M.; Suda, T.; Naito, K.; Korennov, S.; Arai, K.; Noto, H.; Ohnishi, A.; Kato, K.
2005-01-01
A web-based search and plot system for nuclear reaction data has been developed, covering experimental data in EXFOR format and evaluated data in ENDF format. The system is implemented for Linux OS, with Perl and MySQL used for CGI scripts and the database manager, respectively. Two prototypes for experimental and evaluated data are presented
Query transformations and their role in Web searching by the members of the general public
Directory of Open Access Journals (Sweden)
Martin Whittle
2006-01-01
Introduction. This paper reports preliminary research in a primarily experimental study of how the general public search for information on the Web. The focus is on the query transformation patterns that characterise searching. Method. In this work, we have used transaction logs from the Excite search engine to develop methods for analysing query transformations that should aid the analysis of our ongoing experimental work. Our methods involve the use of similarity techniques to link queries with the most similar previous query in a train. The resulting query transformations are represented as a list of codes representing a whole search. Analysis. It is shown how query transformation sequences can be represented as graphical networks and some basic statistical results are shown. A correlation analysis is performed to examine the co-occurrence of Boolean and quotation mark changes with the syntactic changes. Results. A frequency analysis of the occurrence of query transformation codes is presented. The connectivity of graphs obtained from the query transformation is investigated and found to follow an exponential scaling law. The correlation analysis reveals a number of patterns that provide some interesting insights into Web searching by the general public. Conclusion. We have developed analytical methods based on query similarity that can be applied to our current experimental work with volunteer subjects. The results of these will form part of a database with the aim of developing an improved understanding of how the public search the Web.
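The linking step described in the Method section (each query is attached to the most similar previous query in its train, and the whole search becomes a list of codes) might be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: Jaccard similarity over whitespace-separated terms stands in for the unspecified similarity technique, and the one-letter codes (`=`, `N`, `S`, `G`, `R`) are hypothetical labels invented for this example.

```python
def terms(query):
    """Lower-cased set of whitespace-separated terms in a query."""
    return set(query.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two term sets (two empty sets count as identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def link_to_most_similar(train):
    """Pair each query index i > 0 with the index of its most similar earlier query.

    Ties are broken in favour of the most recent earlier query.
    """
    links = []
    for i in range(1, len(train)):
        current = terms(train[i])
        best = max(range(i),
                   key=lambda j: (jaccard(current, terms(train[j])), j))
        links.append((i, best))
    return links


def transformation_code(prev, curr):
    """Hypothetical codes: '=' repeat, 'N' new topic, 'S' terms added,
    'G' terms removed, 'R' other reformulation."""
    p, c = terms(prev), terms(curr)
    if p == c:
        return "="
    if not (p & c):
        return "N"
    if p < c:
        return "S"
    if c < p:
        return "G"
    return "R"


def encode_search(train):
    """Represent a whole search (query train) as a list of transformation codes."""
    return [transformation_code(train[j], train[i])
            for i, j in link_to_most_similar(train)]


train = ["cheap flights", "cheap flights paris", "hotels", "cheap flights"]
print(encode_search(train))  # ['S', 'N', '=']
```

Note that the last query links back to the first one rather than to its immediate predecessor, which is the point of matching against the most similar earlier query instead of the previous one.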
An assessment of the visibility of MeSH-indexed medical web catalogs through search engines.
Zweigenbaum, P; Darmoni, S J; Grabar, N; Douyère, M; Benichou, J
2002-01-01
Manually indexed Internet health catalogs such as CliniWeb or CISMeF provide resources for retrieving high-quality health information. Users of these quality-controlled subject gateways are most often referred to them by general search engines such as Google, AltaVista, etc. This raises several questions, among which the following: what is the relative visibility of medical Internet catalogs through search engines? This study addresses this issue by measuring and comparing the visibility of six major, MeSH-indexed health catalogs through four different search engines (AltaVista, Google, Lycos, Northern Light) in two languages (English and French). Over half a million queries were sent to the search engines; for most of these search engines, according to our measures at the time the queries were sent, the most visible catalog for English MeSH terms was CliniWeb and the most visible one for French MeSH terms was CISMeF.
Dao, Tien Tuan; Hoang, Tuan Nha; Ta, Xuan Hien; Tho, Marie Christine Ho Ba
2013-02-01
Human musculoskeletal system resources (HMSR) are valuable for learning and medical purposes. Internet-based information from conventional search engines such as Google or Yahoo cannot meet the need for useful, accurate, reliable and good-quality human musculoskeletal resources related to medical processes, pathological knowledge and practical expertise. In the present work, an advanced knowledge-based personalized search engine was developed. Our search engine was based on a client-server multi-layer multi-agent architecture and the principle of semantic web services to dynamically acquire accurate and reliable HMSR information through a semantic processing and visualization approach. A security-enhanced mechanism was applied to protect the medical information. A multi-agent crawler was implemented to develop a content-based database of HMSR information. A new semantic-based PageRank score with related mathematical formulas was also defined and implemented. As a result, semantic web service descriptions were presented in OWL, WSDL and OWL-S formats. Operational scenarios with related web-based interfaces for personal computers and mobile devices were presented and analyzed. Functional comparison between our knowledge-based search engine, a conventional search engine and a semantic search engine showed the originality and the robustness of our knowledge-based personalized search engine. In fact, our knowledge-based personalized search engine allows different users such as orthopedic patients and experts, healthcare system managers or medical students to remotely access useful, accurate, reliable and good-quality HMSR information for their learning and medical purposes. Copyright © 2012 Elsevier Inc. All rights reserved.
Changes in users' mental models of Web search engines after ten ...
African Journals Online (AJOL)
Ward's cluster analyses, including the pseudo T² statistic, were used to determine the mental model clusters for the seventeen salient design features of Web search engines at each time point. The cubic clustering criterion (CCC) and the dendrogram were computed for each sample to help determine the number ...
Age differences in search of web pages: the effects of link size, link number, and clutter.
Grahame, Michael; Laberge, Jason; Scialfa, Charles T
2004-01-01
Reaction time, eye movements, and errors were measured during visual search of Web pages to determine age-related differences in performance as a function of link size, link number, link location, and clutter. Participants (15 young adults, M = 23 years; 14 older adults, M = 57 years) searched Web pages for target links that varied from trial to trial. During one half of the trials, links were enlarged from 10-point to 12-point font. Target location was distributed among the left, center, and bottom portions of the screen. Clutter was manipulated according to the percentage of used space, including graphics and text, and the number of potentially distracting nontarget links was varied. Increased link size improved performance, whereas increased clutter and links hampered search, especially for older adults. Results also showed that links located in the left region of the page were found most easily. Actual or potential applications of this research include Web site design to increase usability, particularly for older adults.
Mining social media and web searches for disease detection.
Yang, Y Tony; Horneffer, Michael; DiLisio, Nicole
2013-04-28
Web-based social media is increasingly being used across different settings in the health care industry. The increased frequency in the use of the Internet via computer or mobile devices provides an opportunity for social media to be the medium through which people can be provided with valuable health information quickly and directly. While traditional methods of detection relied predominately on hierarchical or bureaucratic lines of communication, these often failed to yield timely and accurate epidemiological intelligence. New web-based platforms promise increased opportunities for a more timely and accurate spreading of information and analysis. This article aims to provide an overview and discussion of the availability of timely and accurate information. It is especially useful for the rapid identification of an outbreak of an infectious disease that is necessary to promptly and effectively develop public health responses. These web-based platforms include search queries, data mining of web and social media, process and analysis of blogs containing epidemic key words, text mining, and geographical information system data analyses. These new sources of analysis and information are intended to complement traditional sources of epidemic intelligence. Despite the attractiveness of these new approaches, further study is needed to determine the accuracy of blogger statements, as increases in public participation may not necessarily mean the information provided is more accurate.
Children's Search Engines from an Information Search Process Perspective.
Broch, Elana
2000-01-01
Describes cognitive and affective characteristics of children and teenagers that may affect their Web searching behavior. Reviews literature on children's searching in online public access catalogs (OPACs) and using digital libraries. Profiles two Web search engines. Discusses some of the difficulties children have searching the Web, in the…
Forecasting new product diffusion using both patent citation and web search traffic.
Lee, Won Sang; Choi, Hyo Shin; Sohn, So Young
2018-01-01
Accurate demand forecasting for new technology products is a key factor in the success of a business. We propose a way to forecast a new product's diffusion through technology diffusion and interest diffusion. Technology diffusion and interest diffusion are measured by the volume of patent citations and web search traffic, respectively. We apply the proposed method to forecast the sales of hybrid cars and industrial robots in the US market. The results show that technology diffusion, as represented by patent citations, can explain long-term sales for hybrid cars and industrial robots. On the other hand, interest diffusion, as represented by web search traffic, can help to improve the predictability of market sales of hybrid cars in the short term. However, interest diffusion does not readily explain the sales of industrial robots, owing to the different market characteristics. The findings indicate that our proposed model can explain the diffusion of consumer goods relatively well.
Omicseq: a web-based search engine for exploring omics datasets
Sun, Xiaobo; Pittard, William S.; Xu, Tianlei; Chen, Li; Zwick, Michael E.; Jiang, Xiaoqian; Wang, Fusheng
2017-01-01
The development and application of high-throughput genomics technologies has resulted in massive quantities of diverse omics data that continue to accumulate rapidly. These rich datasets offer unprecedented and exciting opportunities to address long standing questions in biomedical research. However, our ability to explore and query the content of diverse omics data is very limited. Existing dataset search tools rely almost exclusively on the metadata. A text-based query for gene name(s) does not work well on datasets wherein the vast majority of their content is numeric. To overcome this barrier, we have developed Omicseq, a novel web-based platform that facilitates the easy interrogation of omics datasets holistically to improve ‘findability’ of relevant data. The core component of Omicseq is trackRank, a novel algorithm for ranking omics datasets that fully uses the numerical content of the dataset to determine relevance to the query entity. The Omicseq system is supported by a scalable and elastic, NoSQL database that hosts a large collection of processed omics datasets. In the front end, a simple, web-based interface allows users to enter queries and instantly receive search results as a list of ranked datasets deemed to be the most relevant. Omicseq is freely available at http://www.omicseq.org. PMID:28402462
Rare disease diagnosis: A review of web search, social media and large-scale data-mining approaches.
Svenstrup, Dan; Jørgensen, Henrik L; Winther, Ole
2015-01-01
Physicians and the general public are increasingly using web-based tools to find answers to medical questions. The field of rare diseases is especially challenging and important as shown by the long delay and many mistakes associated with diagnoses. In this paper we review recent initiatives on the use of web search, social media and data mining in data repositories for medical diagnosis. We compare the retrieval accuracy on 56 rare disease cases with known diagnosis for the web search tools google.com, pubmed.gov, omim.org and our own search tool findzebra.com. We give a detailed description of IBM's Watson system and make a rough comparison between findzebra.com and Watson on subsets of the Doctor's dilemma dataset. The recall@10 and recall@20 (fraction of cases where the correct result appears in top 10 and top 20) for the 56 cases are found to be 29%, 16%, 27% and 59% and 32%, 18%, 34% and 64%, respectively. Thus, FindZebra has a significantly (p mining tools and social media are some of the areas that hold promise.
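The recall@k measure defined in the abstract (the fraction of cases whose correct diagnosis appears among the top k results) can be computed as in the sketch below. The function name `recall_at_k` and the toy ranked lists are assumptions made for illustration; they are not data from the study.

```python
def recall_at_k(ranked_results, correct, k):
    """Fraction of cases whose correct answer appears in the top k results.

    ranked_results: one ranked list of result labels per case.
    correct: the known correct label for each case, in the same order.
    """
    hits = sum(1 for results, answer in zip(ranked_results, correct)
               if answer in results[:k])
    return hits / len(correct)


# Toy example with four invented cases and placeholder labels.
ranked = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"], ["j", "k", "l"]]
truth = ["a", "f", "x", "k"]
print(recall_at_k(ranked, truth, 1))  # 0.25
print(recall_at_k(ranked, truth, 2))  # 0.5
print(recall_at_k(ranked, truth, 3))  # 0.75
```

Because a larger k can only add hits, recall@20 is never lower than recall@10 for the same tool, which matches the pattern of the reported percentages.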
A unified architecture for biomedical search engines based on semantic web technologies.
Jalali, Vahid; Matash Borujerdi, Mohammad Reza
2011-04-01
There has been huge growth in the volume of published biomedical research in recent years. Many medical search engines have been designed and developed to address the ever-growing information needs of biomedical experts and curators. Significant progress has been made in utilizing the knowledge embedded in medical ontologies and controlled vocabularies to assist these engines. However, the lack of a common architecture for the ontologies used and for the overall retrieval process hampers the evaluation of different search engines, and interoperability between them, under unified conditions. In this paper, a unified architecture for medical search engines is introduced. The proposed model contains standard schemas declared in semantic web languages for the ontologies and documents used by search engines. Unified models for the annotation and retrieval processes are other parts of the introduced architecture. A sample search engine is also designed and implemented based on the proposed architecture. The search engine is evaluated using two test collections, and results are reported in terms of precision vs. recall and mean average precision for the different approaches used by this search engine.
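The evaluation measures named in the abstract (precision, recall and mean average precision) follow standard information-retrieval definitions, sketched below. The function names and the toy document identifiers are hypothetical, and the sketch assumes binary relevance judgments; it does not reproduce the paper's test collections.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall of the top-k results against a set of relevant documents."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k, hits / len(relevant)


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant document is found."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Average of average_precision over (ranked list, relevant set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Two invented queries with their ranked results and relevant sets.
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
    (["d5", "d6"], {"d6"}),
]
print(mean_average_precision(runs))
```

For the first query the relevant documents sit at ranks 1 and 3, giving an average precision of (1/1 + 2/3) / 2; for the second the single relevant document is at rank 2, giving 1/2.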
Directory of Open Access Journals (Sweden)
Maryam Asadi
2015-12-01
Using a mixed methods research design, the current study analyzed Iranian researchers’ information searching behaviour on the Web. Then, based on the extracted concepts, a model of their information searching behaviour was developed. Forty-four participants, including academic staff from universities and research centers, were recruited for this study by purposive sampling. Data were gathered from questionnaires comprising ten questions and from semi-structured interviews. Each participant’s memos were analyzed using grounded theory methods adapted from Strauss & Corbin (1998). Results showed that the subjects' main objectives in using the Web were doing research, writing a paper, studying, doing assignments, downloading files and acquiring public information. The most important ways of learning how to search and retrieve information were trial and error and getting help from friends. Information resources are identified by searching in information resources (e.g. search engines, references in papers, and searching in online databases…), communication facilities and tools (e.g. contact with colleagues, seminars and workshops, social networking...), and information services (e.g. RSS, alerting, and SDI). Findings also indicated that searching with search engines, reviewing references, searching in online databases, contacting colleagues and studying the latest issues of electronic journals were the most important search methods. The most important strategies were using search engines and scientific tools such as Google Scholar. In addition, the simple (quick) search method was the most common among subjects. Topic, keywords and paper title were the most important elements for information retrieval. Analysis of the interviews showed that there were nine stages in researchers’ information searching behaviour: topic selection, initiating search, formulating search query, information retrieval, access to information
Law, Michael R; Mintzes, Barbara; Morgan, Steven G
2011-03-01
The Internet has become a popular source of health information. However, there is little information on what drug information and which Web sites are being searched. To investigate the sources of online information about prescription drugs by assessing the most common Web sites returned in online drug searches and to assess the comparative popularity of Web pages for particular drugs. This was a cross-sectional study of search results for the most commonly dispensed drugs in the US (n=278 active ingredients) on 4 popular search engines: Bing, Google (both US and Canada), and Yahoo. We determined the number of times a Web site appeared as the first result. A linked retrospective analysis counted Wikipedia page hits for each of these drugs in 2008 and 2009. About three quarters of the first result on Google USA for both brand and generic names linked to the National Library of Medicine. In contrast, Wikipedia was the first result for approximately 80% of generic name searches on the other 3 sites. On these other sites, over two thirds of brand name searches led to industry-sponsored sites. The Wikipedia pages with the highest number of hits were mainly for opiates, benzodiazepines, antibiotics, and antidepressants. Wikipedia and the National Library of Medicine rank highly in online drug searches. Further, our results suggest that patients most often seek information on drugs with the potential for dependence, for stigmatized conditions, that have received media attention, and for episodic treatments. Quality improvement efforts should focus on these drugs.
Filistea Naude; Chris Rensleigh; Adeline S.A. du Toit
2010-01-01
This study provided insight into the significance of the open Web as an information resource and Web search engines as research tools amongst academics. The academic staff establishment of the University of South Africa (Unisa) was invited to participate in a questionnaire survey and included 1188 staff members from five colleges. This study culminated in a PhD dissertation in 2008. One hundred and eighty seven respondents participated in the survey which gave a response rate of 15.7%. The re...
SA-Search: a web tool for protein structure mining based on a Structural Alphabet.
Guyon, Frédéric; Camproux, Anne-Claude; Hochez, Joëlle; Tufféry, Pierre
2004-07-01
SA-Search is a web tool that can be used to mine for protein structures and extract structural similarities. It is based on a hidden Markov model derived Structural Alphabet (SA) that allows the compression of three-dimensional (3D) protein conformations into a one-dimensional (1D) representation using a limited number of prototype conformations. Using such a representation, classical methods developed for amino acid sequences can be employed. Currently, SA-Search permits the performance of fast 3D similarity searches such as the extraction of exact words using a suffix tree approach, and the search for fuzzy words viewed as a simple 1D sequence alignment problem. SA-Search is available at http://bioserv.rpbs.jussieu.fr/cgi-bin/SA-Search.
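The exact-word search described above can be illustrated with a minimal sketch. It assumes a protein conformation has already been encoded as a string of Structural Alphabet letters (the letters used here are hypothetical), and it swaps in a naive suffix array for the suffix tree named in the abstract, since both answer the same exact-match query. It is not the SA-Search implementation.

```python
def build_suffix_array(text):
    """Return suffix start positions, sorted by the suffix they begin.

    Naive construction (sorting whole suffixes); adequate for a sketch,
    not for genome- or database-scale input.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_exact_word(text, suffix_array, word):
    """Return the sorted start positions of every exact occurrence of word."""
    lo, hi = 0, len(suffix_array)
    # Binary search for the first suffix whose leading len(word) letters
    # are not smaller than the query word.
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + len(word)] < word:
            lo = mid + 1
        else:
            hi = mid
    hits = []
    # All suffixes that begin with the word are contiguous in sorted order.
    while lo < len(suffix_array) and text.startswith(word, suffix_array[lo]):
        hits.append(suffix_array[lo])
        lo += 1
    return sorted(hits)


# Hypothetical 1D encoding of a 3D conformation.
encoded = "ABCABDABC"
index = build_suffix_array(encoded)
positions = find_exact_word(encoded, index, "ABC")
```

Once a structure is reduced to a 1D string, any standard string index applies; the suffix array is used here only because it is short to write.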
Directory of Open Access Journals (Sweden)
Simon Briscoe
2016-09-01
A Review of: Eysenbach, G., Tuische, J. & Diepgen, T.L. (2001). Evaluation of the usefulness of Internet searches to identify unpublished clinical trials for systematic reviews. Medical Informatics and the Internet in Medicine, 26(3), 203-218. http://dx.doi.org/10.1080/14639230110075459 Objective – To consider whether web searching is a useful method for identifying unpublished studies for inclusion in systematic reviews. Design – Retrospective web searches using the AltaVista search engine were conducted to identify unpublished studies – specifically, clinical trials – for systematic reviews which did not use a web search engine. Setting – The Department of Clinical Social Medicine, University of Heidelberg, Germany. Subjects – n/a Methods – Pilot testing of 11 web search engines was carried out to determine which could handle complex search queries. Pre-specified search requirements included the ability to handle Boolean and proximity operators, and truncation searching. A total of seven Cochrane systematic reviews were randomly selected from the Cochrane Library Issue 2, 1998, and their bibliographic database search strategies were adapted for the web search engine, AltaVista. Each adaptation combined search terms for the intervention, problem, and study type in the systematic review. Hints to planned, ongoing, or unpublished studies retrieved by the search engine, which were not cited in the systematic reviews, were followed up by visiting websites and contacting authors for further details when required. The authors of the systematic reviews were then contacted and asked to comment on the potential relevance of the identified studies. Main Results – Hints to 14 unpublished and potentially relevant studies, corresponding to 4 of the 7 randomly selected Cochrane systematic reviews, were identified. Out of the 14 studies, 2 were considered irrelevant to the corresponding systematic review by the systematic review authors. The
Dore, Kelly L; Reiter, Harold I; Kreuger, Sharyn; Norman, Geoffrey R
2017-05-01
Typically, only a minority of applicants to health professional training are invited to interview. However, pre-interview measures of cognitive skills predict for national licensure scores (Gauer et al. in Med Educ Online 21 2016) and subsequently licensure scores predict for performance in practice (Tamblyn et al. in JAMA 288(23): 3019-3026, 2002; Tamblyn et al. in JAMA 298(9):993-1001, 2007). Assessment of personal and professional characteristics, with the same psychometric rigour of measures of cognitive abilities, are needed upstream in the selection to health profession training programs. To fill that need, Computer-based Assessment for Sampling Personal characteristics (CASPer)-an on-line, video-based screening test-was created. In this paper, we examine the correlation between CASPer and Canadian national licensure examination outcomes in 109 doctors who took CASPer at the time of selection to medical school. Specifically, CASPer scores were correlated against performance on cognitive and 'non-cognitive' subsections of both the Medical Council of Canada Qualifying Examination (MCCQE) Parts I (end of medical school) and Part II (18 months into specialty training). Unlike most national licensure exams, MCCQE has specific subcomponents examining personal/professional qualities, providing a unique opportunity for comparison. The results demonstrated moderate predictive validity of CASPer to national licensure outcomes of personal/professional characteristics three to six years after admission to medical school. These types of disattenuated correlations (r = 0.3-0.5) are not otherwise predicted by traditional screening measures. These data support the ability of a computer-based strategy to screen applicants in a feasible, reliable test, which has now demonstrated predictive validity, lending evidence of its validation for medical school applicant selection.
Omicseq: a web-based search engine for exploring omics datasets.
Sun, Xiaobo; Pittard, William S; Xu, Tianlei; Chen, Li; Zwick, Michael E; Jiang, Xiaoqian; Wang, Fusheng; Qin, Zhaohui S
2017-07-03
The development and application of high-throughput genomics technologies has resulted in massive quantities of diverse omics data that continue to accumulate rapidly. These rich datasets offer unprecedented and exciting opportunities to address long standing questions in biomedical research. However, our ability to explore and query the content of diverse omics data is very limited. Existing dataset search tools rely almost exclusively on the metadata. A text-based query for gene name(s) does not work well on datasets wherein the vast majority of their content is numeric. To overcome this barrier, we have developed Omicseq, a novel web-based platform that facilitates the easy interrogation of omics datasets holistically to improve 'findability' of relevant data. The core component of Omicseq is trackRank, a novel algorithm for ranking omics datasets that fully uses the numerical content of the dataset to determine relevance to the query entity. The Omicseq system is supported by a scalable and elastic, NoSQL database that hosts a large collection of processed omics datasets. In the front end, a simple, web-based interface allows users to enter queries and instantly receive search results as a list of ranked datasets deemed to be the most relevant. Omicseq is freely available at http://www.omicseq.org. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Porter, Brandi
2009-01-01
Millennial students make up a large portion of undergraduate students attending colleges and universities, and they have a variety of online resources available to them to complete academically related information searches, primarily Web based and library-based online information retrieval systems. The content, ease of use, and required search…
Zhang, Lu; Du, Hongru; Zhao, Yannan; Wu, Rongwei; Zhang, Xiaolei
2017-01-01
"The Belt and Road" initiative has been expected to facilitate interactions among numerous city centers. This initiative would generate a number of centers, both economic and political, which would facilitate greater interaction. To explore how information flows are merged and the specific opportunities that may be offered, Chinese cities along "the Belt and Road" are selected for a case study. Furthermore, urban networks in cyberspace have been characterized by their infrastructure orientation, which implies that there is a relative dearth of studies focusing on the investigation of urban hierarchies by capturing information flows between Chinese cities along "the Belt and Road". This paper employs Baidu, the main web search engine in China, to examine urban hierarchies. The results show that urban networks become more balanced, shifting from a polycentric to a homogenized pattern. Furthermore, cities in networks tend to have both a hierarchical system and a spatial concentration primarily in regions such as Beijing-Tianjin-Hebei, Yangtze River Delta and the Pearl River Delta region. Urban hierarchy based on web search activity does not follow the existing hierarchical system based on geospatial and economic development in all cases. Moreover, urban networks, under the framework of "the Belt and Road", show several significant corridors and more opportunities for more cities, particularly western cities. Furthermore, factors that may influence web search activity are explored. The results show that web search activity is significantly influenced by the economic gap, geographical proximity and administrative rank of the city.
Searching the Web for Earth Science Data: Semiotics to Cybernetics and Back
Directory of Open Access Journals (Sweden)
Bruce R. Barkstrom
2016-06-01
This paper discusses a search paradigm for numerical data in Earth science that relies on the intrinsic structure of an archive's collection. Such non-textual data lies outside the normal textual basis for the Semantic Web. The paradigm tries to bypass some of the difficulties associated with keyword searches, such as semantic heterogeneity. The suggested collection structure uses a hierarchical taxonomy based on multidimensional axes of continuous variables. This structure fits the underlying 'geometry' of Earth science data better than sets of keywords in an ontology. The alternative paradigm views the search as a two-agent cooperative game that uses a dialog between the search engine and the data user. In this view, the search engine knows about the objects in the archive. It cannot read the user's mind to identify what the user needs. We assume the user has a clear idea of the search target. However, he or she may not have a clear idea of the archive's contents. The paper suggests how the user interface may provide information to deal with the user's difficulties in understanding items in the dialog.
A Web-based Tool for SDSS and 2MASS Database Searches
Hendrickson, M. A.; Uomoto, A.; Golimowski, D. A.
We have developed a web site using HTML, PHP, Python, and MySQL that extracts, processes, and displays data from the Sloan Digital Sky Survey (SDSS) and the Two-Micron All-Sky Survey (2MASS). The goal is to locate brown dwarf candidates in the SDSS database by applying color cuts; however, this site could also be useful for targeted searches of other databases. MySQL databases are created from broad searches of SDSS and 2MASS data. Broad queries on the SDSS and 2MASS database servers are run weekly so that observers have the most up-to-date information from which to select candidates for observation. Observers can look at detailed information about specific objects, including finding charts, images, and available spectra. In addition, updates from previous observations can be added by any collaborator; this format makes observational collaboration simple. Observers can also restrict the database search, just before or during an observing run, to select objects of special interest.
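A color cut of the kind mentioned above can be sketched as a simple filter over magnitude differences. The band names (`i`, `z`, `J`), the record layout, and both thresholds are illustrative assumptions; the abstract does not state which cuts the authors used.

```python
def color_index(obj, blue_band, red_band):
    """Magnitude difference between two bands; larger means redder."""
    return obj[blue_band] - obj[red_band]


def select_candidates(objects, min_i_minus_z=1.5, min_z_minus_j=2.0):
    """Return names of objects that are red in both color indices.

    The default thresholds are placeholders for illustration only,
    not values taken from the source.
    """
    selected = []
    for obj in objects:
        red_in_sdss = color_index(obj, "i", "z") >= min_i_minus_z
        red_across_surveys = color_index(obj, "z", "J") >= min_z_minus_j
        if red_in_sdss and red_across_surveys:
            selected.append(obj["name"])
    return selected


# Hypothetical cross-matched SDSS (i, z) and 2MASS (J) magnitudes.
catalog = [
    {"name": "obj-a", "i": 22.0, "z": 20.0, "J": 17.5},
    {"name": "obj-b", "i": 19.0, "z": 18.5, "J": 17.0},
]
candidates = select_candidates(catalog)
```

In a database-backed site the same condition would more likely be expressed in the query itself; the in-memory version only makes the logic explicit.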
Reconsidering the Rhizome: A Textual Analysis of Web Search Engines as Gatekeepers of the Internet
Hess, A.
Critical theorists have often drawn from Deleuze and Guattari's notion of the rhizome when discussing the potential of the Internet. While the Internet may structurally appear as a rhizome, its day-to-day usage by millions via search engines precludes experiencing the random interconnectedness and potential democratizing function. Through a textual analysis of four search engines, I argue that Web searching has grown hierarchies, or "trees," that organize data in tracts of knowledge and place users in marketing niches rather than assist in the development of new knowledge.
Banerji, Anirban; Magarkar, Aniket
2012-09-01
We feel happy when web browsing operations provide us with necessary information; otherwise, we feel bitter. How can this happiness (or bitterness) be measured? How does the profile of happiness grow and decay during the course of web browsing? We propose a probabilistic framework that models the evolution of user satisfaction, on top of his/her continuous frustration at not finding the required information. It is found that the cumulative satisfaction profile of a web-searching individual can be modeled effectively as the sum of a random number of random terms, where each term is a mutually independent random variable, originating from ‘memoryless’ Poisson flow. Evolution of satisfaction over the entire time interval of a user’s browsing was modeled using auto-correlation analysis. A utilitarian marker, whose magnitude greater than unity describes happy web-searching operations, and an empirical limit that connects the user’s satisfaction with his or her frustration level are also proposed. The presence of pertinent information in the very first page of a website and the magnitude of the decay parameter of user satisfaction (frustration, irritation etc.) are found to be two key aspects that dominate the web user’s psychology. The proposed model employed different combinations of decay parameter, searching time and number of helpful websites. The obtained results are found to match the results from three real-life case studies.
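The "sum of a random number of random terms" from a Poisson flow is a compound Poisson sum, which can be simulated in a few lines. In this sketch, helpful pages arrive as a Poisson process and each adds an independent satisfaction gain. The exponential distribution of the gains and all parameter values are assumptions made for illustration; the abstract does not specify them, and the decay of satisfaction is not modeled here.

```python
import random


def simulate_cumulative_satisfaction(rate, duration, mean_gain, rng):
    """Simulate one browsing session as a compound Poisson sum.

    rate      -- expected number of helpful pages per unit time (must be > 0)
    duration  -- length of the browsing session
    mean_gain -- mean satisfaction gained per helpful page (assumed exponential)
    rng       -- a random.Random instance, passed in for reproducibility
    """
    total = 0.0
    # Exponential inter-arrival times generate a memoryless Poisson flow.
    clock = rng.expovariate(rate)
    while clock <= duration:
        total += rng.expovariate(1.0 / mean_gain)
        clock += rng.expovariate(rate)
    return total


rng = random.Random(42)
sessions = [simulate_cumulative_satisfaction(2.0, 10.0, 0.5, rng) for _ in range(2000)]
average = sum(sessions) / len(sessions)
```

Under these assumptions the expected total is rate × duration × mean_gain, so the average over many simulated sessions should settle near 10 for the parameters shown.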
Tracking changes in search behaviour at a health web site.
Eklund, Ann-Marie
2012-01-01
Nowadays, the internet is used as a means to provide the public with official information on many different topics, including health related matters and care providers. In this work we have studied a search log from the official Swedish health web site 1177.se for patterns of search behaviour over time. To improve the analysis, we mapped the queries to UMLS semantic types and MeSH categories. Our analysis shows that, as expected, diseases and health care activities are the ones of most interest, but also a clear increased interest in geographical locations in the setting of health care providers. We also note a change over time in which kinds of diseases are of interest. Finally, we conclude that this type of analysis may be useful in studies of what health related topics matter to the public, but also for design and follow-up of public information campaigns.
A Novel Framework for Medical Web Information Foraging Using Hybrid ACO and Tabu Search.
Drias, Yassine; Kechid, Samir; Pasi, Gabriella
2016-01-01
We present in this paper a novel approach based on multi-agent technology for Web information foraging. For this purpose we propose an architecture with two important phases. The first is a learning process for localizing the most relevant pages that might interest the user. This is performed on a fixed instance of the Web. The second takes into account the openness and dynamicity of the Web. It consists of incremental learning that starts from the result of the first phase and reshapes the outcomes to account for the changes the Web undergoes. The system was implemented using a colony of artificial ants hybridized with tabu search in order to achieve greater effectiveness and efficiency. To validate our proposal, experiments were conducted on MedlinePlus, a real website dedicated to research in the domain of health, in contrast to previous works where experiments were performed on web log datasets. The main results are promising, both for those related to strong Web regularities and for the response time, which is very short and hence complies with the real-time constraint.
A geospatial search engine for discovering multi-format geospatial data across the web
Christopher Bone; Alan Ager; Ken Bunzel; Lauren Tierney
2014-01-01
The volume of publicly available geospatial data on the web is rapidly increasing due to advances in server-based technologies and the ease with which data can now be created. However, challenges remain with connecting individuals searching for geospatial data with servers and websites where such data exist. The objective of this paper is to present a publicly...
GeNemo: a search engine for web-based functional genomic data.
Zhang, Yongqing; Cao, Xiaoyi; Zhong, Sheng
2016-07-08
A set of new data types emerged from functional genomic assays, including ChIP-seq, DNase-seq, FAIRE-seq and others. The results are typically stored as genome-wide intensities (WIG/bigWig files) or functional genomic regions (peak/BED files). These data types present new challenges to big data science. Here, we present GeNemo, a web-based search engine for functional genomic data. GeNemo searches user-input data against online functional genomic datasets, including the entire collection of ENCODE and mouse ENCODE datasets. Unlike text-based search engines, GeNemo's searches are based on pattern matching of functional genomic regions. This distinguishes GeNemo from text or DNA sequence searches. The user can input any complete or partial functional genomic dataset, for example, a binding intensity file (bigWig) or a peak file. GeNemo reports any genomic regions, ranging from hundred bases to hundred thousand bases, from any of the online ENCODE datasets that share similar functional (binding, modification, accessibility) patterns. This is enabled by a Markov Chain Monte Carlo-based maximization process, executed on up to 24 parallel computing threads. By clicking on a search result, the user can visually compare her/his data with the found datasets and navigate the identified genomic regions. GeNemo is available at www.genemo.org. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Distributed Web-Scale Infrastructure For Crawling, Indexing And Search With Semantic Support
Directory of Open Access Journals (Sweden)
Stefan Dlugolinsky
2012-01-01
In this paper, we describe our work in progress in the scope of web-scale information extraction and information retrieval utilizing distributed computing. We present a distributed architecture built on top of the MapReduce paradigm for information retrieval, information processing and intelligent search supported by spatial capabilities. The proposed architecture is focused on crawling documents in several different formats, information extraction, lightweight semantic annotation of the extracted information, indexing of extracted information and finally on indexing of documents based on the geo-spatial information found in a document. We demonstrate the architecture on two use cases, where the first is search in job offers retrieved from the LinkedIn portal and the second is search in BBC news feeds, and discuss several problems we had to face during the implementation. We also discuss spatial search applications for both cases because both LinkedIn job offer pages and BBC news feeds contain a lot of spatial information to extract and process.
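How indexing fits the MapReduce paradigm can be shown with a minimal single-process sketch. It mimics the map, shuffle, and reduce stages in plain Python to build an inverted index. The function names and sample documents are hypothetical, no Hadoop or other framework API is used, and this is not the authors' architecture.

```python
from collections import defaultdict


def map_document(doc_id, text):
    """Map stage: emit one (term, doc_id) pair per distinct term."""
    for term in set(text.lower().split()):
        yield term, doc_id


def shuffle(pairs):
    """Shuffle stage: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_postings(term, doc_ids):
    """Reduce stage: turn the grouped ids into a sorted posting list."""
    return term, sorted(set(doc_ids))


def build_index(documents):
    """Run the three stages in sequence over a dict of doc_id -> text."""
    pairs = (
        pair
        for doc_id, text in documents.items()
        for pair in map_document(doc_id, text)
    )
    return dict(reduce_postings(t, ids) for t, ids in shuffle(pairs).items())


# Hypothetical job-offer snippets.
docs = {"d1": "Java developer Bratislava", "d2": "Python developer"}
inverted = build_index(docs)
```

In a distributed setting the map and reduce functions would run on separate workers and the framework would perform the shuffle; the logic of each stage is unchanged.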
Patscanui: an intuitive web interface for searching patterns in DNA and protein data
DEFF Research Database (Denmark)
Blin, Kai; Wohlleben, Wolfgang; Weber, Tilmann
2018-01-01
Patterns in biological sequences frequently signify interesting features in the underlying molecule. Many tools exist to search for well-known patterns. Less support is available for exploratory analysis, where no well-defined patterns are known yet. PatScanUI (https://patscan.secondarymetabolites.org/) provides a highly interactive web interface to the powerful generic pattern search tool PatScan. The complex PatScan patterns are created in a drag-and-drop aware interface allowing researchers to do rapid prototyping of the often complicated patterns useful to identifying features of interest.
Sagace: A web-based search engine for biomedical databases in Japan
Directory of Open Access Journals (Sweden)
Morita Mizuki
2012-10-01
Background: In the big data era, biomedical research continues to generate a large amount of data, and the generated information is often stored in a database and made publicly available. Although combining data from multiple databases should accelerate further studies, the current number of life sciences databases is too large to grasp features and contents of each database. Findings: We have developed Sagace, a web-based search engine that enables users to retrieve information from a range of biological databases (such as gene expression profiles and proteomics data) and biological resource banks (such as mouse models of disease and cell lines). With Sagace, users can search more than 300 databases in Japan. Sagace offers features tailored to biomedical research, including manually tuned ranking, a faceted navigation to refine search results, and rich snippets constructed with retrieved metadata for each database entry. Conclusions: Sagace will be valuable for experts who are involved in biomedical research and drug development in both academia and industry. Sagace is freely available at http://sagace.nibio.go.jp/en/.
Myanmar Language Search Engine
Pann Yu Mon; Yoshiki Mikami
2011-01-01
With the enormous growth of the World Wide Web, search engines play a critical role in retrieving information from the borderless Web. Although many search engines are available for the major languages, they are not very proficient for less computerized languages, including Myanmar. The main reason is that those search engines do not consider the specific features of those languages. A search engine which is capable of searching Web documents written in those languages is highly n...
Semantic similarity measures in the biomedical domain by leveraging a web search engine.
Hsieh, Sheau-Ling; Chang, Wen-Yung; Chen, Chi-Huang; Weng, Yung-Ching
2013-07-01
Various web-related semantic similarity measures have been deployed in prior research. However, measuring semantic similarity between two terms remains a challenging task. The traditional ontology-based methodologies have the limitation that both concepts must reside in the same ontology tree(s). Unfortunately, in practice, this assumption does not always hold. On the other hand, if the corpus is sufficiently large, corpus-based methodologies can overcome the limitation. The web is now a continuously and enormously growing corpus. Therefore, a method of estimating semantic similarity is proposed that exploits the page counts of two biomedical concepts returned by the Google AJAX web search engine. The features are extracted as the co-occurrence patterns of two given terms P and Q, by querying P, Q, as well as P AND Q, and the web search hit counts of the defined lexico-syntactic patterns. The similarity scores of the different patterns are evaluated, using support vector machines for classification, to leverage the robustness of semantic similarity measures. Experimental results validated against two datasets, dataset 1 provided by A. Hliaoutakis and dataset 2 provided by T. Pedersen, are presented and discussed. In dataset 1, the proposed approach achieves the best correlation coefficient (0.802) under SNOMED-CT. In dataset 2, the proposed method obtains the best correlation coefficient (SNOMED-CT: 0.705; MeSH: 0.723) with physician scores compared with the measures of other methods. However, the correlation coefficients (SNOMED-CT: 0.496; MeSH: 0.539) with coder scores showed the opposite outcome. In conclusion, the semantic similarity findings of the proposed method are close to those of physicians' ratings. Furthermore, the study provides a cornerstone investigation for extracting fully relevant information from digitized, free-text medical records in the National Taiwan University Hospital database.
Search of the Deep and Dark Web via DARPA Memex
Mattmann, C. A.
2015-12-01
Search has progressed through several stages due to the increasing size of the Web. Search engines first focused on text and its rate of occurrence; then on the notion of link analysis and citation; then on interactivity and guided search; and now on the use of social media: who we interact with, what we comment on, and who we follow (and who follows us). The next stage, referred to as "deep search," requires solutions that can bring together text, images, video, importance, interactivity, and social media to solve this challenging problem. The Apache Nutch project provides an open framework for large-scale, targeted, vertical search with capabilities to support all past and potential future search engine foci. Nutch is a flexible infrastructure allowing open access to ranking, to URL selection and filtering approaches, and to the link graph generated from search, and Nutch has spawned entire sub communities including Apache Hadoop and Apache Tika. It addresses many current needs with the capability to support new technologies such as image and video. On the DARPA Memex project, we are creating specific extensions to Nutch that will directly improve its overall technological superiority for search and that will directly allow us to address complex search problems including human trafficking. We are integrating state-of-the-art algorithms developed by Kitware for IARPA Aladdin combined with work by Harvard to provide image and video understanding support allowing automatic detection of people and things and massive deployment via Nutch. We are expanding Apache Tika for scene understanding, object/person detection and classification in images/video. We are delivering an interactive and visual interface for initiating Nutch crawls. The interface uses Python technologies to expose Nutch data and to provide a domain specific language for crawls. With the Bokeh visualization library, we are delivering simple interactive crawl visualization and
Using Open Web APIs in Teaching Web Mining
Chen, Hsinchun; Li, Xin; Chau, M.; Ho, Yi-Jen; Tseng, Chunju
2009-01-01
With the advent of the World Wide Web, many business applications that utilize data mining and text mining techniques to extract useful business information on the Web have evolved from Web searching to Web mining. It is important for students to acquire knowledge and hands-on experience in Web mining during their education in information systems…
Semantic similarity measure in biomedical domain leverage web search engine.
Chen, Chi-Huang; Hsieh, Sheau-Ling; Weng, Yung-Ching; Chang, Wen-Yung; Lai, Feipei
2010-01-01
Semantic similarity measures play an essential role in Information Retrieval and Natural Language Processing. In this paper we propose a page-count-based semantic similarity measure and apply it in biomedical domains. Previous research in semantic web related applications has deployed various semantic similarity measures. Despite the usefulness of the measurements in those applications, measuring semantic similarity between two terms remains a challenging task. The proposed method exploits page counts returned by the Web Search Engine. We define various similarity scores for two given terms P and Q, using the page counts for querying P, Q and P AND Q. Moreover, we propose a novel approach to compute semantic similarity using lexico-syntactic patterns with page counts. These different similarity scores are integrated using support vector machines, to leverage the robustness of semantic similarity measures. Experimental results on two datasets achieve correlation coefficients of 0.798 on the dataset provided by A. Hliaoutakis, 0.705 on the dataset provided by T. Pedersen with physician scores and 0.496 on the dataset provided by T. Pedersen et al. with expert scores.
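Similarity scores built from the three page counts (P, Q, and P AND Q) can be sketched as follows. The abstract does not name the scores used, so the Jaccard, Dice, and overlap coefficients below are an assumption: they are common co-occurrence measures that need only these three counts. The page counts in the example are invented, and no search engine API is called.

```python
def web_jaccard(count_p, count_q, count_pq):
    """Jaccard coefficient: joint hits over hits for either term."""
    union = count_p + count_q - count_pq
    return count_pq / union if union > 0 else 0.0


def web_dice(count_p, count_q, count_pq):
    """Dice coefficient: twice the joint hits over the summed hits."""
    total = count_p + count_q
    return 2 * count_pq / total if total > 0 else 0.0


def web_overlap(count_p, count_q, count_pq):
    """Overlap coefficient: joint hits over the rarer term's hits."""
    smaller = min(count_p, count_q)
    return count_pq / smaller if smaller > 0 else 0.0


# Invented page counts for two terms P and Q and for the query "P AND Q".
hits_p, hits_q, hits_pq = 100, 50, 25
features = [
    web_jaccard(hits_p, hits_q, hits_pq),
    web_dice(hits_p, hits_q, hits_pq),
    web_overlap(hits_p, hits_q, hits_pq),
]
```

A feature vector like `features` is the kind of input that a classifier such as a support vector machine could then combine into a single similarity judgment.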
Shaffer, Victoria A; Owens, Justin; Zikmund-Fisher, Brian J
2013-12-17
Previous research has examined the impact of patient narratives on treatment choices, but to our knowledge, no study has examined the effect of narratives on information search. Further, no research has considered the relative impact of their format (text vs video) on health care decisions in a single study. Our goal was to examine the impact of video and text-based narratives on information search in a Web-based patient decision aid for early stage breast cancer. Fifty-six women were asked to imagine that they had been diagnosed with early stage breast cancer and needed to choose between two surgical treatments (lumpectomy with radiation or mastectomy). Participants were randomly assigned to view one of four versions of a Web decision aid. Two versions of the decision aid included videos of interviews with patients and physicians or videos of interviews with physicians only. To distinguish between the effect of narratives and the effect of videos, we created two text versions of the Web decision aid by replacing the patient and physician interviews with text transcripts of the videos. Participants could freely browse the Web decision aid until they developed a treatment preference. We recorded participants' eye movements using the Tobii 1750 eye-tracking system equipped with Tobii Studio software. A priori, we defined 24 areas of interest (AOIs) in the Web decision aid. These AOIs were either separate pages of the Web decision aid or sections within a single page covering different content. We used multilevel modeling to examine the effect of narrative presence, narrative format, and their interaction on information search. There was a significant main effect of condition, P=.02; participants viewing decision aids with patient narratives spent more time searching for information than participants viewing the decision aids without narratives. The main effect of format was not significant, P=.10. However, there was a significant condition by format interaction on
HDAPD: a web tool for searching the disease-associated protein structures
2010-01-01
Background The protein structures of disease-associated proteins are important for structure-based drug design against a particular disease. Up until now, protein structures have usually been searched through a PDB id or some sequence information. However, in the HDAPD database presented here, the protein structure of a disease-associated protein can be searched directly by keying in the associated disease name. Description A search in HDAPD can be initiated by keying in some keywords of a disease, protein name, protein type, or PDB id. The protein sequence can be presented in FASTA format and directly copied for a BLAST search. HDAPD is also interfaced with Jmol so that users can observe and manipulate a protein structure with Jmol. Gene ontology data such as cellular components, molecular functions, and biological processes are provided once a hyperlink to Gene Ontology (GO) is clicked. Further, HDAPD provides a link to the KEGG map so that the position of the protein and its relationship with other proteins in a metabolic pathway can be found from the map. The latest literature retrieved from PubMed for the protein (titles, journals, authors, and abstracts) is also presented as a list of controllable length. Conclusions Since the HDAPD data content can be routinely updated through a purpose-built PHP-MySQL web page, the new database is useful for searching the structures of disease-associated proteins that may play important roles in disease development, in support of structure-based drug design against those diseases. PMID:20158919
Characterizing interdisciplinarity of researchers and research topics using web search engines.
Sayama, Hiroki; Akaishi, Jin
2012-01-01
Researchers' networks have been subject to active modeling and analysis. Earlier literature mostly focused on citation or co-authorship networks reconstructed from annotated scientific publication databases, which have several limitations. Recently, general-purpose web search engines have also been utilized to collect information about social networks. Here we reconstructed, using web search engines, a network representing the relatedness of researchers to their peers as well as to various research topics. Relatedness between researchers and research topics was characterized by visibility boost, that is, the increase of a researcher's visibility by focusing on a particular topic. It was observed that researchers who had high visibility boosts by the same research topic tended to be close to each other in their network. We calculated correlations between visibility boosts by research topics and researchers' interdisciplinarity at the individual level (diversity of topics related to the researcher) and at the social level (his/her centrality in the researchers' network). We found that visibility boosts by certain research topics were positively correlated with researchers' individual-level interdisciplinarity despite their negative correlations with the general popularity of researchers. It was also found that visibility boosts by network-related topics had positive correlations with researchers' social-level interdisciplinarity. Research topics' correlations with researchers' individual- and social-level interdisciplinarities were found to be nearly independent from each other. These findings suggest that the notion of "interdisciplinarity" of a researcher should be understood as a multi-dimensional concept that should be evaluated using multiple assessment means.
Couvin, David; Zozio, Thierry; Rastogi, Nalin
2017-07-01
Spoligotyping is one of the most commonly used polymerase chain reaction (PCR)-based methods for identification and study of genetic diversity of Mycobacterium tuberculosis complex (MTBC). Despite its known limitations if used alone, the methodology is particularly useful when used in combination with other methods such as mycobacterial interspersed repetitive units - variable number of tandem DNA repeats (MIRU-VNTRs). At a worldwide scale, spoligotyping has allowed identification of information on 103,856 MTBC isolates (corresponding to 98,049 clustered strains plus 5,807 unique isolates from 169 countries of patient origin) contained within the SITVIT2 proprietary database of the Institut Pasteur de la Guadeloupe. The SpolSimilaritySearch web-tool described herein (available at: http://www.pasteur-guadeloupe.fr:8081/SpolSimilaritySearch) incorporates a similarity search algorithm allowing users to get a complete overview of similar spoligotype patterns (with information on presence or absence of 43 spacers) in the aforementioned worldwide database. This tool allows one to analyze spread and evolutionary patterns of MTBC by comparing similar spoligotype patterns, to distinguish between widespread, specific and/or confined patterns, as well as to pinpoint patterns with large deleted blocks, which play an intriguing role in the genetic epidemiology of M. tuberculosis. Finally, the SpolSimilaritySearch tool also provides the country distribution patterns for each queried spoligotype. Copyright © 2017 Elsevier Ltd. All rights reserved.
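The similarity search over 43-spacer presence/absence patterns described above can be pictured with a small sketch. The abstract does not say which similarity measure the tool uses, so the Hamming distance below is an assumption, and the patterns and identifiers are invented for illustration only.

```python
from typing import Dict, List, Tuple

N_SPACERS = 43  # spoligotype patterns record presence/absence of 43 spacers


def hamming(a: str, b: str) -> int:
    """Count the spacer positions at which two patterns differ."""
    if len(a) != N_SPACERS or len(b) != N_SPACERS:
        raise ValueError("patterns must have exactly 43 positions")
    return sum(x != y for x, y in zip(a, b))


def similar_patterns(query: str, database: Dict[str, str],
                     max_diff: int = 2) -> List[Tuple[str, int]]:
    """Return (identifier, distance) pairs within max_diff of the query, closest first."""
    hits = [(name, hamming(query, pattern)) for name, pattern in database.items()]
    return sorted((h for h in hits if h[1] <= max_diff), key=lambda h: (h[1], h[0]))


# Hypothetical patterns: '1' = spacer present, '0' = spacer absent.
FULL = "1" * 43
DB = {
    "pattern_A": FULL,
    "pattern_B": "1" * 32 + "0" * 4 + "1" * 7,  # a block of four deleted spacers
    "pattern_C": "0" * 34 + "1" * 9,            # a large deleted block
}
```

Calling `similar_patterns(FULL, DB, max_diff=4)` would keep the two close patterns and drop the one with the large deleted block; a production tool would likely weight contiguous deletions differently from scattered ones.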
Directory of Open Access Journals (Sweden)
David Hook
2006-09-01
Objective – To examine the interactions between users and search engines, and how they have changed over time. Design – Comparative analysis of search engine transaction logs. Setting – Nine major analyses of search engine transaction logs. Subjects – Nine web search engine studies (4 European, 5 American) over a seven‐year period, covering the search engines Excite, Fireball, AltaVista, BWIE and AllTheWeb. Methods – The results from individual studies are compared by year of study for percentages of single query sessions, one term queries, operator (and, or, not, etc.) usage and single result page viewing. As well, the authors group the search queries into eleven different topical categories and compare how the breakdown has changed over time. Main Results – Based on the percentage of single query sessions, it does not appear that the complexity of interactions has changed significantly for either the U.S.‐based or the European‐based search engines. As well, there was little change observed in the percentage of one‐term queries over the years of study for either the U.S.‐based or the European‐based search engines. Few users (generally less than 20%) use Boolean or other operators in their queries, and these percentages have remained relatively stable. One area of noticeable change is in the percentage of users viewing only one results page, which has increased over the years of study. Based on the studies of the U.S.‐based search engines, the topical categories of ‘People, Place or Things’ and ‘Commerce, Travel, Employment or Economy’ are becoming more popular, while the categories of ‘Sex and Pornography’ and ‘Entertainment or Recreation’ are declining. Conclusions – The percentage of users viewing only one results page increased during the years of the study, while the percentages of single query sessions, one‐term sessions and operator usage remained stable. The increase in single result page viewing
Cardiac Resynchronization Therapy Online: What Patients Find when Searching the World Wide Web.
Modi, Minal; Laskar, Nabila; Modi, Bhavik N
2016-06-01
To objectively assess the quality of information available on the World Wide Web on cardiac resynchronization therapy (CRT). Patients frequently search the internet regarding their healthcare issues. It has been shown that patients seeking information can help or hinder their healthcare outcomes depending on the quality of information consulted. On the internet, this information can be produced and published by anyone, resulting in the risk of patients accessing inaccurate and misleading information. The search term "Cardiac Resynchronisation Therapy" was entered into the three most popular search engines and the first 50 pages on each were pooled and analyzed, after excluding websites inappropriate for objective review. The "LIDA" instrument (a validated tool for assessing quality of healthcare information websites) was used to generate scores on Accessibility, Reliability, and Usability. Readability was assessed using the Flesch Reading Ease Score (FRES). Of the 150 web-links, 41 sites met the eligibility criteria. The sites were assessed using the LIDA instrument and the FRES. The mean total LIDA score for all the websites assessed was 123.5 of a possible 165 (74.8%). The average Accessibility of the sites assessed was 50.1 of 60 (84.3%), on Usability 41.4 of 54 (76.6%), on Reliability 31.5 of 51 (61.7%), and 41.8 on FRES. There was significant variability among sites and, interestingly, there was no correlation between the sites' search engine ranking and their scores. This study has illustrated the variable quality of online material on the topic of CRT. Furthermore, there was also no apparent correlation between highly ranked, popular websites and their quality. Healthcare professionals should be encouraged to guide their patients toward the online material that contains reliable information. © 2016 Wiley Periodicals, Inc.
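For readers unfamiliar with the readability measure named above, the following sketch computes the standard Flesch Reading Ease formula from word, sentence and syllable counts, plus a helper that expresses a raw score as a percentage of its maximum. How the study counted syllables is not stated, so the counts are taken as inputs here rather than estimated; the function names are illustrative.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease: higher scores indicate easier text."""
    if words <= 0 or sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def percent_of_max(score: float, maximum: float) -> float:
    """Express a raw instrument score as a percentage of the maximum possible."""
    return 100.0 * score / maximum


# Example with made-up counts: 100 words, 5 sentences, 150 syllables.
example_fres = flesch_reading_ease(words=100, sentences=5, syllables=150)

# The mean total LIDA score reported above: 123.5 of a possible 165.
total_lida_pct = percent_of_max(123.5, 165)
```

Scores in the range of the reported 41.8 are conventionally read as difficult text, which is the kind of interpretation the formula is meant to support.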
Personalization of Rule-based Web Services.
Choi, Okkyung; Han, Sang Yong
2008-04-04
Nowadays Web users have clearly expressed their wishes to receive personalized services directly. Personalization is the way to tailor services directly to the immediate requirements of the user. However, the current Web Services System does not provide any features supporting this such as consideration of personalization of services and intelligent matchmaking. In this research a flexible, personalized Rule-based Web Services System to address these problems and to enable efficient search, discovery and construction across general Web documents and Semantic Web documents in a Web Services System is proposed. This system utilizes matchmaking among service requesters', service providers' and users' preferences using a Rule-based Search Method, and subsequently ranks search results. A prototype of efficient Web Services search and construction for the suggested system is developed based on the current work.
Intelligent Search Optimization using Artificial Fuzzy Logics
Manral, Jai
2015-01-01
Information on the web is prodigious; searching for relevant information is difficult, making web users rely on search engines to find relevant information on the web. Search engines index and categorize web pages according to their contents using crawlers and rank them accordingly. For a given user query they retrieve millions of webpages and display them to users according to web-page rank. Every search engine has its own algorithms based on certain parameters for ranking web-pages. Searc...
Directory of Open Access Journals (Sweden)
Stefano Ferretti
2007-01-01
We are on the threshold of a mediamorphosis that will revolutionize the way we interact with our TV sets. The combination between interactive digital TV (IDTV) and the Web fosters the development of new interactive multimedia services enjoyable even through a TV screen and a remote control. Yet, several design constraints complicate the deployment of this new pattern of services. Prominent unresolved issues involve macro-problems such as collecting information on the Web based on users' preferences and appropriately presenting retrieved Web contents on the TV screen. To this aim, we propose a system able to dynamically convey contents from the Web to IDTV systems. Our system presents solutions both for personalized Web content search and automatic TV-format adaptation of retrieved documents. As we demonstrate through two case study applications, our system merges the best of IDTV and Web domains spinning the TV mediamorphosis toward the creation of the personal-TV concept.
A Taxonomic Search Engine: federating taxonomic databases using web services.
Page, Roderic D M
2005-03-09
The taxonomic name of an organism is a key link between different databases that store information on that organism. However, in the absence of a single, comprehensive database of organism names, individual databases lack an easy means of checking the correctness of a name. Furthermore, the same organism may have more than one name, and the same name may apply to more than one organism. The Taxonomic Search Engine (TSE) is a web application written in PHP that queries multiple taxonomic databases (ITIS, Index Fungorum, IPNI, NCBI, and uBIO) and summarises the results in a consistent format. It supports "drill-down" queries to retrieve a specific record. The TSE can optionally suggest alternative spellings the user can try. It also acts as a Life Science Identifier (LSID) authority for the source taxonomic databases, providing globally unique identifiers (and associated metadata) for each name. The Taxonomic Search Engine is available at http://darwin.zoology.gla.ac.uk/~rpage/portal/ and provides a simple demonstration of the potential of the federated approach to providing access to taxonomic names.
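The federated approach described above, querying several databases and summarising the hits in one consistent format, can be sketched as follows. The TSE itself is written in PHP and its internals are not given in the abstract; the adapter interface, record fields, source names and identifiers below are all hypothetical, and in-memory stubs stand in for the remote services.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class NameRecord:
    source: str      # which database returned the hit
    name: str        # taxonomic name as reported by that source
    identifier: str  # source-local identifier


# Each adapter takes a query string and returns records in the shared format.
Adapter = Callable[[str], List[NameRecord]]


def make_stub_adapter(source: str, rows: Dict[str, str]) -> Adapter:
    """Build an in-memory adapter; a real one would call the remote service."""
    def search(query: str) -> List[NameRecord]:
        q = query.lower()
        return [NameRecord(source, name, ident)
                for name, ident in rows.items() if q in name.lower()]
    return search


def federated_search(query: str, adapters: List[Adapter]) -> List[NameRecord]:
    """Query every adapter and merge the hits; a failing source is skipped."""
    merged: List[NameRecord] = []
    for adapter in adapters:
        try:
            merged.extend(adapter(query))
        except Exception:
            continue  # one unavailable database should not break the whole search
    return merged


# Hypothetical data; identifiers are invented for illustration.
ADAPTERS = [
    make_stub_adapter("source_one", {"Homo sapiens": "A-1", "Pan troglodytes": "A-2"}),
    make_stub_adapter("source_two", {"Homo sapiens": "B-9"}),
]
```

Keeping each source behind the same small interface is what lets results be summarised uniformly; adding a database then means writing one more adapter rather than changing the merge logic.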
Federated Search in the Wild: the combined power of over a hundred search engines
Nguyen, Dong-Phuong; Demeester, Thomas; Trieschnigg, Rudolf Berend; Hiemstra, Djoerd
2012-01-01
Federated search has the potential of improving web search: the user becomes less dependent on a single search provider and parts of the deep web become available through a unified interface, leading to a wider variety in the retrieved search results. However, a publicly available dataset for
SearchResultFinder: federated search made easy
Trieschnigg, Rudolf Berend; Tjin-Kam-Jet, Kien; Hiemstra, Djoerd
Building a federated search engine based on a large number of existing web search engines is a challenge: implementing the programming interface (API) for each search engine is an exacting and time-consuming job. In this demonstration we present SearchResultFinder, a browser plugin which speeds up
News trends and web search query of HIV/AIDS in Hong Kong
Chiu, Alice P. Y.; Lin, Qianying
2017-01-01
Background The HIV epidemic in Hong Kong has worsened in recent years, with major contributions from the high-risk subgroup of men who have sex with men (MSM). Internet use is prevalent among the majority of the local population, who seek health information online. This study examines the impacts of HIV/AIDS and MSM news coverage on web search queries in Hong Kong. Methods Relevant news coverage about HIV/AIDS and MSM from January 1st, 2004 to December 31st, 2014 was obtained from the WiseNews database. News trends were created by computing the number of relevant articles by type, topic, place of origin and sub-populations. We then obtained relevant search volumes from Google and analysed causality between news trends and Google Trends using the Granger causality test and orthogonal impulse function. Results We found that editorial news has an impact on “HIV” Google searches, with the search term popularity peaking at an average of two weeks after the news is published. Similarly, editorial news has an impact on the frequency of “AIDS” searches two weeks after. MSM-related news trends have a more fluctuating impact on “MSM” Google searches, although the time lag varies anywhere from one week later to ten weeks later. Conclusions This infodemiological study shows that there is a positive impact of news trends on the online search behavior of HIV/AIDS or MSM-related issues for up to ten weeks after. Health promotional professionals could make use of this brief time window to tailor the timing of HIV awareness campaigns and public health interventions to maximise their reach and effectiveness. PMID:28922376
INTERFACING GOOGLE SEARCH ENGINE TO CAPTURE USER WEB SEARCH BEHAVIOR
Fadhilah Mat Yamin; T. Ramayah
2013-01-01
The behaviour of the searcher when using the search engine, especially during query formulation, is crucial. Search engines capture users’ activities in the search log, which is stored at the search engine server. Due to the difficulty of obtaining this search log, this paper proposes and develops an interface framework to interface with the Google search engine. This interface will capture users’ queries before redirecting them to Google. The analysis of the search log will show that users are utili...
The Web as Information Source: a Case Study on the Impact of Internet Search Lessons
Directory of Open Access Journals (Sweden)
Chiara Ravagni
2010-09-01
The use of the Web by students has increased more and more and it has become the most recurring way to find quick information for educational purposes. Given the lack, in Italy, of thorough programs for the integration of Information Literacy and Internet searches in schools and universities, the adults who are now using it are almost always self-taught. Consequently, many different approaches to the medium have spread, and with them an objective difficulty in planning Internet-research courses, since everyone has his/her own way to search and a unique perception of his/her search skills. That’s why delivering a course where every participant is forced to follow the same learning path may originate feelings of frustration, unease, or boredom, thus reducing the learning potential offered by the course. This research focuses on the Internet Search side of Information Literacy and analyzes the impact of short lessons on first and second year university students in Education at the University of Bolzano, Italy. The students are either native German-speakers or native Italian-speakers, and the research focuses, in a European perspective, on the differences in their Internet-research approaches as well. The first phase consists of interviews and tests (the logs of the internet sessions are recorded by software) to find out the students' perception of the reliability of Internet information and the way they find it. The second phase is the course itself, which focuses on Boolean operators, information retrieval theories and exercises, and evaluation of web pages. After the course the students are interviewed and tested again, to check if their approach to internet research has changed and in which way. The results can be used to plan courses on Information Literacy and Internet Search with individualized programs, or to propose methods to assess the learning in this field.
Experience of Developing a Meta-Semantic Search Engine
Mukhopadhyay, Debajyoti; Sharma, Manoj; Joshi, Gajanan; Pagare, Trupti; Palwe, Adarsha
2013-01-01
Thinking of today's web search scenario, which is mainly keyword based, leads to the need for effective and meaningful search provided by the Semantic Web. Existing search engines struggle to provide relevant answers to users' queries due to their dependency on simple data available in web pages. On the other hand, semantic search engines provide efficient and relevant results as the semantic web manages information with well defined meaning using ontology. A Meta-Search engine is a search tool that ...
Colombo, Cinzia; Mosconi, Paola; Confalonieri, Paolo; Baroni, Isabella; Traversa, Silvia; Hill, Sophie J; Synnot, Anneliese J; Oprandi, Nadia; Filippini, Graziella
2014-07-24
Multiple sclerosis (MS) patients and their family members increasingly seek health information on the Internet. There has been little exploration of how MS patients integrate health information with their needs, preferences, and values for decision making. The INtegrating and Deriving Evidence, Experiences, and Preferences (IN-DEEP) project is a collaboration between Italian and Australian researchers and MS patients, aimed to make high-quality evidence accessible and meaningful to MS patients and families, developing a Web-based resource of evidence-based information starting from their information needs. The objective of this study was to analyze MS patients' and their family members' experience of Web-based health information, to evaluate how they assess this information, and how they integrate health information with personal values. We organized 6 focus groups, 3 with MS patients and 3 with family members, in the Northern, Central, and Southern parts of Italy (April-June 2011). They included 40 MS patients aged between 18 and 60, diagnosed as having MS at least 3 months earlier, and 20 family members aged 18 and over, being relatives of a person with at least a 3-month MS diagnosis. The focus groups were audio-recorded and transcribed verbatim (Atlas software, V 6.0). Data were analyzed from a conceptual point of view through a coding system. An online forum was hosted by the Italian MS society on its Web platform to widen the collection of information. Nine questions were posted covering searching behavior, use of Web-based information, and truthfulness of Web information. At the end, posts were downloaded and transcribed. Information needs covered a comprehensive communication of diagnosis, prognosis, and adverse events of treatments, MS causes or risk factors, new drugs, practical, and lifestyle-related information. The Internet is considered useful by MS patients, however, at the beginning or in a later stage of the disease a refusal to actively search
A Taxonomic Search Engine: Federating taxonomic databases using web services
Directory of Open Access Journals (Sweden)
Page Roderic DM
2005-03-01
Abstract Background The taxonomic name of an organism is a key link between different databases that store information on that organism. However, in the absence of a single, comprehensive database of organism names, individual databases lack an easy means of checking the correctness of a name. Furthermore, the same organism may have more than one name, and the same name may apply to more than one organism. Results The Taxonomic Search Engine (TSE) is a web application written in PHP that queries multiple taxonomic databases (ITIS, Index Fungorum, IPNI, NCBI, and uBIO) and summarises the results in a consistent format. It supports "drill-down" queries to retrieve a specific record. The TSE can optionally suggest alternative spellings the user can try. It also acts as a Life Science Identifier (LSID) authority for the source taxonomic databases, providing globally unique identifiers (and associated metadata) for each name. Conclusion The Taxonomic Search Engine is available at http://darwin.zoology.gla.ac.uk/~rpage/portal/ and provides a simple demonstration of the potential of the federated approach to providing access to taxonomic names.
Mrozek, Dariusz; Małysiak-Mrozek, Bożena; Siążnik, Artur
2013-03-01
Due to the growing number of biomedical entries in data repositories of the National Center for Biotechnology Information (NCBI), it is difficult to collect, manage and process all of these entries in one place by third-party software developers without significant investment in hardware and software infrastructure, its maintenance and administration. Web services allow development of software applications that integrate in one place the functionality and processing logic of distributed software components, without integrating the components themselves and without integrating the resources to which they have access. This is achieved by appropriate orchestration or choreography of available Web services and their shared functions. After the successful application of Web services in the business sector, this technology can now be used to build composite software tools that are oriented towards biomedical data processing. We have developed a new tool for efficient and dynamic data exploration in GenBank and other NCBI databases. A dedicated search GenBank system makes use of NCBI Web services and a package of Entrez Programming Utilities (eUtils) in order to provide extended searching capabilities in NCBI data repositories. In search GenBank users can use one of the three exploration paths: simple data searching based on the specified user's query, advanced data searching based on the specified user's query, and advanced data exploration with the use of macros. search GenBank orchestrates calls of particular tools available through the NCBI Web service providing requested functionality, while users interactively browse selected records in search GenBank and traverse between NCBI databases using available links. On the other hand, by building macros in the advanced data exploration mode, users create choreographies of eUtils calls, which can lead to the automatic discovery of related data in the specified databases. search GenBank extends standard capabilities of the
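The kind of eUtils call that such a tool orchestrates can be illustrated by building the request URLs for a search step followed by a fetch step. This is only a sketch of the public Entrez Programming Utilities endpoints (`esearch.fcgi`, `efetch.fcgi`) and a few of their documented parameters; it does not reproduce search GenBank's own code, no request is actually sent, and the query term and record id are illustrative.

```python
from typing import Iterable
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """URL for the search step: returns matching record ids for a query."""
    return f"{EUTILS_BASE}/esearch.fcgi?" + urlencode(
        {"db": db, "term": term, "retmax": retmax}
    )


def efetch_url(db: str, ids: Iterable[str], rettype: str = "fasta",
               retmode: str = "text") -> str:
    """URL for the fetch step: retrieves the records for the given ids."""
    return f"{EUTILS_BASE}/efetch.fcgi?" + urlencode(
        {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": retmode}
    )


# Illustrative query; chaining the two calls is a simple two-step "macro".
search_step = esearch_url("nucleotide", "BRCA1[Gene Name]", retmax=5)
fetch_step = efetch_url("nucleotide", ["12345"])  # placeholder id, not a real result
```

In a real pipeline the ids parsed from the search response would be passed to the fetch step, which is the pattern the abstract refers to as chaining eUtils calls.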
An overview of biomedical literature search on the World Wide Web in the third millennium.
Kumar, Prince; Goel, Roshni; Jain, Chandni; Kumar, Ashish; Parashar, Abhishek; Gond, Ajay Ratan
2012-06-01
Complete access to the existing pool of biomedical literature and the ability to "hit" upon the exact information of the relevant specialty are becoming essential elements of academic and clinical expertise. With the rapid expansion of the literature database, it is almost impossible to keep up to date with every innovation. Using the Internet, however, most people can freely access this literature at any time, from almost anywhere. This paper highlights the use of the Internet in obtaining valuable biomedical research information, which is mostly available from journals, databases, textbooks and e-journals in the form of web pages, text materials, images, and so on. The authors present an overview of web-based resources for biomedical researchers, providing information about Internet search engines (e.g., Google), web-based bibliographic databases (e.g., PubMed, IndMed) and how to use them, and other online biomedical resources that can assist clinicians in reaching well-informed clinical decisions.
Lee, Tae-Kyong; Chung, Hea-Jung; Park, Hye-Kyung; Lee, Eun-Ju; Nam, Hye-Seon; Jung, Soon-Im; Cho, Jee-Ye; Lee, Jin-Hee; Kim, Gon; Kim, Min-Chan
2008-01-01
A diet habit, which is developed in childhood, lasts for a lifetime. In this sense, nutrition education and early exposure to healthy menus in childhood are important. Children these days have easy access to the internet. Thus, a web-based nutrition education program for children is an effective tool for nutrition education of children. This site provides nutrition education material for children with characters which are personified nutrients. The 151 menus are stored in the site together with video scripts of the cooking process. The menus are classified by criteria based on age, menu type and the ethnic origin of the menu. The site provides a search function. There are three kinds of search conditions: key words, menu type and a "between" expression on nutrients such as calories and other nutrients. The site is developed with the operating system Windows 2003 Server, the web server ZEUS 5, development language JSP, and database management system Oracle 10g. PMID:20126375
Intelligent Agent Based Semantic Web in Cloud Computing Environment
Mukhopadhyay, Debajyoti; Sharma, Manoj; Joshi, Gajanan; Pagare, Trupti; Palwe, Adarsha
2013-01-01
Considering today's web scenario, there is a need for effective and meaningful search over the web, which is provided by the Semantic Web. Existing search engines are keyword based. They are weak at answering intelligent queries from the user due to the dependence of their results on information available in web pages. Semantic search engines, in contrast, provide efficient and relevant results, as the semantic web is an extension of the current web in which information is given well defined meaning....
Exploring the academic invisible web
Lewandowski, Dirk; Mayr, Philipp
2006-01-01
Purpose: To provide a critical review of Bergman’s 2001 study on the Deep Web. In addition, we bring a new concept into the discussion, the Academic Invisible Web (AIW). We define the Academic Invisible Web as consisting of all databases and collections relevant to academia but not searchable by the general-purpose internet search engines. Indexing this part of the Invisible Web is central to scientific search engines. We provide an overview of approaches followed thus far. Design/methodol...
Integration of Web mining and web crawler: Relevance and State of Art
Subhendu Kumar Pani; Deepak Mohapatra; Bikram Keshari Ratha
2010-01-01
This study presents the role of the web crawler in the web mining environment. As the growth of the World Wide Web exceeded all expectations, the research on Web mining is growing more and more. Web mining is a research topic which combines two active research areas: Data Mining and the World Wide Web. So, the World Wide Web is a very advanced area for data mining research. Search engines that are based on a web crawling framework are also used in web mining to find the interacted web pages. This paper discu...
Googling DNA sequences on the World Wide Web.
Hajibabaei, Mehrdad; Singer, Gregory A C
2009-11-10
New web-based technologies provide an excellent opportunity for sharing and accessing information and using web as a platform for interaction and collaboration. Although several specialized tools are available for analyzing DNA sequence information, conventional web-based tools have not been utilized for bioinformatics applications. We have developed a novel algorithm and implemented it for searching species-specific genomic sequences, DNA barcodes, by using popular web-based methods such as Google. We developed an alignment independent character based algorithm based on dividing a sequence library (DNA barcodes) and query sequence to words. The actual search is conducted by conventional search tools such as freely available Google Desktop Search. We implemented our algorithm in two exemplar packages. We developed pre and post-processing software to provide customized input and output services, respectively. Our analysis of all publicly available DNA barcode sequences shows a high accuracy as well as rapid results. Our method makes use of conventional web-based technologies for specialized genetic data. It provides a robust and efficient solution for sequence search on the web. The integration of our search method for large-scale sequence libraries such as DNA barcodes provides an excellent web-based tool for accessing this information and linking it to other available categories of information on the web.
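The idea of dividing library and query sequences into words so that an ordinary text search tool can match them can be sketched as follows. The abstract does not give the word length or whether words overlap, so the fixed-length overlapping words (k = 4) and the shared-word ranking below are assumptions; a plain in-memory inverted index stands in for the desktop search engine.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


def to_words(seq: str, k: int = 4) -> List[str]:
    """Split a sequence into overlapping words of length k."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(library: Dict[str, str], k: int = 4) -> Dict[str, Set[str]]:
    """Map each word to the names of the library sequences containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, seq in library.items():
        for word in to_words(seq, k):
            index[word].add(name)
    return index


def search(query: str, index: Dict[str, Set[str]], k: int = 4) -> List[Tuple[str, int]]:
    """Rank library sequences by the number of distinct query words they share."""
    counts: Counter = Counter()
    for word in set(to_words(query, k)):
        for name in index.get(word, ()):
            counts[name] += 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

Because matching is done on words rather than on an alignment, the ranking is fast to compute but only approximates sequence similarity; very short words would match almost everything, so the word length matters in practice.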
He, Ji; Dai, Xinbin; Zhao, Xuechun
2007-02-09
BLAST searches are widely used for sequence alignment. The search results are commonly adopted for various functional and comparative genomics tasks such as annotating unknown sequences, investigating gene models and comparing two sequence sets. Advances in sequencing technologies pose challenges for high-throughput analysis of large-scale sequence data. A number of programs and hardware solutions exist for efficient BLAST searching, but there is a lack of generic software solutions for mining and personalized management of the results. Systematically reviewing the results and identifying information of interest remains tedious and time-consuming. Personal BLAST Navigator (PLAN) is a versatile web platform that helps users to carry out various personalized pre- and post-BLAST tasks, including: (1) query and target sequence database management, (2) automated high-throughput BLAST searching, (3) indexing and searching of results, (4) filtering results online, (5) managing results of personal interest in favorite categories, (6) automated sequence annotation (such as NCBI NR and ontology-based annotation). PLAN integrates, by default, the Decypher hardware-based BLAST solution provided by Active Motif Inc. with a greatly improved efficiency over conventional BLAST software. BLAST results are visualized by spreadsheets and graphs and are full-text searchable. BLAST results and sequence annotations can be exported, in part or in full, in various formats including Microsoft Excel and FASTA. Sequences and BLAST results are organized in projects, the data publication levels of which are controlled by the registered project owners. In addition, all analytical functions are provided to public users without registration. PLAN has proved a valuable addition to the community for automated high-throughput BLAST searches, and, more importantly, for knowledge discovery, management and sharing based on sequence alignment results. The PLAN web interface is platform
Directory of Open Access Journals (Sweden)
Zhao Xuechun
2007-02-01
Background: BLAST searches are widely used for sequence alignment. The search results are commonly adopted for various functional and comparative genomics tasks such as annotating unknown sequences, investigating gene models and comparing two sequence sets. Advances in sequencing technologies pose challenges for high-throughput analysis of large-scale sequence data. A number of programs and hardware solutions exist for efficient BLAST searching, but there is a lack of generic software solutions for mining and personalized management of the results. Systematically reviewing the results and identifying information of interest remains tedious and time-consuming. Results: Personal BLAST Navigator (PLAN) is a versatile web platform that helps users to carry out various personalized pre- and post-BLAST tasks, including: (1) query and target sequence database management, (2) automated high-throughput BLAST searching, (3) indexing and searching of results, (4) filtering results online, (5) managing results of personal interest in favorite categories, (6) automated sequence annotation (such as NCBI NR and ontology-based annotation). PLAN integrates, by default, the Decypher hardware-based BLAST solution provided by Active Motif Inc. with a greatly improved efficiency over conventional BLAST software. BLAST results are visualized by spreadsheets and graphs and are full-text searchable. BLAST results and sequence annotations can be exported, in part or in full, in various formats including Microsoft Excel and FASTA. Sequences and BLAST results are organized in projects, the data publication levels of which are controlled by the registered project owners. In addition, all analytical functions are provided to public users without registration. Conclusion: PLAN has proved a valuable addition to the community for automated high-throughput BLAST searches, and, more importantly, for knowledge discovery, management and sharing based on sequence alignment results.
U.S. Environmental Protection Agency — EPA's Web Taxonomy is a faceted hierarchical vocabulary used to tag web pages with terms from a controlled vocabulary. Tagging enables search and discovery of EPA's...
How to Search the Internet Archive Without Indexing It
DEFF Research Database (Denmark)
Kanhabua, Nattiya; Kemkes, Philipp; Nejdl, Wolfgang
2016-01-01
Significant parts of our cultural heritage have been produced on the Web in recent years. While the easy accessibility to the current Web is a good baseline, optimal access to the past of the Web faces several challenges. This includes dealing with large-scale web archive collections, as well as lacking...... search results to the WayBack Machine; thus allowing keyword search on the Internet Archive without processing and indexing its raw content. Our system complements existing web archive search tools through a user interface, which comes close to the functionalities of modern web search engines (e...
A Survey On Various Web Template Detection And Extraction Methods
Directory of Open Access Journals (Sweden)
Neethu Mary Varghese
2015-03-01
In today's digital world, reliance on the World Wide Web as a source of information is extensive. Users increasingly rely on web-based search engines to provide accurate search results on a wide range of topics that interest them. The search engines in turn parse the vast repository of web pages searching for relevant information. However, the majority of web portals are built from web templates, which are designed to provide a consistent look and feel to end users. The presence of these templates can influence search results, leading to inaccurate results being delivered to users. Therefore, to improve the accuracy and reliability of search results, identifying and removing web templates from the actual content is essential. A wide range of approaches is commonly employed to achieve this, and this paper surveys the various approaches to template detection and extraction that can be applied across homogeneous as well as heterogeneous web pages.
Allen, J W; Finch, R J; Coleman, M G; Nathanson, L K; O'Rourke, N A; Fielding, G A
2002-01-01
This study was undertaken to determine the quality of information on the Internet regarding laparoscopy. Four popular World Wide Web search engines were used with the key word "laparoscopy." Advertisements, patient- or physician-directed information, and controversial material were noted. A total of 14,030 Web pages were found, but only 104 were unique Web sites. The majority of the sites were duplicate pages, subpages within a main Web page, or dead links. Twenty-eight of the 104 pages had a medical product for sale, 26 were patient-directed, 23 were written by a physician or group of physicians, and six represented corporations. The remaining 21 were "miscellaneous." The 46 pages containing educational material were critically reviewed. At least one of the senior authors found that 32 of the pages contained controversial or misleading statements. All of the three senior authors (LKN, NAO, GAF) independently agreed that 17 of the 46 pages contained controversial information. The World Wide Web is not a reliable source for patient or physician information about laparoscopy. Authenticating medical information on the World Wide Web is a difficult task, and no government or surgical society has taken the lead in regulating what is presented as fact on the World Wide Web.
Albeke, S. E.; Perkins, D. G.; Ewers, S. L.; Ewers, B. E.; Holbrook, W. S.; Miller, S. N.
2015-12-01
The sharing of data and results is paramount for advancing scientific research. The Wyoming Center for Environmental Hydrology and Geophysics (WyCEHG) is a multidisciplinary group that is driving scientific breakthroughs to help manage water resources in the Western United States. WyCEHG is mandated by the National Science Foundation (NSF) to share its data. However, the infrastructure from which to share such diverse, complex and massive amounts of data did not exist within the University of Wyoming. We developed an innovative framework to meet the data organization, sharing, and discovery requirements of WyCEHG by integrating both open and closed source software, embedded metadata tags, semantic web technologies, and a web-mapping application. The infrastructure uses a Relational Database Management System as the foundation, providing a versatile platform to store, organize, and query myriad datasets, taking advantage of both structured and unstructured formats. Detailed metadata are fundamental to the utility of datasets. We tag data with Uniform Resource Identifiers (URIs) to specify concepts with formal descriptions (i.e. semantic ontologies), thus allowing users to search metadata based on the intended context rather than by conventional keyword searches. Additionally, WyCEHG data are geographically referenced. Using the ArcGIS API for JavaScript, we developed a web mapping application leveraging database-linked spatial data services, providing a means to visualize and spatially query available data in an intuitive map environment. Using server-side scripting (PHP), the mapping application, in conjunction with semantic search modules, dynamically communicates with the database and file system, providing access to available datasets. Our approach provides a flexible, comprehensive infrastructure from which to store and serve WyCEHG's highly diverse research-based data. This framework has not only allowed WyCEHG to meet its data stewardship
Ertl, P
1998-02-01
Easy to use, interactive, and platform-independent WWW-based tools are ideal for development of chemical applications. By using the newly emerging Web technologies such as Java applets and sophisticated scripting, it is possible to deliver powerful molecular processing capabilities directly to the desk of synthetic organic chemists. In Novartis Crop Protection in Basel, a Web-based molecular modelling system has been in use since 1995. In this article two new modules of this system are presented: a program for interactive calculation of important hydrophobic, electronic, and steric properties of organic substituents, and a module for substituent similarity searches enabling the identification of bioisosteric functional groups. Various possible applications of calculated substituent parameters are also discussed, including automatic design of molecules with the desired properties and creation of targeted virtual combinatorial libraries.
FindZebra: A search engine for rare diseases
DEFF Research Database (Denmark)
Dragusin, Radu; Petcu, Paula; Lioma, Christina Amalia
2013-01-01
Background: The web has become a primary information resource about illnesses and treatments for both medical and non-medical users. Standard web search is by far the most common interface for such information. It is therefore of interest to find out how well web search engines work for diagnostic...... approach for web search engines for rare disease diagnosis which includes 56 real life diagnostic cases, state-of-the-art evaluation measures, and curated information resources. In addition, we introduce FindZebra, a specialized (vertical) rare disease search engine. FindZebra is powered by open source...... medical concepts to demonstrate different ways of displaying the retrieved results to medical experts. Conclusions: Our results indicate that a specialized search engine can improve the diagnostic quality without compromising the ease of use of the currently widely popular web search engines. The proposed...
Characteristics of scientific web publications
DEFF Research Database (Denmark)
Thorlund Jepsen, Erik; Seiden, Piet; Ingwersen, Peter Emil Rerup
2004-01-01
were generated based on specifically selected domain topics that are searched for in three publicly accessible search engines (Google, AllTheWeb, and AltaVista). A sample of the retrieved hits was analyzed with regard to how various publication attributes correlated with the scientific quality...... of the content and whether this information could be employed to harvest, filter, and rank Web publications. The attributes analyzed were inlinks, outlinks, bibliographic references, file format, language, search engine overlap, structural position (according to site structure), and the occurrence of various...... types of metadata. As could be expected, the ranked output differs between the three search engines. Apparently, this is caused by differences in ranking algorithms rather than the databases themselves. In fact, because scientific Web content in this subject domain receives few inlinks, both Alta...
Gupta, Amardeep
2005-01-01
Current search engines--even the constantly surprising Google--seem unable to leap the next big barrier in search: the trillions of bytes of dynamically generated data created by individual web sites around the world, or what some researchers call the "deep web." The challenge now is not information overload, but information overlook.…
Finding Web-Based Anxiety Interventions on the World Wide Web: A Scoping Review.
Ashford, Miriam Thiel; Olander, Ellinor K; Ayers, Susan
2016-06-01
One relatively new and increasingly popular approach to increasing access to treatment is Web-based intervention programs. The advantage of Web-based approaches is the accessibility, affordability, and anonymity of potentially evidence-based treatment. Despite much research evidence on the effectiveness of Web-based interventions for anxiety found in the literature, little is known about what is publicly available for potential consumers on the Web. Our aim was to explore what a consumer searching the Web for Web-based intervention options for anxiety-related issues might find. The objectives were to identify currently publicly available Web-based intervention programs for anxiety and to synthesize and review these in terms of (1) website characteristics such as credibility and accessibility; (2) intervention program characteristics such as intervention focus, design, and presentation modes; (3) therapeutic elements employed; and (4) published evidence of efficacy. Web keyword searches were carried out on three major search engines (Google, Bing, and Yahoo-UK platforms). For each search, the first 25 hyperlinks were screened for eligible programs. Included were programs that were designed for anxiety symptoms, currently publicly accessible on the Web, had an online component, a structured treatment plan, and were available in English. Data were extracted for website characteristics, program characteristics, therapeutic characteristics, as well as empirical evidence. Programs were also evaluated using a 16-point rating tool. The search resulted in 34 programs that were eligible for review. A wide variety of programs for anxiety, including specific anxiety disorders, and anxiety in combination with stress, depression, or anger were identified and based predominantly on cognitive behavioral therapy techniques. The majority of websites were rated as credible, secure, and free of advertisement. The majority required users to register and/or to pay a program access
Searching for Suicide Information on Web Search Engines in Chinese
Directory of Open Access Journals (Sweden)
Yen-Feng Lee
2017-01-01
Introduction: Suicide prevention has recently become an important public health issue. However, with growing access to information in cyberspace, harmful information is easily accessible online. To investigate the accessibility of potentially harmful suicide-related information on the internet, and to draw attention to it, we examine what is returned when searching for suicide information online. Methods: In April 2016 we used five search engines (Google, Yahoo, Bing, Yam, and Sina) and four suicide-related search queries (suicide, how to suicide, suicide methods, and want to die) in traditional Chinese. A psychiatric doctor classified the first thirty links of the search results on each search engine as suicide prevention, pro-suicide, neutral, unrelated to suicide, or error websites. Results: Among the total of 352 unique websites generated, suicide prevention websites were the most frequent among the search results (37.8%), followed by websites unrelated to suicide (25.9%) and neutral websites (23.0%). However, pro-suicide websites were still easily accessible (9.7%). Moreover, compared with the search engines originating in the USA and China, the search engine originating in Taiwan had the lowest accessibility to pro-suicide information. The results of ANOVA showed a significant difference between the groups, F = 8.772, P < 0.001. Conclusions: The results of this study suggest a need for further restrictions and regulations of pro-suicide information on the internet. Providing more supportive information online may be an effective plan for suicide prevention.
Search engines that learn from their users
Schuth, A.G.
2016-01-01
More than half the world’s population uses web search engines, resulting in over half a billion search queries every single day. For many people web search engines are among the first resources they go to when a question arises. Moreover, search engines have for many become the most trusted route to
Open meta-search with OpenSearch: a case study
O'Riordan, Adrian P.
2007-01-01
The goal of this project was to demonstrate the possibilities of open source search engine and aggregation technology in a Web environment by building a meta-search engine which employs free open search engines and open protocols. In contrast many meta-search engines on the Internet use proprietary search systems. The search engines employed in this case study are all based on the OpenSearch protocol. OpenSearch-compliant systems support XML technologies such as RSS and Atom for aggregation a...
Harvesting and Organizing Knowledge from the Web
Weikum, Gerhard
2007-01-01
Information organization and search on the Web is gaining structure and context awareness and more semantic flavor, for example, in the forms of faceted search, vertical search, entity search, and Deep-Web search. I envision another big leap forward by automatically harvesting and organizing knowledge from the Web, represented in terms of explicit entities and relations as well as ontological concepts. This will be made possible by the confluence of three stron...
FirstSearch and NetFirst--Web and Dial-up Access: Plus Ca Change, Plus C'est la Meme Chose?
Koehler, Wallace; Mincey, Danielle
1996-01-01
Compares and evaluates the differences between OCLC's dial-up and World Wide Web FirstSearch access methods and their interfaces with the underlying databases. Also examines NetFirst, OCLC's new Internet catalog, the only Internet tracking database from a "traditional" database service. (Author/PEN)
Griffin, Teresa; Cohen, Deb
2012-01-01
The ubiquity and familiarity of the world wide web means that students regularly turn to it as a source of information. In doing so, they "are said to rely heavily on simple search engines, such as Google to find what they want." Researchers have also investigated how students use search engines, concluding that "the young web users tended to…
Turner, Laura
2001-01-01
Focuses on the Deep Web, defined as Web content in searchable databases of the type that can be found only by direct query. Discusses the problems of indexing; inability to find information not indexed in the search engine's database; and metasearch engines. Describes 10 sites created to access online databases or directly search them. Lists ways…
Estimating Search Engine Index Size Variability
DEFF Research Database (Denmark)
Van den Bosch, Antal; Bogers, Toine; De Kunder, Maurice
2016-01-01
One of the determining factors of the quality of Web search engines is the size of their index. In addition to its influence on search result quality, the size of the indexed Web can also tell us something about which parts of the WWW are directly accessible to the everyday user. We propose a novel...... method of estimating the size of a Web search engine’s index by extrapolating from document frequencies of words observed in a large static corpus of Web pages. In addition, we provide a unique longitudinal perspective on the size of Google and Bing’s indices over a nine-year period, from March 2006...... until January 2015. We find that index size estimates of these two search engines tend to vary dramatically over time, with Google generally possessing a larger index than Bing. This result raises doubts about the reliability of previous one-off estimates of the size of the indexed Web. We find...
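The extrapolation idea in this abstract can be illustrated with a minimal sketch. It assumes (the abstract does not spell this out) that the fraction of documents containing a word is roughly the same in the static corpus and in the search engine's index, so that a reported hit count can be scaled up by the inverse of that fraction. The function name, the parameters, and the use of a median over words are hypothetical choices for illustration, not the authors' procedure.

```python
from statistics import median


def estimate_index_size(corpus_size, corpus_doc_freq, engine_hit_counts):
    """Estimate a search engine's index size from per-word hit counts.

    corpus_size:       number of documents in the static reference corpus
    corpus_doc_freq:   word -> number of corpus documents containing the word
    engine_hit_counts: word -> hit count reported by the search engine
    (all three inputs are hypothetical illustration data)
    """
    estimates = []
    for word, hits in engine_hit_counts.items():
        df = corpus_doc_freq.get(word, 0)
        if df == 0:
            continue  # word unseen in the corpus: nothing to extrapolate from
        # If a fraction df / corpus_size of documents contains the word,
        # the index holds roughly hits / (df / corpus_size) documents.
        estimates.append(hits * corpus_size / df)
    if not estimates:
        raise ValueError("no word has a usable corpus document frequency")
    # The median damps the influence of words whose usage differs
    # strongly between the corpus and the live Web.
    return median(estimates)
```

Repeating such an estimate at regular intervals would give the kind of time series over which the variability discussed in the abstract could be observed.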
Hepp, Martin
E-Commerce on the basis of current Web technology has created fierce competition with a strong focus on price. Despite a huge variety of offerings and diversity in the individual preferences of consumers, current Web search fosters a very early reduction of the search space to just a few commodity makes and models. As soon as this reduction has taken place, search is reduced to flat price comparison. This is unfortunate for the manufacturers and vendors, because their individual value proposition for a particular customer may get lost in the course of communication over the Web, and it is unfortunate for the customer, because he/she may not get the most utility for the money based on her/his preference function. A key limitation is that consumers cannot search using a consolidated view on all alternative offers across the Web. In this talk, I will (1) analyze the technical effects of products and services search on the Web that cause this mismatch between supply and demand, (2) evaluate how the GoodRelations vocabulary and the current Web of Data movement can improve the situation, (3) give a brief hands-on demonstration, and (4) sketch business models for the various market participants.
Surfing the World Wide Web to Education Hot-Spots.
Dyrli, Odvard Egil
1995-01-01
Provides a brief explanation of Web browsers and their use, as well as technical information for those considering access to the WWW (World Wide Web). Curriculum resources and addresses to useful Web sites are included. Sidebars show sample searches using Yahoo and Lycos search engines, and a list of recommended Web resources. (JKP)
BioCarian: search engine for exploratory searches in heterogeneous biological databases.
Zaki, Nazar; Tennakoon, Chandana
2017-10-02
There are a large number of biological databases publicly available for scientists on the web. Also, there are many private databases generated in the course of research projects. These databases are in a wide variety of formats. Web standards have evolved in recent times and semantic web technologies are now available to interconnect diverse and heterogeneous sources of data. Therefore, integration and querying of biological databases can be facilitated by techniques used in the semantic web. Heterogeneous databases can be converted into Resource Description Framework (RDF) and queried using the SPARQL language. Searching for exact queries in these databases is trivial. However, exploratory searches need customized solutions, especially when multiple databases are involved. This process is cumbersome and time consuming for those without a sufficient background in computer science. In this context, a search engine facilitating exploratory searches of databases would be of great help to the scientific community. We present BioCarian, an efficient and user-friendly search engine for performing exploratory searches on biological databases. The search engine is an interface for SPARQL queries over RDF databases. We note that many of the databases can be converted to tabular form. We first convert the tabular databases to RDF. The search engine provides a graphical interface based on facets to explore the converted databases. The facet interface is more advanced than conventional facets. It allows complex queries to be constructed, and has additional features like ranking of facet values based on several criteria, visually indicating the relevance of a facet value and presenting the most important facet values when a large number of choices are available. For the advanced users, SPARQL queries can be run directly on the databases. Using this feature, users will be able to incorporate federated searches of SPARQL endpoints. We used the search engine to do an exploratory search
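The table-to-RDF step mentioned in this abstract can be pictured with a small sketch. It is not BioCarian's converter: the `http://example.org/` namespace, the function name, and the one-row-per-subject mapping are assumptions made only for illustration. Each row becomes a subject, each remaining column a predicate, and each cell a plain literal, emitted as N-Triples lines.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, not from the paper


def table_to_ntriples(csv_text, key_column):
    """Map every row to one subject and every other column to a predicate."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}row/{quote(row[key_column], safe='')}>"
        for column, value in row.items():
            if column == key_column or not value:
                continue  # skip the key itself and empty cells
            predicate = f"<{BASE}col/{quote(column, safe='')}>"
            # Escape backslashes and quotes so the literal stays valid.
            literal = '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
            triples.append(f"{subject} {predicate} {literal} .")
    return triples


# A SPARQL query that could be run once such triples are loaded into a store
# (the predicate IRI matches the hypothetical namespace above).
EXAMPLE_QUERY = """
SELECT ?gene WHERE {
  ?gene <http://example.org/col/organism> "human" .
}
"""
```

A facet in the sense of the abstract would then correspond roughly to one predicate, with its facet values being the distinct literals seen for that predicate.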
Natural Language Search Interfaces: Health Data Needs Single-Field Variable Search
Smith, Sam; Sufi, Shoaib; Goble, Carole; Buchan, Iain
2016-01-01
Background: Data discovery, particularly the discovery of key variables and their inter-relationships, is key to secondary data analysis, and in turn, the evolving field of data science. Interface designers have presumed that their users are domain experts, and so they have provided complex interfaces to support these “experts.” Such interfaces hark back to a time when searches needed to be accurate first time as there was a high computational cost associated with each search. Our work is part of a governmental research initiative between the medical and social research funding bodies to improve the use of social data in medical research. Objective: The cross-disciplinary nature of data science can make no assumptions regarding the domain expertise of a particular scientist, whose interests may intersect multiple domains. Here we consider the common requirement for scientists to seek archived data for secondary analysis. This has more in common with search needs of the “Google generation” than with their single-domain, single-tool forebears. Our study compares a Google-like interface with traditional ways of searching for noncomplex health data in a data archive. Methods: Two user interfaces are evaluated for the same set of tasks in extracting data from surveys stored in the UK Data Archive (UKDA). One interface, Web search, is “Google-like,” enabling users to browse, search for, and view metadata about study variables, whereas the other, traditional search, has a standard multioption user interface. Results: Using a comprehensive set of tasks with 20 volunteers, we found that the Web search interface met data discovery needs and expectations better than the traditional search. A task × interface repeated measures analysis showed a main effect indicating that answers found through the Web search interface were more likely to be correct (F 1,19=37.3, P[...] effect of task (F 3,57=6.3, P[...] interface (F 1,19=18.0, P[...] effect of task (F 2,38=4.1, P=.025, Greenhouse
Making Statistical Data More Easily Accessible on the Web Results of the StatSearch Case Study
Rajman, M; Boynton, I M; Fridlund, B; Fyhrlund, A; Sundgren, B; Lundquist, P; Thelander, H; Wänerskär, M
2005-01-01
In this paper we present the results of the StatSearch case study that aimed at providing an enhanced access to statistical data available on the Web. In the scope of this case study we developed a prototype of an information access tool combining a query-based search engine with semi-automated navigation techniques exploiting the hierarchical structuring of the available data. This tool enables a better control of the information retrieval, improving the quality and ease of the access to statistical information. The central part of the presented StatSearch tool consists in the design of an algorithm for automated navigation through a tree-like hierarchical document structure. The algorithm relies on the computation of query related relevance score distributions over the available database to identify the most relevant clusters in the data structure. These most relevant clusters are then proposed to the user for navigation, or, alternatively, are the support for the automated navigation process. Several appro...
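One way to picture the cluster-selection step described above is the following sketch. It assumes, purely for illustration, that each document already carries a query relevance score and that a cluster's relevance is the sum of the scores of the documents beneath it; the abstract speaks of score distributions and does not say how they are aggregated, so both the aggregation rule and all names here are hypothetical.

```python
def subtree_score(node, children, doc_scores):
    """Sum the relevance scores of all documents below a node."""
    kids = children.get(node, [])
    if not kids:
        # A node without children is treated as a document (leaf).
        return doc_scores.get(node, 0.0)
    return sum(subtree_score(kid, children, doc_scores) for kid in kids)


def rank_clusters(node, children, doc_scores, top_k=2):
    """Return the top_k child clusters of node, most relevant first."""
    scored = [
        (subtree_score(kid, children, doc_scores), kid)
        for kid in children.get(node, [])
    ]
    # Sort by descending score, breaking ties by name for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [kid for score, kid in scored[:top_k] if score > 0]
```

Proposing the returned clusters to the user corresponds to the semi-automated mode; repeatedly descending into the single best cluster would correspond to fully automated navigation.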
Evaluating aggregated search using interleaving
Chuklin, A.; Schuth, A.; Hofmann, K.; Serdyukov, P.; de Rijke, M.
2013-01-01
A result page of a modern web search engine is often much more complicated than a simple list of "ten blue links." In particular, a search engine may combine results from different sources (e.g., Web, News, and Images), and display these as grouped results to provide a better user experience. Such a
Ensemble learned vaccination uptake prediction using web search queries
DEFF Research Database (Denmark)
Hansen, Niels Dalum; Lioma, Christina; Mølbak, Kåre
2016-01-01
We present a method that uses ensemble learning to combine clinical and web-mined time-series data in order to predict future vaccination uptake. The clinical data is official vaccination registries, and the web data is query frequencies collected from Google Trends. Experiments with official...... vaccine records show that our method predicts vaccination uptake effectively (4.7 Root Mean Squared Error). Whereas performance is best when combining clinical and web data, using solely web data yields comparative performance. To our knowledge, this is the first study to predict vaccination uptake...
Information Retrieval for Education: Making Search Engines Language Aware
Ott, Niels; Meurers, Detmar
2010-01-01
Search engines have been a major factor in making the web the successful and widely used information source it is today. Generally speaking, they make it possible to retrieve web pages on a topic specified by the keywords entered by the user. Yet web searching currently does not take into account which of the search results are comprehensible for…
Ontology-Driven Search and Triage: Design of a Web-Based Visual Interface for MEDLINE.
Demelo, Jonathan; Parsons, Paul; Sedig, Kamran
2017-02-02
Diverse users need to search health and medical literature to satisfy open-ended goals such as making evidence-based decisions and updating their knowledge. However, doing so is challenging due to at least two major difficulties: (1) articulating information needs using accurate vocabulary and (2) dealing with large document sets returned from searches. Common search interfaces such as PubMed do not provide adequate support for exploratory search tasks. Our objective was to improve support for exploratory search tasks by combining two strategies in the design of an interactive visual interface by (1) using a formal ontology to help users build domain-specific knowledge and vocabulary and (2) providing multi-stage triaging support to help mitigate the information overload problem. We developed a Web-based tool, Ontology-Driven Visual Search and Triage Interface for MEDLINE (OVERT-MED), to test our design ideas. We implemented a custom searchable index of MEDLINE, which comprises approximately 25 million document citations. We chose a popular biomedical ontology, the Human Phenotype Ontology (HPO), to test our solution to the vocabulary problem. We implemented multistage triaging support in OVERT-MED, with the aid of interactive visualization techniques, to help users deal with large document sets returned from searches. Formative evaluation suggests that the design features in OVERT-MED are helpful in addressing the two major difficulties described above. Using a formal ontology seems to help users articulate their information needs with more accurate vocabulary. In addition, multistage triaging combined with interactive visualizations shows promise in mitigating the information overload problem. Our strategies appear to be valuable in addressing the two major problems in exploratory search. Although we tested OVERT-MED with a particular ontology and document collection, we anticipate that our strategies can be transferred successfully to other contexts.
Raeder, Aggi
1997-01-01
Discussion of ways to promote sites on the World Wide Web focuses on how search engines work and how they retrieve and identify sites. Appropriate Web links for submitting new sites and for Internet marketing are included. (LRW)
An Efficient Approach for Web Indexing of Big Data through Hyperlinks in Web Crawling
Devi, R. Suganya; Manjula, D.; Siddharth, R. K.
2015-01-01
Web Crawling has acquired tremendous significance in recent times and it is aptly associated with the substantial development of the World Wide Web. Web Search Engines face new challenges due to the availability of vast amounts of web documents, thus making the retrieved results less applicable to the analysers. However, recently, Web Crawling solely focuses on obtaining the links of the corresponding documents. Today, there exist various algorithms and software which are used to crawl links from the web which has to be further processed for future use, thereby increasing the overload of the analyser. This paper concentrates on crawling the links and retrieving all information associated with them to facilitate easy processing for other uses. In this paper, firstly the links are crawled from the specified uniform resource locator (URL) using a modified version of Depth First Search Algorithm which allows for complete hierarchical scanning of corresponding web links. The links are then accessed via the source code and its metadata such as title, keywords, and description are extracted. This content is very essential for any type of analyser work to be carried on the Big Data obtained as a result of Web Crawling. PMID:26137592
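The two steps described in this abstract, depth-first link crawling followed by extraction of title, keywords, and description, can be sketched with the Python standard library. This is a plain iterative depth-first search, not the authors' modified algorithm, whose details the abstract does not give; the class and function names are hypothetical, and `fetch` is an assumed caller-supplied function that returns a page's HTML or `None`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collect hyperlinks plus title, keywords, and description of a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.meta = {"title": "", "keywords": "", "description": ""}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "title":
            self._in_title = True
        elif tag == "meta" and name in ("keywords", "description"):
            self.meta[name] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.meta["title"] += data.strip()


def crawl(start_url, fetch, max_depth=2):
    """Depth-first crawl from start_url; returns url -> metadata dict."""
    seen, records = set(), {}
    stack = [(start_url, 0)]
    while stack:
        url, depth = stack.pop()
        if url in seen or depth > max_depth:
            continue
        seen.add(url)
        html = fetch(url)
        if html is None:
            continue  # unreachable page: nothing to parse
        parser = PageParser()
        parser.feed(html)
        records[url] = parser.meta
        # Push in reverse so the first link on the page is visited first.
        for href in reversed(parser.links):
            stack.append((urljoin(url, href), depth + 1))
    return records
```

Passing `fetch` in as a parameter keeps the traversal testable against an in-memory dictionary of pages; a real crawler would additionally need network access, politeness delays, and robots.txt handling, none of which are shown here.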
Electronic biomedical literature search for budding researcher.
Thakre, Subhash B; Thakre S, Sushama S; Thakre, Amol D
2013-09-01
Searching for specific, well-defined literature related to the subject of interest is the foremost step in research. Familiarity with the topic or subject makes it possible to frame an appropriate research question, which in turn is the basis for the study objectives and hypothesis. The Internet provides quick access to an overabundance of medical literature, in the form of primary, secondary and tertiary literature. It is accessible through journals, databases, dictionaries, textbooks, indexes, and e-journals, thereby allowing access to more varied, individualised, and systematic educational opportunities. A web search engine is a tool designed to search for information on the World Wide Web, which may be in the form of web pages, images, information, and other types of files. Search engines for internet-based search of medical literature include Google, Google Scholar, Scirus, the Yahoo search engine, etc., and databases include MEDLINE, PubMed, MEDLARS, etc. Several web libraries (National Library of Medicine, Cochrane, Web of Science, Medical Matrix, Emory libraries) have been developed as meta-sites, providing useful links to health resources globally. A researcher must keep in mind the strengths and limitations of a particular search engine or database while searching for a particular type of data. Knowledge of the types of literature and levels of evidence, and of search engine features such as availability, user interface, ease of access, reputable content, and period of time covered, allows their optimal use and maximal utility in the field of medicine. Literature search is a dynamic and interactive process; there is no one way to conduct a search and there are many variables involved. It is suggested that a systematic literature search that uses the available electronic resources effectively is more likely to produce quality research.
IntegromeDB: an integrated system and biological search engine.
Baitaluk, Michael; Kozhenkov, Sergey; Dubinina, Yulia; Ponomarenko, Julia
2012-01-19
With the growth of biological data in volume and heterogeneity, web search engines have become key tools for researchers. However, general-purpose search engines are not specialized for searching biological data. Here, we present an approach to developing a biological web search engine based on Semantic Web technologies and demonstrate its implementation for retrieving gene- and protein-centered knowledge. The engine is available at http://www.integromedb.org. The IntegromeDB search engine allows scanning data on gene regulation, gene expression, protein-protein interactions, pathways, metagenomics, mutations, diseases, and other gene- and protein-related data that are automatically retrieved from publicly available databases and web pages using biological ontologies. To perfect the resource design and usability, we welcome and encourage community feedback.
Extracting Macroscopic Information from Web Links.
Thelwall, Mike
2001-01-01
Discussion of Web-based link analysis focuses on an evaluation of Ingwersen's proposed external Web Impact Factor for the original use of the Web, namely the interlinking of academic research. Studies relationships between academic hyperlinks and research activities for British universities and discusses the use of search engines for Web link…
Directory of Open Access Journals (Sweden)
CLAUDIA ELENA DINUCĂ
2011-01-01
The World Wide Web has become one of the most valuable resources for information retrieval and knowledge discovery, owing to the continual increase in the amount of data available online. Given the size of the web, users easily get lost in its rich hyperlink structure. Applying data mining methods is a suitable approach to knowledge discovery on the Web. The knowledge extracted from the Web can be used to improve the performance of Web information retrieval, question answering and Web-based data warehousing. In this paper, I provide an introduction to the categories of Web mining and focus on one of them: Web structure mining. Web structure mining, one of the three categories of web mining, is used to identify the relationships between Web pages that are linked by information or by a direct link connection. It offers information about how different pages are linked together to form this huge web. Web structure mining finds hidden basic structures and uses hyperlinks for further web applications such as web search.
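As an illustration of the kind of link relationship that Web structure mining works with, the following minimal sketch counts incoming and outgoing hyperlinks per page. It is not the method of the paper above; the function name `link_degrees` and the toy page names are assumptions made purely for this example.

```python
from collections import defaultdict


def link_degrees(links):
    """Count in-links and out-links per page.

    `links` is an iterable of (source_page, target_page) pairs.
    Duplicate links between the same two pages are counted once.
    """
    in_degree = defaultdict(int)
    out_degree = defaultdict(int)
    for source, target in set(links):
        out_degree[source] += 1
        in_degree[target] += 1
    return dict(in_degree), dict(out_degree)


# Hypothetical toy link graph; the page names are made up for illustration.
toy_links = [
    ("a.html", "b.html"),
    ("a.html", "c.html"),
    ("b.html", "c.html"),
    ("c.html", "a.html"),
]
in_deg, out_deg = link_degrees(toy_links)
```

Pages with many in-links are often treated as more prominent in link-based ranking; more elaborate structure-mining methods build on the same adjacency information.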
Natural Language Search Interfaces: Health Data Needs Single-Field Variable Search.
Jay, Caroline; Harper, Simon; Dunlop, Ian; Smith, Sam; Sufi, Shoaib; Goble, Carole; Buchan, Iain
2016-01-14
Data discovery, particularly the discovery of key variables and their inter-relationships, is key to secondary data analysis and, in turn, the evolving field of data science. Interface designers have presumed that their users are domain experts, and so they have provided complex interfaces to support these "experts." Such interfaces hark back to a time when searches needed to be accurate first time as there was a high computational cost associated with each search. Our work is part of a governmental research initiative between the medical and social research funding bodies to improve the use of social data in medical research. The cross-disciplinary nature of data science can make no assumptions regarding the domain expertise of a particular scientist, whose interests may intersect multiple domains. Here we consider the common requirement for scientists to seek archived data for secondary analysis. This has more in common with search needs of the "Google generation" than with their single-domain, single-tool forebears. Our study compares a Google-like interface with traditional ways of searching for noncomplex health data in a data archive. Two user interfaces are evaluated for the same set of tasks in extracting data from surveys stored in the UK Data Archive (UKDA). One interface, Web search, is "Google-like," enabling users to browse, search for, and view metadata about study variables, whereas the other, traditional search, has a standard multioption user interface. Using a comprehensive set of tasks with 20 volunteers, we found that the Web search interface met data discovery needs and expectations better than the traditional search. A task × interface repeated measures analysis showed a main effect indicating that answers found through the Web search interface were more likely to be correct (F1,19=37.3, P<[…] natural language search interfaces for variable search supporting in particular: query reformulation; data browsing; faceted search; surrogates; relevance
Improving Web Search for Difficult Queries
Wang, Xuanhui
2009-01-01
Search engines have now become essential tools in all aspects of our life. Although a variety of information needs can be served very successfully, there are still a lot of queries that search engines can not answer very effectively and these queries always make users feel frustrated. Since it is quite often that users encounter such "difficult…
Survey of Techniques for Deep Web Source Selection and Surfacing the Hidden Web Content
Khushboo Khurana; M.B. Chandak
2016-01-01
Large and continuously growing dynamic web content has created new opportunities for large-scale data analysis in recent years. There is a huge amount of information that traditional web crawlers cannot access, since they rely on link analysis, by which only the surface web can be accessed. Traditional search engine crawlers require web pages to be linked to other pages via hyperlinks, causing a large amount of web data to be hidden from the crawlers. Enormous data is available in...
FindZebra: a search engine for rare diseases.
Dragusin, Radu; Petcu, Paula; Lioma, Christina; Larsen, Birger; Jørgensen, Henrik L; Cox, Ingemar J; Hansen, Lars Kai; Ingwersen, Peter; Winther, Ole
2013-06-01
The web has become a primary information resource about illnesses and treatments for both medical and non-medical users. Standard web search is by far the most common interface to this information. It is therefore of interest to find out how well web search engines work for diagnostic queries and what factors contribute to successes and failures. Among diseases, rare (or orphan) diseases represent an especially challenging and thus interesting class to diagnose as each is rare, diverse in symptoms and usually has scattered resources associated with it. We design an evaluation approach for web search engines for rare disease diagnosis which includes 56 real life diagnostic cases, performance measures, information resources and guidelines for customising Google Search to this task. In addition, we introduce FindZebra, a specialized (vertical) rare disease search engine. FindZebra is powered by open source search technology and uses curated freely available online medical information. FindZebra outperforms Google Search in both default set-up and customised to the resources used by FindZebra. We extend FindZebra with specialized functionalities exploiting medical ontological information and UMLS medical concepts to demonstrate different ways of displaying the retrieved results to medical experts. Our results indicate that a specialized search engine can improve the diagnostic quality without compromising the ease of use of the currently widely popular standard web search. The proposed evaluation approach can be valuable for future development and benchmarking. The FindZebra search engine is available at http://www.findzebra.com/. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
種市, 淳子; 逸村, 裕; TANEICHI, Junko; ITSUMURA, Hiroshi
2005-01-01
In this study, we discussed information seeking behavior on the Web. First, the current Web-searching studies are reviewed from the perspective of: (1) Web-searching characteristics; (2) the process model for how users evaluate Web resources. Secondly, we investigated information seeking processes using the Web search engine and online public access catalogue (OPAC) system by undergraduate students, through an experiment and its protocol analysis. The results indicate that: (1) Web-searching p...
Sowpati, Divya Tej; Srivastava, Surabhi; Dhawan, Jyotsna; Mishra, Rakesh K
2017-09-13
Comparative epigenomic analysis across multiple genes presents a bottleneck for bench biologists working with NGS data. Despite the development of standardized peak analysis algorithms, the identification of novel epigenetic patterns and their visualization across gene subsets remains a challenge. We developed a fast and interactive web app, C-State (Chromatin-State), to query and plot chromatin landscapes across multiple loci and cell types. C-State has an interactive, JavaScript-based graphical user interface and runs locally in modern web browsers that are pre-installed on all computers, thus eliminating the need for cumbersome data transfer, pre-processing and prior programming knowledge. C-State is unique in its ability to extract and analyze multi-gene epigenetic information. It allows for powerful GUI-based pattern searching and visualization. We include a case study to demonstrate its potential for identifying user-defined epigenetic trends in context of gene expression profiles.
Incorporating the surfing behavior of web users into PageRank
Ashyralyyev, Shatlyk
2013-01-01
Ankara: The Department of Computer Engineering and the Graduate School of Engineering and Science of Bilkent University, 2013. Thesis (Master's) -- Bilkent University, 2013. Includes bibliographical references (leaves 68-73). One of the most crucial factors that determines the effectiveness of a large-scale commercial web search engine is the ranking (i.e., order) in which web search results are presented to the end user. In modern web search engines, the skeleton for the rank...
... topic data in XML format. Using the Web service, software developers can build applications that utilize MedlinePlus health topic information. The service accepts keyword searches as requests and returns relevant ...
A Longitudinal Analysis of Search Engine Index Size
DEFF Research Database (Denmark)
Van den Bosch, Antal; Bogers, Toine; De Kunder, Maurice
2015-01-01
One of the determining factors of the quality of Web search engines is the size of their index. In addition to its influence on search result quality, the size of the indexed Web can also tell us something about which parts of the WWW are directly accessible to the everyday user. We propose a novel method of estimating the size of a Web search engine’s index by extrapolating from document frequencies of words observed in a large static corpus of Web pages. In addition, we provide a unique longitudinal perspective on the size of Google and Bing’s indexes over a nine-year period, from March 2006 until January 2015. We find that index size estimates of these two search engines tend to vary dramatically over time, with Google generally possessing a larger index than Bing. This result raises doubts about the reliability of previous one-off estimates of the size of the indexed Web. We find...
Publicizing Your Web Resources for Maximum Exposure.
Smith, Kerry J.
2001-01-01
Offers advice to librarians for marketing their Web sites on Internet search engines. Advises against relying solely on spiders and recommends adding metadata to the source code and delivering that information directly to the search engines. Gives an overview of metadata and typical coding for meta tags. Includes Web addresses for a number of…
Search 3.0: Present, Personal, Precise
Spivack, Nova
The next generation of Web search is already beginning to emerge. With it we will see several shifts in the way people search, and the way major search engines provide search functionality to consumers.
Davis, Harold
2006-01-01
SEO--short for Search Engine Optimization--is the art, craft, and science of driving web traffic to web sites. Web traffic is food, drink, and oxygen--in short, life itself--to any web-based business. Whether your web site depends on broad, general traffic, or high-quality, targeted traffic, this PDF has the tools and information you need to draw more traffic to your site. You'll learn how to effectively use PageRank (and Google itself); how to get listed, get links, and get syndicated; and much more. The field of SEO is expanding into all the possible ways of promoting web traffic. This
Geçer, Aynur Kolburan
2014-01-01
This study addresses university students' information search and commitment strategies on web environment and internet usage self-efficacy beliefs in terms of such variables as gender, department, grade level and frequency of internet use; and whether there is a significant relation between these beliefs. Descriptive method was used in the study.…
A web-based approach to data imputation
Li, Zhixu; Sharaf, Mohamed Abdel Fattah; Sitbon, Laurianne; Sadiq, Shazia Wasim; Indulska, Marta; Zhou, Xiaofang
2013-01-01
principle. Moreover, WebPut extends effective Information Extraction (IE) methods for the purpose of formulating web search queries that are capable of effectively retrieving missing values with high accuracy. WebPut employs a confidence-based scheme
[Development of domain specific search engines].
Takai, T; Tokunaga, M; Maeda, K; Kaminuma, T
2000-01-01
As cyberspace explodes at a pace that nobody ever imagined, it has become very important to search it efficiently and effectively. One solution to this problem is search engines. Many commercial search engines are already on the market. However, these search engines return such cumbersome results that domain experts cannot tolerate them. Using dedicated hardware and a commercial software package called OpenText, we have tried to develop several domain-specific search engines. These engines cover our institute's Web contents, drugs, chemical safety, endocrine disruptors, and emergency response to chemical hazards. These engines have been placed on our Web site for testing.
Professional and Regulatory Search
Professional and Regulatory searches are designed for people who use EPA web resources to do their job. You will be searching collections from which information that is not relevant to environmental and regulatory professionals has been excluded.
The Semantic Web: opportunities and challenges for next-generation Web applications
Directory of Open Access Journals (Sweden)
2002-01-01
Recently there has been a growing interest in the investigation and development of the next generation web: the Semantic Web. While most current forms of web content are designed to be presented to humans and are barely understandable by computers, the content of the Semantic Web is structured in a semantic way so that it is meaningful to computers as well as to humans. In this paper, we report a survey of recent research on the Semantic Web. In particular, we present the opportunities that this revolution will bring: web services, agent-based distributed computing, semantics-based web search engines, and semantics-based digital libraries. We also discuss the technical and cultural challenges of realizing the Semantic Web: the development of ontologies, formal semantics of Semantic Web languages, and trust and proof models. We hope that this will shed some light on the direction of future work in this field.
OpenSearch technology for geospatial resources discovery
Papeschi, Fabrizio; Enrico, Boldrini; Mazzetti, Paolo
2010-05-01
In 2005, the term Web 2.0 was coined by Tim O'Reilly to describe a quickly growing set of Web-based applications that share a common philosophy of "mutually maximizing collective intelligence and added value for each participant by formalized and dynamic information sharing". Around the same period, OpenSearch, a new Web 2.0 technology, was developed. More properly, OpenSearch is a collection of technologies that allow publishing of search results in a format suitable for syndication and aggregation. It is a way for websites and search engines to publish search results in a standard and accessible format. Due to its strong impact on the way the Web is perceived by users, and also due to its relevance for businesses, Web 2.0 has attracted the attention of both mass media and the scientific community. This explosive growth in popularity of Web 2.0 technologies like OpenSearch, together with practical applications of Service Oriented Architecture (SOA), has resulted in an increased interest in the similarities, convergence, and potential synergy of these two concepts. SOA is considered the philosophy of encapsulating application logic in services with a uniformly defined interface and making these publicly available via discovery mechanisms. Service consumers may then retrieve these services, and compose and use them according to their current needs. A great degree of similarity between SOA and Web 2.0 may be leading to a convergence between the two paradigms. They also exhibit divergent elements, such as Web 2.0's support for human interaction as opposed to the typical SOA machine-to-machine interaction. In line with these considerations, the Geospatial Information (GI) domain is also taking its first steps towards a new approach to data publishing and discovery, in particular taking advantage of the OpenSearch technology. A specific GI niche is represented by the OGC Catalog Service for Web (CSW) that is part of the OGC Web Services (OWS) specifications suite, which provides a
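To make the idea of a "standard and accessible format" more concrete: an OpenSearch description document advertises a URL template containing placeholders such as `{searchTerms}`, where a trailing `?` marks a placeholder as optional. The sketch below shows how a client might fill such a template. It is a simplified illustration rather than a full implementation of the OpenSearch specification, and the `example.org` endpoint, the function name `fill_template`, and the query text are hypothetical.

```python
import re
from urllib.parse import quote

# Matches {name} and {name?}; the optional "?" marks an optional parameter.
_PARAM = re.compile(r"\{([A-Za-z0-9_:]+)(\?)?\}")


def fill_template(template, values):
    """Substitute OpenSearch-style placeholders in a URL template.

    Supplied values are percent-encoded. Optional placeholders without a
    value become empty strings; missing required ones raise KeyError.
    """
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""
        raise KeyError(f"missing required parameter: {name}")

    return _PARAM.sub(substitute, template)


# Hypothetical catalogue endpoint, used only for illustration.
template = "https://example.org/search?q={searchTerms}&pw={startPage?}&format=atom"
url = fill_template(template, {"searchTerms": "sea surface temperature"})
```

A geospatial catalogue could expose further placeholders (for example a bounding box) through extension namespaces; those are outside the scope of this sketch.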
Tsikrika, T.; Vrochidis, S.; Akhgar, B.; Burnap, P.; Katos, Vasilis; Williams, M.L.
2017-01-01
The deliberate misuse of technical infrastructure (including the Web and social media) for cyber deviant and cybercriminal behaviour, ranging from the spreading of extremist and terrorism-related material to online fraud and cyber security attacks, is on the rise. This workshop aims to better understand such phenomena and develop methods for tackling them in an effective and efficient manner. The workshop brings together interdisciplinary researchers and experts in Web search, security inform...
The Importance of Prior Probabilities for Entry Page Search
Kraaij, W.; Westerveld, T.H.W.; Hiemstra, Djoerd
An important class of searches on the world-wide-web has the goal to find an entry page (homepage) of an organisation. Entry page search is quite different from Ad Hoc search. Indeed a plain Ad Hoc system performs disappointingly. We explored three non-content features of web pages: page length,
Web-based surveillance of public information needs for informing preconception interventions.
Directory of Open Access Journals (Sweden)
Angelo D'Ambrosio
The risk of adverse pregnancy outcomes can be minimized through the adoption of healthy lifestyles before pregnancy by women of childbearing age. Initiatives for promotion of preconception health may be difficult to implement. The Internet can be used to build tailored health interventions through identification of the public's information needs. To this aim, we developed a semi-automatic web-based system for monitoring Google searches, web pages and activity on social networks, regarding preconception health. Based on the American College of Obstetricians and Gynecologists guidelines and on the actual search behaviors of Italian Internet users, we defined a set of keywords targeting preconception care topics. Using these keywords, we analyzed the usage of the Google search engine and identified web pages containing preconception care recommendations. We also monitored how the selected web pages were shared on social networks. We analyzed discrepancies between searched and published information and the sharing pattern of the topics. We identified 1,807 Google search queries which generated a total of 1,995,030 searches during the study period. Less than 10% of the reviewed pages contained preconception care information and in 42.8% information was consistent with ACOG guidelines. Facebook was the most used social network for sharing. Nutrition, Chronic Diseases and Infectious Diseases were the most published and searched topics. Regarding Genetic Risk and Folic Acid, a high search volume was not associated with a high web page production, while Medication pages were more frequently published than searched. Vaccinations elicited high sharing although web page production was low; this effect was quite variable in time. Our study represents a resource to prioritize communication on specific topics on the web, to address misconceptions, and to tailor interventions to specific populations.
Ocean Drilling Program: Janus Web Database
Janus Web Database. ODP and IODP data are stored in…
GeoSearcher: Location-Based Ranking of Search Engine Results.
Watters, Carolyn; Amoudi, Ghada
2003-01-01
Discussion of Web queries with geospatial dimensions focuses on an algorithm that assigns location coordinates dynamically to Web sites based on the URL. Describes a prototype search system that uses the algorithm to re-rank search engine results for queries with a geospatial dimension, thus providing an alternative ranking order for search engine…
Paro, Alberto
2013-01-01
Written in an engaging, easy-to-follow style, the recipes will help you to extend the capabilities of ElasticSearch to manage your data effectively. If you are a developer who implements ElasticSearch in your web applications, manage data, or have decided to start using ElasticSearch, this book is ideal for you. This book assumes that you've got working knowledge of JSON and Java
Millennial Undergraduate Research Strategies in Web and Library Information Retrieval Systems
Porter, Brandi
2011-01-01
This article summarizes the author's dissertation regarding search strategies of millennial undergraduate students in Web and library online information retrieval systems. Millennials bring a unique set of search characteristics and strategies to their research since they have never known a world without the Web. Through the use of search engines,…
Meta-search Engine Ranking Techniques (Teknik Perangkingan Meta-search Engine)
Puspitaningrum, Diyah
2014-01-01
A meta-search engine organizes the merging of results from multiple search engines with the aim of improving the precision of web document search results. This survey of meta-search engine ranking techniques discusses pre-processing issues, ranking, and various techniques for merging search results from different search engines (multi-combination). Implementation issues in merging two search engines and three search engines are also highlighted. This paper also discusses directions for research...
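As a hedged illustration of what merging ranked lists from several engines can look like, the sketch below uses reciprocal rank fusion, a well-known rank-aggregation technique that is swapped in here for illustration and is not necessarily one of the techniques surveyed in the paper above. The constant `k=60` is a commonly used default, and the engine result lists are made-up data.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked result lists into a single ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    with ranks starting at 1. Ties are broken alphabetically so that
    the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


# Made-up result lists from three hypothetical search engines.
engine_a = ["u1", "u2", "u3"]
engine_b = ["u2", "u1", "u4"]
engine_c = ["u2", "u3", "u1"]

fused_two = reciprocal_rank_fusion([engine_a, engine_b])
fused_three = reciprocal_rank_fusion([engine_a, engine_b, engine_c])
```

A rank-based scheme like this avoids having to normalise the engines' internal relevance scores, which are usually not comparable with one another.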
Alderdice, Fiona; Gargan, Phyl; McCall, Emma; Franck, Linda
2018-01-30
Online resources are a source of information for parents of premature babies when their baby is discharged from hospital. To explore what topics parents deemed important after returning home from hospital with their premature baby and to evaluate the quality of existing websites that provide information for parents post-discharge. In stage 1, 23 parents living in Northern Ireland participated in three focus groups and shared their information and support needs following the discharge of their infant(s). In stage 2, a World Wide Web (WWW) search was conducted using Google, Yahoo and Bing search engines. Websites meeting pre-specified inclusion criteria were reviewed using two website assessment tools and by calculating a readability score. Website content was compared to the topics identified by parents in the focus groups. Five overarching topics were identified across the three focus groups: life at home after neonatal care, taking care of our family, taking care of our premature baby, baby's growth and development and help with getting support and advice. Twenty-nine sites were identified that met the systematic web search inclusion criteria. Fifteen (52%) covered all five topics identified by parents to some extent and 9 (31%) provided current, accurate and relevant information based on the assessment criteria. Parents reported the need for information and support post-discharge from hospital. This was not always available to them, and relevant online resources were of varying quality. Listening to parents' needs and preferences can facilitate the development of high-quality, evidence-based, parent-centred resources. © 2018 The Authors Health Expectations published by John Wiley & Sons Ltd.
Google and the culture of search
Hillis, Ken; Jarrett, Kylie
2013-01-01
What did you do before Google? The rise of Google as the dominant Internet search provider reflects a generationally-inflected notion that everything that matters is now on the Web, and should, in the moral sense of the verb, be accessible through search. In this theoretically nuanced study of search technology's broader implications for knowledge production and social relations, the authors shed light on a culture of search in which our increasing reliance on search engines influences not only the way we navigate, classify, and evaluate Web content, but also how we think about ourselves and the world around us, online and off. Ken Hillis, Michael Petit, and Kylie Jarrett seek to understand the ascendancy of search and its naturalization by historicizing and contextualizing Google's dominance of the search industry, and suggest that the contemporary culture of search is inextricably bound up with a metaphysical longing to manage, order, and categorize all knowledge. Calling upon this nexus between political e...
Exposing the Hidden-Web Induced by Ajax
Mesbah, A.; Van Deursen, A.
2008-01-01
AJAX is a very promising approach for improving rich interactivity and responsiveness of web applications. At the same time, AJAX techniques increase the totality of the hidden web by shattering the metaphor of a web ‘page’ upon which general search engines are based. This paper describes a
Undergraduate Students' Information Search Practices
Nikolopoulou, Kleopatra; Gialamas, Vasilis
2011-01-01
This paper investigates undergraduate students' information search practices. The subjects were 250 undergraduate students from two university departments in Greece, and a questionnaire was used to document their search practices. The results showed that the Web was the primary information system searched in order to find information for…
Mayer, Miguel A; Karampiperis, Pythagoras; Kukurikos, Antonis; Karkaletsis, Vangelis; Stamatakis, Kostas; Villarroel, Dagmar; Leis, Angela
2011-06-01
The number of health-related websites is increasing day-by-day; however, their quality is variable and difficult to assess. Various "trust marks" and filtering portals have been created in order to assist consumers in retrieving quality medical information. Consumers are using search engines as the main tool to get health information; however, the major problem is that the meaning of the web content is not machine-readable in the sense that computers cannot understand words and sentences as humans can. In addition, trust marks are invisible to search engines, thus limiting their usefulness in practice. During the last five years there have been different attempts to use Semantic Web tools to label health-related web resources to help internet users identify trustworthy resources. This paper discusses how Semantic Web technologies can be applied in practice to generate machine-readable labels and display their content, as well as to empower end-users by providing them with the infrastructure for expressing and sharing their opinions on the quality of health-related web resources.
International Nuclear Information System (INIS)
Lang, Dustin; Hogg, David W.
2012-01-01
We performed an image search for 'Comet Holmes', using the Yahoo! Web search engine, on 2010 April 1. Thousands of images were returned. We astrometrically calibrated—and therefore vetted—the images using the Astrometry.net system. The calibrated image pointings form a set of data points to which we can fit a test-particle orbit in the solar system, marginalizing over image dates and detecting outliers. The approach is Bayesian and the model is, in essence, a model of how comet astrophotographers point their instruments. In this work, we do not measure the position of the comet within each image, but rather use the celestial position of the whole image to infer the orbit. We find very strong probabilistic constraints on the orbit, although slightly off the Jet Propulsion Lab ephemeris, probably due to limitations of our model. Hyperparameters of the model constrain the reliability of date meta-data and where in the image astrophotographers place the comet; we find that ∼70% of the meta-data are correct and that the comet typically appears in the central third of the image footprint. This project demonstrates that discoveries and measurements can be made using data of extreme heterogeneity and unknown provenance. As the size and diversity of astronomical data sets continues to grow, approaches like ours will become more essential. This project also demonstrates that the Web is an enormous repository of astronomical information, and that if an object has been given a name and photographed thousands of times by observers who post their images on the Web, we can (re-)discover it and infer its dynamical properties.
Ture, Ferhan
2013-01-01
With the adoption of web services in daily life, people have access to tremendous amounts of information, beyond any human's reading and comprehension capabilities. As a result, search technologies have become a fundamental tool for accessing information. Furthermore, the web contains information in multiple languages, introducing another barrier…
Improving web site performance using commercially available analytical tools.
Ogle, James A
2010-10-01
It is easy to accurately measure web site usage and to quantify key parameters such as page views, site visits, and more complex variables using commercially available tools that analyze web site log files and search engine use. This information can be used strategically to guide the design or redesign of a web site (templates, look-and-feel, and navigation infrastructure) to improve overall usability. The data can also be used tactically to assess the popularity and use of new pages and modules that are added and to rectify problems that surface. This paper describes software tools used to: (1) inventory search terms that lead to available content; (2) propose synonyms for commonly used search terms; (3) evaluate the effectiveness of calls to action; (4) conduct path analyses to targeted content. The American Academy of Orthopaedic Surgeons (AAOS) uses SurfRay's Behavior Tracking software (Santa Clara CA, USA, and Copenhagen, Denmark) to capture and archive the search terms that have been entered into the site's Google Mini search engine. The AAOS also uses Unica's NetInsight program to analyze its web site log files. These tools provide the AAOS with information that quantifies how well its web sites are operating and insights for making improvements to them. Although it is easy to quantify many aspects of an association's web presence, it also takes human involvement to analyze the results and then recommend changes. Without a dedicated resource to do this, the work often is accomplished only sporadically and on an ad hoc basis.
Web Mining and Social Networking
DEFF Research Database (Denmark)
Xu, Guandong; Zhang, Yanchun; Li, Lin
This book examines the techniques and applications involved in the Web Mining, Web Personalization and Recommendation and Web Community Analysis domains, including a detailed presentation of the principles, developed algorithms, and systems of the research in these areas. The applications of web mining, and the issue of how to incorporate web mining into web personalization and recommendation systems are also reviewed. Additionally, the volume explores web community mining and analysis to find the structural, organizational and temporal developments of web communities and reveal the societal sense of individuals or communities. The volume will benefit both academic and industry communities interested in the techniques and applications of web search, web data management, web mining and web knowledge discovery, as well as web community and social network analysis.
Paro, Alberto
2015-01-01
If you are a developer who implements ElasticSearch in your web applications and want to sharpen your understanding of the core elements and applications, this is the book for you. It is assumed that you've got working knowledge of JSON and, if you want to extend ElasticSearch, of Java and related technologies.
Collaborative web hosting challenges and research directions
Ahmed, Reaz
2014-01-01
This brief presents a peer-to-peer (P2P) web-hosting infrastructure (named pWeb) that can transform networked, home-entertainment devices into lightweight collaborating Web servers for persistently storing and serving multimedia and web content. The issues addressed include ensuring content availability, Plexus routing and indexing, naming schemes, web ID, collaborative web search, network architecture and content indexing. In pWeb, user-generated voluminous multimedia content is proactively uploaded to a nearby network location (preferably within the same LAN or at least, within the same ISP)
Estimating search engine index size variability: a 9-year longitudinal study.
van den Bosch, Antal; Bogers, Toine; de Kunder, Maurice
One of the determining factors of the quality of Web search engines is the size of their index. In addition to its influence on search result quality, the size of the indexed Web can also tell us something about which parts of the WWW are directly accessible to the everyday user. We propose a novel method of estimating the size of a Web search engine's index by extrapolating from document frequencies of words observed in a large static corpus of Web pages. In addition, we provide a unique longitudinal perspective on the size of Google and Bing's indices over a nine-year period, from March 2006 until January 2015. We find that index size estimates of these two search engines tend to vary dramatically over time, with Google generally possessing a larger index than Bing. This result raises doubts about the reliability of previous one-off estimates of the size of the indexed Web. We find that much, if not all of this variability can be explained by changes in the indexing and ranking infrastructure of Google and Bing. This casts further doubt on whether Web search engines can be used reliably for cross-sectional webometric studies.
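The extrapolation idea in this abstract can be illustrated with a small sketch. It assumes the simplest possible reading: if a word appears in a known fraction of documents in a static reference corpus, and a search engine reports a hit count for that word, the index size is roughly the hit count divided by that fraction. The function names, the sample numbers, and the step of averaging over several words are illustrative assumptions, not the authors' published procedure.

```python
import statistics


def estimate_index_size(reported_hits: int, corpus_doc_freq: int, corpus_size: int) -> float:
    """Extrapolate an index size from one word.

    reported_hits   -- hit count the search engine reports for the word
    corpus_doc_freq -- number of reference-corpus documents containing the word
    corpus_size     -- total number of documents in the reference corpus
    """
    if corpus_doc_freq <= 0 or corpus_size <= 0:
        raise ValueError("corpus_doc_freq and corpus_size must be positive")
    # hits / (doc_freq / corpus_size), rearranged to avoid an intermediate fraction
    return reported_hits * corpus_size / corpus_doc_freq


def estimate_from_words(observations, corpus_size: int) -> float:
    """Average the per-word estimates (averaging is an assumption of this sketch).

    observations -- iterable of (reported_hits, corpus_doc_freq) pairs
    """
    estimates = [estimate_index_size(hits, df, corpus_size) for hits, df in observations]
    return statistics.mean(estimates)


if __name__ == "__main__":
    # Hypothetical numbers: two words observed in a 1000-document reference corpus.
    print(estimate_from_words([(1_000_000, 50), (3_000_000, 100)], corpus_size=1000))
```

Because reported hit counts fluctuate with changes in a search engine's infrastructure, repeating such an estimate over time would be expected to show the kind of variability the study reports.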
Health literacy and usability of clinical trial search engines.
Utami, Dina; Bickmore, Timothy W; Barry, Barbara; Paasche-Orlow, Michael K
2014-01-01
Several web-based search engines have been developed to assist individuals to find clinical trials for which they may be interested in volunteering. However, these search engines may be difficult for individuals with low health and computer literacy to navigate. The authors present findings from a usability evaluation of clinical trial search tools with 41 participants across the health and computer literacy spectrum. The study consisted of 3 parts: (a) a usability study of an existing web-based clinical trial search tool; (b) a usability study of a keyword-based clinical trial search tool; and (c) an exploratory study investigating users' information needs when deciding among 2 or more candidate clinical trials. From the first 2 studies, the authors found that users with low health literacy have difficulty forming queries using keywords and have significantly more difficulty using a standard web-based clinical trial search tool compared with users with adequate health literacy. From the third study, the authors identified the search factors most important to individuals searching for clinical trials and how these varied by health literacy level.
Geospatial Semantics and the Semantic Web
Ashish, Naveen
2011-01-01
The availability of geographic and geospatial information and services, especially on the open Web has become abundant in the last several years with the proliferation of online maps, geo-coding services, geospatial Web services and geospatially enabled applications. The need for geospatial reasoning has significantly increased in many everyday applications including personal digital assistants, Web search applications, local aware mobile services, specialized systems for emergency response, medical triaging, intelligence analysis and more. Geospatial Semantics and the Semantic Web: Foundation
Web-based Logbook System for EAST Experiments
International Nuclear Information System (INIS)
Yang Fei; Xiao Bingjia
2010-01-01
This paper introduces a web-based logbook system for EAST that stores experiment comments in a database and makes the documents accessible through various web browsers. A three-tier software architecture and asynchronous access technology are adopted to improve the system's effectiveness. Authorized users can view real-time discharge information, comments from others, and signal plots; add, delete, or revise their own comments; search signal data or comments under complex search conditions; and collect relevant information and export it to an Excel file. The web pages update automatically after a new discharge is completed, without a manual refresh.
Personalized Metaheuristic Clustering Onto Web Documents
Institute of Scientific and Technical Information of China (English)
Wookey Lee
2004-01-01
Optimal clustering of web documents is known to be a complicated combinatorial optimization problem, and it is hard to develop a generally applicable optimal algorithm. An accelerated simulated annealing algorithm is developed for automatic web document classification. The web document classification problem is addressed as the problem of best describing a match between a web query and a hypothesized web object. The normalized term frequency and inverse document frequency coefficient is used as a measure of the match. Test beds are generated on-line during the search by transforming model web sites. As a result, web sites can be clustered optimally in terms of keyword vectors of corresponding web documents.
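As a rough illustration of the match measure mentioned in this abstract, the sketch below scores a query against a document using a normalized term frequency multiplied by an inverse document frequency. The exact normalization and weighting used in the paper are not given in the abstract, so the choices here (dividing by the most frequent term's count, natural-log IDF) and the function name are assumptions.

```python
import math
from collections import Counter


def tf_idf_score(query_terms, doc_tokens, corpus):
    """Score how well a document matches a query.

    query_terms -- list of query words
    doc_tokens  -- list of words in the document being scored
    corpus      -- list of token lists, one per document (used for IDF)
    """
    counts = Counter(doc_tokens)
    max_count = max(counts.values()) if counts else 1
    n_docs = len(corpus)
    score = 0.0
    for term in query_terms:
        # Term frequency normalized by the most frequent term in the document.
        tf = counts[term] / max_count
        # Number of corpus documents containing the term.
        df = sum(1 for doc in corpus if term in doc)
        if df == 0:
            continue  # Term unseen in the corpus contributes nothing.
        idf = math.log(n_docs / df)
        score += tf * idf
    return score


if __name__ == "__main__":
    corpus = [["web", "search"], ["web", "mining"], ["protein", "search"]]
    for doc in corpus:
        print(doc, tf_idf_score(["mining"], doc, corpus))
```

In a clustering setting, scores of this kind would serve as the similarity values that an optimizer such as simulated annealing tries to maximize within clusters.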
Multilingual Federated Searching Across Heterogeneous Collections.
Powell, James; Fox, Edward A.
1998-01-01
Describes a scalable system for searching heterogeneous multilingual collections on the World Wide Web. Details Searchable Database Markup Language (SearchDB-ML) for describing the characteristics of a search engine and its interface, and a protocol for requesting word translations between languages. (Author)
Study on online community user motif using web usage mining
Alphy, Meera; Sharma, Ajay
2016-04-01
Web usage mining is the application of data mining to extract useful information from online communities. According to Indexed Web, the World Wide Web contained at least 4.73 billion pages, and according to the Dutch Indexed Web at least 228.52 million pages, on Thursday, 6 August 2015. It is difficult to find the data one needs among these billions of web pages, which is where web usage mining becomes important. Personalizing the search engine helps web users identify the most used data easily. It reduces time consumption and enables automatic site search and automatic restoration of useful sites. This study surveys the techniques, from the oldest to the latest, used in pattern discovery and analysis in web usage mining from 1996 to 2015. Analyzing user motifs helps improve business, e-commerce, personalisation, and websites.
Web OPAC Interfaces: An Overview.
Babu, B. Ramesh; O'Brien, Ann
2000-01-01
Discussion of Web-based online public access catalogs (OPACs) focuses on a review of six Web OPAC interfaces in use in academic libraries in the United Kingdom. Presents a checklist and guidelines of important features and functions that are currently available, including search strategies, access points, display, links, and layout. (Author/LRW)
WebVR: an interactive web browser for virtual environments
Barsoum, Emad; Kuester, Falko
2005-03-01
The pervasive nature of web-based content has led to the development of applications and user interfaces that port between a broad range of operating systems and databases, while providing intuitive access to static and time-varying information. However, the integration of this vast resource into virtual environments has remained elusive. In this paper we present an implementation of a 3D Web Browser (WebVR) that enables the user to search the internet for arbitrary information and to seamlessly augment this information into virtual environments. WebVR provides access to the standard data input and query mechanisms offered by conventional web browsers, with the difference that it generates active texture-skins of the web contents that can be mapped onto arbitrary surfaces within the environment. Once mapped, the corresponding texture functions as a fully integrated web-browser that will respond to traditional events such as the selection of links or text input. As a result, any surface within the environment can be turned into a web-enabled resource that provides access to user-definable data. In order to leverage the continuous advancement of browser technology and to support both static and streamed content, WebVR uses ActiveX controls to extract the desired texture skin from industry-strength browsers, providing a unique mechanism for data fusion and extensibility.
A longitudinal analysis of search engine index size
Bosch, A.P.J. van den; Bogers, T.; Kunder, M. de; Salah, A. A.; Tonta, Y.; Salah, A. A. A.; Sugimoto, C.; Al, U.
2015-01-01
One of the determining factors of the quality of Web search engines is the size and quality of their index. In addition to its influence on search result quality, the size of the indexed Web can also tell us something about which parts of the WWW are directly accessible to the everyday user. We
Quality analysis of patient information about knee arthroscopy on the World Wide Web.
Sambandam, Senthil Nathan; Ramasamy, Vijayaraj; Priyanka, Priyanka; Ilango, Balakrishnan
2007-05-01
This study was designed to ascertain the quality of patient information available on the World Wide Web on the topic of knee arthroscopy. For the purpose of quality analysis, we used a pool of 232 search results obtained from 7 different search engines. We used a modified assessment questionnaire to assess the quality of these Web sites. This questionnaire was developed based on similar studies evaluating Web site quality and includes items on illustrations, accessibility, availability, accountability, and content of the Web site. We also compared results obtained with different search engines and tried to establish the best possible search strategy to attain the most relevant, authentic, and adequate information with minimum time consumption. For this purpose, we first compared 100 search results from the single most commonly used search engine (AltaVista) with the pooled sample containing 20 search results from each of the 7 different search engines. The search engines used were metasearch (Copernic and Mamma), general search (Google, AltaVista, and Yahoo), and health topic-related search engines (MedHunt and Healthfinder). The phrase "knee arthroscopy" was used as the search terminology. Excluding the repetitions, there were 117 Web sites available for quality analysis. These sites were analyzed for accessibility, relevance, authenticity, adequacy, and accountability by use of a specially designed questionnaire. Our analysis showed that most of the sites providing patient information on knee arthroscopy contained outdated information, were inadequate, and were not accountable. Only 16 sites were found to be providing reasonably good patient information and hence can be recommended to patients. Understandably, most of these sites were from nonprofit organizations and educational institutions. Furthermore, our study revealed that using multiple search engines increases patients' chances of obtaining more relevant information rather than using a single search
Marolt, Klemen
2013-01-01
Search engine optimization techniques, often shortened to “SEO,” should lead to first positions in organic search results. Some optimization techniques do not change over time, yet still form the basis for SEO. However, as the Internet and web design evolves dynamically, new optimization techniques flourish and flop. Thus, we looked at the most important factors that can help to improve positioning in search results. It is important to emphasize that none of the techniques can guarantee high ...
Numerical Algorithms for Personalized Search in Self-organizing Information Networks
Kamvar, Sep
2010-01-01
This book lays out the theoretical groundwork for personalized search and reputation management, both on the Web and in peer-to-peer and social networks. Representing much of the foundational research in this field, the book develops scalable algorithms that exploit the graphlike properties underlying personalized search and reputation management, and delves into realistic scenarios regarding Web-scale data. Sep Kamvar focuses on eigenvector-based techniques in Web search, introducing a personalized variant of Google's PageRank algorithm, and he outlines algorithms--such as the now-famous quad
Database with web interface and search engine as a diagnostics tool for electromagnetic calorimeter
Paluoja, Priit
2017-01-01
During the 2016 data collection, the Compact Muon Solenoid Data Acquisition (CMS DAQ) system showed very good reliability. Nevertheless, the high complexity of the hardware and software involved makes it inherently prone to occasional problems. As a CMS subdetector, the electromagnetic calorimeter (ECAL) is affected in the same way. Some of the issues are not predictable and can appear more than once during the year, such as components getting noisy, power shortcuts, or failing communication between machines. The detection-diagnosis-intervention chain must be as fast as possible to minimise the downtime of the detector. The aim of this project was to create diagnostic software for the ECAL crew, consisting of a database and a web interface that allows users to search, add, and edit the contents of the database.
How much data resides in a web collection: how to estimate size of a web collection
Khelghati, Mohammadreza; Hiemstra, Djoerd; van Keulen, Maurice
2013-01-01
With increasing amount of data in deep web sources (hidden from general search engines behind web forms), accessing this data has gained more attention. In the algorithms applied for this purpose, it is the knowledge of a data source size that enables the algorithms to make accurate decisions in
Comparative analysis of some search engines
Directory of Open Access Journals (Sweden)
Taiwo O. Edosomwan
2010-10-01
We compared the information retrieval performances of some popular search engines (namely, Google, Yahoo, AlltheWeb, Gigablast, Zworks, AltaVista and Bing/MSN) in response to a list of ten queries, varying in complexity. These queries were run on each search engine and the precision and response time of the retrieved results were recorded. The first ten documents on each retrieval output were evaluated as being ‘relevant’ or ‘non-relevant’ for evaluation of the search engine’s precision. To evaluate response time, normalised recall ratios were calculated at various cut-off points for each query and search engine. This study shows that Google appears to be the best search engine in terms of both average precision (70%) and average response time (2 s). Gigablast and AlltheWeb performed the worst overall in this study.
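The precision measure described in this abstract (judging the first ten results of each query as relevant or non-relevant, then averaging across queries) can be sketched as follows. The function names and the sample judgments are hypothetical; the sketch divides by the cut-off k even when fewer results are returned, which is one common convention and an assumption here.

```python
def precision_at_k(relevance_judgments, k=10):
    """Fraction of the top-k results judged relevant.

    relevance_judgments -- list of booleans in rank order (True = relevant)
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top = relevance_judgments[:k]
    # Missing results (fewer than k returned) count as non-relevant.
    return sum(top) / k


def mean_precision(per_query_judgments, k=10):
    """Average precision-at-k over a list of queries."""
    if not per_query_judgments:
        raise ValueError("at least one query is required")
    scores = [precision_at_k(judgments, k) for judgments in per_query_judgments]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical judgments for two queries on one search engine.
    query_1 = [True] * 7 + [False] * 3
    query_2 = [True] * 5 + [False] * 5
    print(mean_precision([query_1, query_2]))
```

Running the same set of queries through each engine and comparing the resulting averages gives a ranking of the kind the study reports.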
WebDASC: a web-based dietary assessment software for 8-11-year-old Danish children
DEFF Research Database (Denmark)
Biltoft-Jensen, Anja Pia; Trolle, Ellen; Christensen, Tue
, literature review, and usability tests preceded its release. Special consideration was given to age-appropriate design issues. Results: In WebDASC an animated armadillo guides respondents through six daily eating occasions and helps them report foods and beverages previously consumed. A database of 1300...... food items is available either through category browse or free text search, aided by a spell check application. A type-in format is available for foods not otherwise found through category browse or text search. Amount consumed is estimated by selecting the closest portion size among four different...... digital images. WebDASC includes internal checks for frequently forgotten foods, and the following features to create motivation: a food-meter displaying cumulative weight of foods reported, a most popular food ranking, and a computer game with a high score list. Conclusions: WebDASC was developed...
Teague-Rector, Susan; Ballard, Angela; Pauley, Susan K.
2011-01-01
Creating a learnable, effective, and user-friendly library Web site hinges on providing easy access to search. Designing a search interface for academic libraries can be particularly challenging given the complexity and range of searchable library collections, such as bibliographic databases, electronic journals, and article search silos. Library…
Citation Searching: Search Smarter & Find More
Hammond, Chelsea C.; Brown, Stephanie Willen
2008-01-01
The staff at University of Connecticut are participating in Elsevier's Student Ambassador Program (SAmP) in which graduate students train their peers on "citation searching" research using Scopus and Web of Science, two tremendous citation databases. They are in the fourth semester of these training programs, and they are wildly successful: They…
Identify Web-page Content meaning using Knowledge based System for Dual Meaning Words
Sinha, Sukanta; Dattagupta, Rana; Mukhopadhyay, Debajyoti
2012-01-01
The meaning of Web-page content plays a big role when a search engine produces a search result. In most cases the Web-page meaning is stored in the title or meta-tag area, but those meanings do not always match the Web-page content. To overcome this, we need to go through the Web-page content to identify the Web-page meaning. When the Web-page content contains dual-meaning words, it is really difficult to identify the meaning of the Web-page. In this paper, we are introdu...
Music Search Engines: Specifications and Challenges
DEFF Research Database (Denmark)
Nanopoulos, Alexandros; Rafilidis, Dimitrios; Manolopoulos, Yannis
2009-01-01
Nowadays we have a proliferation of music data available over the Web. One of the imperative challenges is how to search these vast, global-scale musical resources to find preferred music. Recent research has envisaged the notion of music search engines (MSEs) that allow for searching preferred...
Schweiger, Stefan; Oeberst, Aileen; Cress, Ulrike
2014-03-26
The public typically believes psychotherapy to be more effective than pharmacotherapy for depression treatments. This is not consistent with current scientific evidence, which shows that both types of treatment are about equally effective. The study investigates whether this bias towards psychotherapy guides online information search and whether the bias can be reduced by explicitly providing expert information (in a blog entry) and by providing tag clouds that implicitly reveal experts' evaluations. A total of 174 participants completed a fully automated Web-based study after we invited them via mailing lists. First, participants read two blog posts by experts that either challenged or supported the bias towards psychotherapy. Subsequently, participants searched for information about depression treatment in an online environment that provided more experts' blog posts about the effectiveness of treatments based on alleged research findings. These blogs were organized in a tag cloud; both psychotherapy tags and pharmacotherapy tags were popular. We measured tag and blog post selection, efficacy ratings of the presented treatments, and participants' treatment recommendation after information search. Participants demonstrated a clear bias towards psychotherapy (mean 4.53, SD 1.99) compared to pharmacotherapy (mean 2.73, SD 2.41; t173=7.67, Pinformation search and evaluation. This bias was significantly reduced, however, when participants were exposed to tag clouds with challenging popular tags. Participants facing popular tags challenging their bias (n=61) showed significantly less biased tag selection (F2,168=10.61, Pinformation as presented in blog posts, compared to supporting expert information (n=81), decreased the bias in information search with regard to blog post selection (F1,168=4.32, P=.04, partial eta squared=0.025). No significant effects were found for treatment recommendation (Ps>.33). We conclude that the psychotherapy bias is most effectively
The RCSB Protein Data Bank: redesigned web site and web services.
Rose, Peter W; Beran, Bojan; Bi, Chunxiao; Bluhm, Wolfgang F; Dimitropoulos, Dimitris; Goodsell, David S; Prlic, Andreas; Quesada, Martha; Quinn, Gregory B; Westbrook, John D; Young, Jasmine; Yukich, Benjamin; Zardecki, Christine; Berman, Helen M; Bourne, Philip E
2011-01-01
The RCSB Protein Data Bank (RCSB PDB) web site (http://www.pdb.org) has been redesigned to increase usability and to cater to a larger and more diverse user base. This article describes key enhancements and new features that fall into the following categories: (i) query and analysis tools for chemical structure searching, query refinement, tabulation and export of query results; (ii) web site customization and new structure alerts; (iii) pair-wise and representative protein structure alignments; (iv) visualization of large assemblies; (v) integration of structural data with the open access literature and binding affinity data; and (vi) web services and web widgets to facilitate integration of PDB data and tools with other resources. These improvements enable a range of new possibilities to analyze and understand structure data. The next generation of the RCSB PDB web site, as described here, provides a rich resource for research and education.
Finding Specification Pages from the Web
Yoshinaga, Naoki; Torisawa, Kentaro
This paper presents a method of finding a specification page on the Web for a given object (e.g., "Ch. d'Yquem") and its class label (e.g., "wine"). A specification page for an object is a Web page which gives concise attribute-value information about the object (e.g., "county"-"Sauternes") in well formatted structures. A simple unsupervised method using layout and symbolic decoration cues was applied to a large number of the Web pages to acquire candidate attributes for each class (e.g., "county" for a class "wine"). We then filter out irrelevant words from the putative attributes through an author-aware scoring function that we called site frequency. We used the acquired attributes to select a representative specification page for a given object from the Web pages retrieved by a normal search engine. Experimental results revealed that our system greatly outperformed the normal search engine in terms of this specification retrieval.
Discovery and Selection of Semantic Web Services
Wang, Xia
2013-01-01
For advanced web search engines to be able not only to search for semantically related information dispersed over different web pages, but also for semantic services providing certain functionalities, discovering semantic services is the key issue. Addressing four problems of current solutions, this book presents the following contributions. A novel service model independent of semantic service description models is proposed, which clearly defines all elements necessary for service discovery and selection. It takes service selection as its gist and improves efficiency. Corresponding selection algorithms and their implementation as components of the extended Semantically Enabled Service-oriented Architecture in the Web Service Modeling Environment are detailed. Many applications of semantic web services, e.g. discovery, composition and mediation, can benefit from a general approach for building application ontologies. With application ontologies thus built, services are discovered in the same way as with single...
Information about liver transplantation on the World Wide Web.
Hanif, F; Sivaprakasam, R; Butler, A; Huguet, E; Pettigrew, G J; Michael, E D A; Praseedom, R K; Jamieson, N V; Bradley, J A; Gibbs, P
2006-09-01
Orthotopic liver transplant (OLTx) has evolved into a successful surgical management for end-stage liver diseases. Awareness and information about OLTx is an important tool in assisting OLTx recipients and people supporting them, including non-transplant clinicians. The study aimed to investigate the nature and quality of liver transplant-related patient information on the World Wide Web. Four common search engines were used to explore the Internet by using the key words 'Liver transplant'. The URLs (uniform resource locators) of the top 50 returns were chosen, as it was judged unlikely that the average user would search beyond the first 50 sites returned by a given search. Each Web site was assessed on the following categories: origin, language, accessibility and extent of the information. A weighted Information Score (IS) was created to assess the quality of clinical and educational value of each Web site and was scored independently by three transplant clinicians. The Internet search performed with the aid of the four search engines yielded a total of 2,255,244 Web sites. Of the 200 possible sites, only 58 Web sites were assessed because of repetition of the same Web sites and non-accessible links. The overall median weighted IS was 22 (IQR 1 - 42). Of the 58 Web sites analysed, 45 (77%) were from the USA, six (10%) were European, and seven (12%) were from the rest of the world. The median weighted IS of publications originating from Europe and the USA was 40 (IQR = 22 - 60) and 23 (IQR = 6 - 38), respectively. Although European Web sites produced a higher weighted IS than the USA publications, the difference was not statistically significant (p = 0.07). Web sites belonging to academic institutions and professional organizations scored significantly higher, with a median weighted IS of 28 (IQR = 16 - 44) and 24 (IQR = 12 - 35), respectively, as compared with the commercial Web sites (median = 6 with IQR of 0 - 14, p = .001). There
Resolving person names in web people search
Balog, K.; Azzopardi, L.; de Rijke, M.; King, I.; Baeza-Yates, R.
2009-01-01
Disambiguating person names in a set of documents (such as a set of web pages returned in response to a person name) is a key task for the presentation of results and the automatic profiling of experts. With largely unstructured documents and an unknown number of people with the same name the
3D-SURFER 2.0: web platform for real-time search and characterization of protein surfaces.
Xiong, Yi; Esquivel-Rodriguez, Juan; Sael, Lee; Kihara, Daisuke
2014-01-01
The increasing number of uncharacterized protein structures necessitates the development of computational approaches for function annotation using the protein tertiary structures. Protein structure database search is the basis of any structure-based functional elucidation of proteins. 3D-SURFER is a web platform for real-time protein surface comparison of a given protein structure against the entire PDB using 3D Zernike descriptors. It can smoothly navigate the protein structure space in real-time from one query structure to another. A major new feature of Release 2.0 is the ability to compare the protein surface of a single chain, a single domain, or a single complex against databases of protein chains, domains, complexes, or a combination of all three in the latest PDB. Additionally, two types of protein structures can now be compared: all-atom-surface and backbone-atom-surface. The server can also accept a batch job for a large number of database searches. Pockets in protein surfaces can be identified by VisGrid and LIGSITE(csc). The server is available at http://kiharalab.org/3d-surfer/.
Exploring the academic invisible web
Lewandowski, Dirk
2006-01-01
The Invisible Web is often discussed in the academic context, where its contents (mainly in the form of databases) are of great importance. But this discussion is mainly based on some seminal research done by Sherman and Price (2001) and Bergman (2001), respectively. We focus on the types of Invisible Web content relevant for academics and the improvements made by search engines to deal with these content types. In addition, we question the volume of the Invisible Web as stated by Bergman. Ou...
Web information retrieval based on ontology
Zhang, Jian
2013-03-01
The purpose of Information Retrieval (IR) is to find a set of documents that are relevant to a specific information need of a user. The traditional Information Retrieval model commonly used in commercial search engines is based on a keyword indexing system and Boolean logic queries. One big drawback of traditional information retrieval is that it typically retrieves information without an explicitly defined domain of interest to the user, so that a lot of irrelevant information is returned, which burdens the user with picking useful answers out of the irrelevant results. To tackle this issue, many semantic web information retrieval models have been proposed recently. The main advantage of the Semantic Web is that it enhances search mechanisms through the use of ontologies. In this paper, we present our approach to personalizing a web search engine based on ontology. In addition, key techniques are also discussed. Compared to previous research, our work concentrates on semantic similarity and on the whole process, including query submission and information annotation.
Rushton, Erin E.; Kelehan, Martha Daisy; Strong, Marcy A.
2008-01-01
Search engine use is one of the most popular online activities. According to a recent OCLC report, nearly all students start their electronic research using a search engine instead of the library Web site. Instead of viewing search engines as competition, however, librarians at Binghamton University Libraries decided to employ search engine…
Directory of Open Access Journals (Sweden)
Enrique Luna Ramírez
2008-12-01
In this paper, the design of a Web metadata search model with semi-intelligent features is proposed. The search model is oriented to retrieve the metadata associated to a data warehouse in a fast, flexible and reliable way. Our proposal includes a set of distinctive functionalities, which consist of the temporary storage of the frequently used metadata in an exclusive store, different to the global data warehouse metadata store, and of the use of control processes to retrieve information from both stores through aliases of concepts.
FASH: A web application for nucleotides sequence search
Directory of Open Access Journals (Sweden)
Chew Paul
2008-05-01
FASH (Fourier Alignment Sequence Heuristics) is a web application, based on the Fast Fourier Transform, for finding remote homologs within a long nucleic acid sequence. Given a query sequence and a long text-sequence (e.g., the human genome), FASH detects subsequences within the text that are remotely similar to the query. FASH offers an alternative approach to Blast/Fasta for querying long RNA/DNA sequences. FASH differs from these other approaches in that it does not depend on the existence of contiguous seed-sequences in its initial detection phase. The FASH web server is user friendly and very easy to operate. Availability: FASH can be accessed at https://fash.bgu.ac.il:8443/fash/default.jsp (secured website).
GLIDERS - A web-based search engine for genome-wide linkage disequilibrium between HapMap SNPs
Directory of Open Access Journals (Sweden)
Broxholme John
2009-10-01
Background: A number of tools for the examination of linkage disequilibrium (LD) patterns between nearby alleles exist, but none are available for quickly and easily investigating LD at longer ranges (>500 kb). We have developed a web-based query tool (GLIDERS: Genome-wide LInkage DisEquilibrium Repository and Search engine) that enables the retrieval of pairwise associations with r² ≥ 0.3 across the human genome for any SNP genotyped within HapMap phase 2 and 3, regardless of distance between the markers. Description: GLIDERS is an easy-to-use web tool that only requires the user to enter the rs numbers of the SNPs for which they want to retrieve genome-wide LD (both nearby and long-range). The intuitive web interface handles manual entry of SNP IDs and also allows users to upload files of SNP IDs. The user can limit the resulting inter-SNP associations with easy-to-use menu options. These include MAF limit (5-45%), distance limits between SNPs (minimum and maximum), r² (0.3 to 1), HapMap population sample (CEU, YRI and JPT+CHB combined) and HapMap build/release. All resulting genome-wide inter-SNP associations are displayed on a single output page, which has a link to a downloadable tab-delimited text file. Conclusion: GLIDERS is a quick and easy way to retrieve genome-wide inter-SNP associations and to explore LD patterns for any number of SNPs of interest. GLIDERS can be useful in identifying SNPs with long-range LD. This can highlight mis-mapping or other potential association signal localisation problems.
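The menu filters listed in the GLIDERS description (r², MAF and distance limits) can be pictured as a simple predicate over pairwise LD records. The sketch below is only an illustration under stated assumptions: the record layout, the `LdPair` and `filter_pairs` names, the reading of the MAF limit as a minimum threshold, and the toy SNP identifiers are all hypothetical and are not taken from GLIDERS itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LdPair:
    """One pairwise association; the field layout is an assumption."""
    snp_a: str
    snp_b: str
    r2: float         # pairwise r^2 between the two SNPs
    maf: float        # minor allele frequency of the partner SNP, as a fraction
    distance_bp: int  # distance between the markers in base pairs


def filter_pairs(pairs, min_r2=0.3, min_maf=0.05, min_dist=0, max_dist=None):
    """Keep pairs that pass every limit; max_dist=None means no upper bound."""
    kept = []
    for p in pairs:
        if p.r2 < min_r2 or p.maf < min_maf:
            continue
        if p.distance_bp < min_dist:
            continue
        if max_dist is not None and p.distance_bp > max_dist:
            continue
        kept.append(p)
    return kept


# Toy records with placeholder identifiers (not real HapMap data).
pairs = [
    LdPair("rsA", "rsB", 0.92, 0.21, 12_000),     # strong LD, but nearby
    LdPair("rsA", "rsC", 0.35, 0.08, 750_000),    # passes all limits
    LdPair("rsA", "rsD", 0.28, 0.30, 900_000),    # r^2 below 0.3
    LdPair("rsA", "rsE", 0.60, 0.03, 2_000_000),  # MAF below 5%
]

# Long-range query: only associations more than 500 kb apart.
long_range = filter_pairs(pairs, min_r2=0.3, min_maf=0.05, min_dist=500_000)
```

Setting `min_dist` to 500 kb mirrors the long-range use case the abstract emphasises; the default thresholds simply reuse the lower bounds quoted in the abstract.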
The Program Management Challenges of Web 2.0
2010-06-01
scraping Web services TBD publishing participation TBD content management systems wikis TBD directories (taxonomy) tagging (“folksonomy”) TBD... Folksonomies (User chosen keywords to organize and index online content to facilitate its use, especially helpful in searching) • Video Sharing...and folksonomies. A Web 2.0 site allows its users to interact with other users or to change Web site content, in contrast to non-interactive Web
Blending vertical and web results: A case study using video intent
Lefortier, D.; Serdyukov, P.; Romanenko, F.; de Rijke, M.; de Rijke, M.; Kenter, T.; de Vries, A.P.; Zhai, C.X.; de Jong, F.; Radinsky, K.; Hofmann, K.
2014-01-01
Modern search engines aggregate results from specialized verticals into the Web search results. We study a setting where vertical and Web results are blended into a single result list, a setting that has not been studied before. We focus on video intent and present a detailed observational study of
Rogozinski, Marek
2014-01-01
This book is a detailed, practical, hands-on guide packed with real-life scenarios and examples which will show you how to implement an ElasticSearch search engine on your own websites. If you are a web developer or a user who wants to learn more about ElasticSearch, then this is the book for you. You do not need to know anything about ElasticSearch, Java, or Apache Lucene in order to use this book, though basic knowledge about databases and queries is required.
Web sites that work secrets from winning web sites
Smith, Jon
2012-01-01
Leading web site entrepreneur Jon Smith has condensed the secrets of his success into 52 inspiring ideas that even the most hopeless technophobe can implement. The brilliant tips and practical advice in Web sites that work will uplift and transform any website, from the simplest to the most complicated. It deals with everything from fundamentals such as how to assess the effectiveness of a website and how to get a site listed on the most popular search engines to more sophisticated challenges like creating a community and dealing with legal requirements. Straight-talking, practical and humorou
Teen smoking cessation help via the Internet: a survey of search engines.
Edwards, Christine C; Elliott, Sean P; Conway, Terry L; Woodruff, Susan I
2003-07-01
The objective of this study was to assess Web sites related to teen smoking cessation on the Internet. Seven Internet search engines were searched using the keywords teen quit smoking. The top 20 hits from each search engine were reviewed and categorized. The keywords teen quit smoking produced between 35 and 400,000 hits depending on the search engine. Of 140 potential hits, 62% were active, unique sites; 85% were listed by only one search engine; and 40% focused on cessation. Findings suggest that legitimate on-line smoking cessation help for teens is constrained by search engine choice and the amount of time teens spend looking through potential sites. Resource listings should be updated regularly. Smoking cessation Web sites need to be picked up by multiple search engines. Further evaluation of smoking cessation Web sites needs to be conducted to identify the most effective help for teens.
Jácome, Alberto G; Fdez-Riverola, Florentino; Lourenço, Anália
2016-07-01
Text mining and semantic analysis approaches can be applied to the construction of biomedical domain-specific search engines and provide an attractive alternative to create personalized and enhanced search experiences. Therefore, this work introduces the new open-source BIOMedical Search Engine Framework for the fast and lightweight development of domain-specific search engines. The rationale behind this framework is to incorporate core features typically available in search engine frameworks with flexible and extensible technologies to retrieve biomedical documents, annotate meaningful domain concepts, and develop highly customized Web search interfaces. The BIOMedical Search Engine Framework integrates taggers for major biomedical concepts, such as diseases, drugs, genes, proteins, compounds and organisms, and enables the use of domain-specific controlled vocabulary. Technologies from the Typesafe Reactive Platform, the AngularJS JavaScript framework and the Bootstrap HTML/CSS framework support the customization of the domain-oriented search application. Moreover, the RESTful API of the BIOMedical Search Engine Framework allows the integration of the search engine into existing systems or a complete web interface personalization. The construction of the Smart Drug Search is described as proof-of-concept of the BIOMedical Search Engine Framework. This public search engine catalogs scientific literature about antimicrobial resistance, microbial virulence and similar topics. The keyword-based queries of the users are transformed into concepts and search results are presented and ranked accordingly. The semantic graph view portrays all the concepts found in the results, and the researcher may look into the relevance of different concepts, the strength of direct relations, and non-trivial, indirect relations. The number of occurrences of the concept shows its importance to the query, and the frequency of concept co-occurrence is indicative of biological relations
Levay, Paul; Ainsworth, Nicola; Kettle, Rachel; Morgan, Antony
2016-03-01
To examine how effectively forwards citation searching with Web of Science (WOS) or Google Scholar (GS) identified evidence to support public health guidance published by the National Institute for Health and Care Excellence. Forwards citation searching was performed using GS on a base set of 46 publications and replicated using WOS. WOS and GS were compared in terms of recall; precision; number needed to read (NNR); administrative time and costs; and screening time and costs. Outcomes for all publications were compared with those for a subset of highly important publications. The searches identified 43 relevant publications. The WOS process had 86.05% recall and 1.58% precision. The GS process had 90.7% recall and 1.62% precision. The NNR to identify one relevant publication was 63.3 with WOS and 61.72 with GS. There were nine highly important publications. WOS had 100% recall, 0.38% precision and NNR of 260.22. GS had 88.89% recall, 0.33% precision and NNR of 300.88. Administering the WOS results took 4 h and cost £88-£136, compared with 75 h and £1650-£2550 with GS. WOS is recommended over GS, as citation searching was more effective, while the administrative and screening times and costs were lower. Copyright © 2015 John Wiley & Sons, Ltd.
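The recall, precision and number-needed-to-read (NNR) figures above follow from three counts per database. The sketch below shows the arithmetic only. The function name is hypothetical, and the counts are assumptions: the abstract reports percentages rather than raw counts, so 37 and 39 relevant hits and 2342 and 2407 retrieved records are back-calculated values that happen to reproduce the published figures.

```python
def citation_search_metrics(relevant_found, total_relevant, retrieved):
    """Return recall and precision (percent) and the number needed to read."""
    recall = relevant_found / total_relevant
    precision = relevant_found / retrieved
    nnr = retrieved / relevant_found  # records screened per relevant record found
    return {
        "recall_pct": round(recall * 100, 2),
        "precision_pct": round(precision * 100, 2),
        "nnr": round(nnr, 2),
    }


TOTAL_RELEVANT = 43  # relevant publications identified, as stated in the abstract

# Back-calculated (assumed) counts, not stated in the abstract.
wos = citation_search_metrics(relevant_found=37, total_relevant=TOTAL_RELEVANT, retrieved=2342)
gs = citation_search_metrics(relevant_found=39, total_relevant=TOTAL_RELEVANT, retrieved=2407)
```

Because NNR is the reciprocal of precision, a small difference in precision (1.58% versus 1.62%) translates into only about 1.5 fewer records to screen per relevant publication.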
Social Networking on the Semantic Web
Finin, Tim; Ding, Li; Zhou, Lina; Joshi, Anupam
2005-01-01
Purpose: Aims to investigate the way that the semantic web is being used to represent and process social network information. Design/methodology/approach: The Swoogle semantic web search engine was used to construct several large data sets of Resource Description Framework (RDF) documents with social network information that were encoded using the…
Wighting, Mervyn J.; Lucking, Robert A.; Christmann, Edwin P.
2004-01-01
Teachers search for ways to enhance oceanography units in the classroom. There are many online resources available to help one explore the mysteries of the deep. This article describes a collection of Web sites on this topic appropriate for middle level classrooms.
Detection And Classification Of Web Robots With Honeypots
2016-03-01
Web robots are valuable tools for indexing content on the Web, but they can also be malicious through phishing, spamming, or performing targeted attacks. In this thesis, we study an approach...programs has been attributed to the explosion in content and user-generated social media on the Internet. The Web search engines like Google require
Directory of Open Access Journals (Sweden)
Wasik Szymon
2010-05-01
Background: Recent discoveries concerning novel functions of RNA, such as RNA interference, have contributed towards the growing importance of the field. In this respect, a deeper knowledge of complex three-dimensional RNA structures is essential to understand their new biological functions. A number of bioinformatic tools have been proposed to explore two major structural databases (PDB, NDB) in order to analyze various aspects of RNA tertiary structures. One of these tools is RNA FRABASE 1.0, the first web-accessible database with an engine for automatic search of 3D fragments within PDB-derived RNA structures. This search is based upon the user-defined RNA secondary structure pattern. In this paper, we present and discuss RNA FRABASE 2.0. This second version of the system represents a major extension of this tool in terms of providing new data and a wide spectrum of novel functionalities. An intuitively operated web server platform enables very fast user-tailored search of three-dimensional RNA fragments, their multi-parameter conformational analysis and visualization. Description: RNA FRABASE 2.0 has stored information on 1565 PDB-deposited RNA structures, including all NMR models. The RNA FRABASE 2.0 search engine algorithms operate on the database of the RNA sequences and the new library of RNA secondary structures, coded in the dot-bracket format extended to hold multi-stranded structures and to cover residues whose coordinates are missing in the PDB files. The library of RNA secondary structures (and their graphics) is made available. A high level of efficiency of the 3D search has been achieved by introducing novel tools to formulate advanced searching patterns and to screen highly populated tertiary structure elements. RNA FRABASE 2.0 also stores data and conformational parameters in order to provide "on the spot" structural filters to explore the three-dimensional RNA structures. An instant visualization of the 3D RNA
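For readers unfamiliar with the dot-bracket format mentioned above, the following is a minimal sketch of how plain dot-bracket notation encodes base pairs: a dot is an unpaired residue and each opening parenthesis pairs with its matching closing one. It covers only the basic single-strand notation. The extensions RNA FRABASE 2.0 uses for multi-stranded structures and missing residues are not specified in the abstract and are not modelled here, and the function name is hypothetical.

```python
def parse_dot_bracket(structure):
    """Return sorted (i, j) index pairs for a plain dot-bracket string."""
    open_positions = []  # stack of indices of unmatched '('
    pairs = []
    for index, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(index)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {index}")
            pairs.append((open_positions.pop(), index))
        elif symbol != ".":
            raise ValueError(f"unsupported symbol {symbol!r} at position {index}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs)


# A six-residue hairpin: two stacked base pairs closing a two-residue loop.
hairpin_pairs = parse_dot_bracket("((..))")
```

A secondary-structure search pattern can then be compared against a structure by matching these pair lists rather than raw strings.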
Marketing plan for a web shop business
Koskivaara, Leonilla
2014-01-01
The Internet has changed the buying behavior of consumers in recent years, and companies need to adapt to the changes. The web shop is an important sales channel for today’s companies. Advantages of a web shop business include cost effectiveness and the potential to do business globally. Challenges of a web shop business include search engine optimization and running both a retail store and a web shop at the same time. Social media has become an important marketing channel and has bec...
Development of web database system for JAERI ERL-FEL
International Nuclear Information System (INIS); Energy Technology Data Exchange (ETDEWEB)
Kikuzawa, Nobuhiro [Japan Atomic Energy Research Inst., Kansai Research Establishment, Advanced Photon Research Center, Tokai, Ibaraki (Japan)]
2005-06-01
The accelerator control system for the JAERI ERL-FEL is a PC-based distributed control system. The accelerator status record is stored automatically through the control system to analyze the influence on the electron beam. To handle the large volume of stored data effectively, the required data must be easy to search and visualize. For this reason, a web database (DB) system that can search for the required data and display it visually in a web browser was developed using open source software. With the introduction of this system, accelerator operators can monitor real-time information anytime, anywhere through a web browser. Development of the web DB system is described in this paper. (author)
Evidence-based Medicine Search: a customizable federated search engine.
Bracke, Paul J; Howse, David K; Keim, Samuel M
2008-04-01
This paper reports on the development of a tool by the Arizona Health Sciences Library (AHSL) for searching clinical evidence that can be customized for different user groups. The AHSL provides services to the University of Arizona's (UA's) health sciences programs and to the University Medical Center. Librarians at AHSL collaborated with UA College of Medicine faculty to create an innovative search engine, Evidence-based Medicine (EBM) Search, that provides users with a simple search interface to EBM resources and presents results organized according to an evidence pyramid. EBM Search was developed with a web-based configuration component that allows the tool to be customized for different specialties. Informal and anecdotal feedback from physicians indicates that EBM Search is a useful tool with potential in teaching evidence-based decision making. While formal evaluation is still being planned, a tool such as EBM Search, which can be configured for specific user populations, may help lower barriers to information resources in an academic health sciences center.
AstroWeb -- Internet Resources for Astronomers
Jackson, R. E.; Adorf, H.-M.; Egret, D.; Heck, A.; Koekemoer, A.; Murtagh, F.; Wells, D. C.
AstroWeb is a World Wide Web (WWW) interface to a collection of Internet accessible resources aimed at the astronomical community. The collection currently contains more than 1000 WWW, Gopher, Wide Area Information System (WAIS), Telnet, and Anonymous FTP resources, and it is still growing. AstroWeb provides additional value-added services: categorization of each resource; descriptive paragraphs for some resources; a searchable index of all resource information; and a search, three times daily, for "dead" or "unreliable" resources.
Web-based resources for critical care education.
Kleinpell, Ruth; Ely, E Wesley; Williams, Ged; Liolios, Antonios; Ward, Nicholas; Tisherman, Samuel A
2011-03-01
To identify, catalog, and critically evaluate Web-based resources for critical care education. A multilevel search strategy was utilized. Literature searches were conducted (from 1996 to September 30, 2010) using OVID-MEDLINE, PubMed, and the Cumulative Index to Nursing and Allied Health Literature with the terms "Web-based learning," "computer-assisted instruction," "e-learning," "critical care," "tutorials," "continuing education," "virtual learning," and "Web-based education." The Web sites of relevant critical care organizations (American College of Chest Physicians, American Society of Anesthesiologists, American Thoracic Society, European Society of Intensive Care Medicine, Society of Critical Care Medicine, World Federation of Societies of Intensive and Critical Care Medicine, American Association of Critical Care Nurses, and World Federation of Critical Care Nurses) were reviewed for the availability of e-learning resources. Finally, Internet searches and e-mail queries to critical care medicine fellowship program directors and members of national and international acute/critical care listserves were conducted to 1) identify the use of and 2) review and critique Web-based resources for critical care education. To ensure credibility of Web site information, Web sites were reviewed by three independent reviewers on the basis of the criteria of authority, objectivity, authenticity, accuracy, timeliness, relevance, and efficiency in conjunction with suggested formats for evaluating Web sites in the medical literature. Literature searches using OVID-MEDLINE, PubMed, and the Cumulative Index to Nursing and Allied Health Literature resulted in >250 citations. Those pertinent to critical care provide examples of the integration of e-learning techniques, the development of specific resources, reports of the use of types of e-learning, including interactive tutorials, case studies, and simulation, and reports of student or learner satisfaction, among other general
A Web Browser Interface to Manage the Searching and Organizing of Information on the Web by Learners
Li, Liang-Yi; Chen, Gwo-Dong
2010-01-01
Information Gathering is a knowledge construction process. Web learners make a plan for their Information Gathering task based on their prior knowledge. The plan evolves as new information is encountered, and their mental model is constructed through continuously assimilating and accommodating new information gathered from different Web pages. In…
Ashish, Naveen
2005-01-01
We provide an overview of several ongoing NASA endeavors based on concepts, systems, and technology from the Semantic Web arena. Indeed NASA has been one of the early adopters of Semantic Web Technology and we describe ongoing and completed R&D efforts for several applications ranging from collaborative systems to airspace information management to enterprise search to scientific information gathering and discovery systems at NASA.
Comparative Study on Three Major Internet Search Engines ...
African Journals Online (AJOL)
, Google and ask.com search engines. Experimental method was used with ten reference questions which were used to query each of the search engines. Yahoo obtained the highest results (521,801,043) among the three Web search ...
An active registry for bioinformatics web services.
Pettifer, S.; Thorne, D.; McDermott, P.; Attwood, T.; Baran, J.; Bryne, J.C.; Hupponen, T.; Mowbray, D.; Vriend, G.
2009-01-01
SUMMARY: The EMBRACE Registry is a web portal that collects and monitors web services according to test scripts provided by their administrators. Users are able to search for, rank and annotate services, enabling them to select the most appropriate working service for inclusion in their
Automatic Planning of External Search Engine Optimization
Directory of Open Access Journals (Sweden)
Vita Jasevičiūtė
2015-07-01
This paper describes an investigation of an external search engine optimization (SEO) action planning tool, designed to automatically extract a small set of the most important keywords for each month of a whole year. The keywords in the set are extracted according to externally measured parameters, such as the average number of searches during the year and for each month individually. Additionally, the position of the optimized web site for each keyword is taken into account. The generated optimization plan is similar to the optimization plans prepared manually by SEO professionals and can be successfully used as a support tool for web site search engine optimization.
Lu, Ying-Hao; Kuo, Chen-Chun; Huang, Yaw-Bin
2011-08-01
We selected HTML, PHP and JavaScript as the programming languages to build "WebBio", a web-based system for patient data on biological products, and used MySQL as the database. WebBio is based on the PHP-MySQL suite and runs on an Apache server on a Linux machine. WebBio provides data management, search and data analysis functions for 20 kinds of biological products (plasma expanders, human immunoglobulin and hematological products). There are two particular features in WebBio: (1) pharmacists can rapidly find out which patients used contaminated products, for medication safety, and (2) the statistics charts for a specific product can be automatically generated to reduce pharmacists' workload. WebBio has successfully turned traditional paperwork into web-based data management.
SSWAP: A Simple Semantic Web Architecture and Protocol for semantic web services.
Gessler, Damian D G; Schiltz, Gary S; May, Greg D; Avraham, Shulamit; Town, Christopher D; Grant, David; Nelson, Rex T
2009-09-23
SSWAP (Simple Semantic Web Architecture and Protocol; pronounced "swap") is an architecture, protocol, and platform for using reasoning to semantically integrate heterogeneous disparate data and services on the web. SSWAP was developed as a hybrid semantic web services technology to overcome limitations found in both pure web service technologies and pure semantic web technologies. There are currently over 2400 resources published in SSWAP. Approximately two dozen are custom-written services for QTL (Quantitative Trait Loci) and mapping data for legumes and grasses (grains). The remaining are wrappers to Nucleic Acids Research Database and Web Server entries. As an architecture, SSWAP establishes how clients (users of data, services, and ontologies), providers (suppliers of data, services, and ontologies), and discovery servers (semantic search engines) interact to allow for the description, querying, discovery, invocation, and response of semantic web services. As a protocol, SSWAP provides the vocabulary and semantics to allow clients, providers, and discovery servers to engage in semantic web services. The protocol is based on the W3C-sanctioned first-order description logic language OWL DL. As an open source platform, a discovery server running at http://sswap.info (as in to "swap info") uses the description logic reasoner Pellet to integrate semantic resources. The platform hosts an interactive guide to the protocol at http://sswap.info/protocol.jsp, developer tools at http://sswap.info/developer.jsp, and a portal to third-party ontologies at http://sswapmeet.sswap.info (a "swap meet"). SSWAP addresses the three basic requirements of a semantic web services architecture (i.e., a common syntax, shared semantic, and semantic discovery) while addressing three technology limitations common in distributed service systems: i.e., i) the fatal mutability of traditional interfaces, ii) the rigidity and fragility of static subsumption hierarchies, and iii) the
New Architectures for Presenting Search Results Based on Web Search Engines Users Experience
Martinez, F. J.; Pastor, J. A.; Rodriguez, J. V.; Lopez, Rosana; Rodriguez, J. V., Jr.
2011-01-01
Introduction: The Internet is a dynamic environment which is continuously being updated. Search engines have been, currently are and in all probability will continue to be the most popular systems in this information cosmos. Method: In this work, special attention has been paid to the series of changes made to search engines up to this point,…
Search Analytics for Your Site
Rosenfeld, Louis
2011-01-01
Any organization that has a searchable web site or intranet is sitting on top of hugely valuable and usually under-exploited data: logs that capture what users are searching for, how often each query was searched, and how many results each query retrieved. Search queries are gold: they are real data that show us exactly what users are searching for in their own words. This book shows you how to use search analytics to carry on a conversation with your customers: listen to and understand their needs, and improve your content, navigation and search performance to meet those needs.
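The three log signals named above (what users search for, how often, and how many results come back) can be summarised with very little code. The sketch below assumes a hypothetical log format of `(query, result_count)` tuples and a simple lowercase and whitespace normalisation. Neither is taken from the book, and real search logs will need their own parsing.

```python
from collections import Counter


def summarize_search_log(entries):
    """Count how often each normalized query occurs and how often it returns nothing.

    entries: iterable of (query, result_count) tuples (assumed format).
    """
    frequency = Counter()
    zero_results = Counter()
    for query, result_count in entries:
        normalized = " ".join(query.lower().split())  # fold case and extra spaces
        frequency[normalized] += 1
        if result_count == 0:
            zero_results[normalized] += 1
    return frequency, zero_results


# Invented sample log for illustration only.
sample_log = [
    ("Annual Report", 12),
    ("annual  report", 12),
    ("vacation policy", 0),
    ("pricing", 40),
    ("annual report", 12),
    ("Vacation Policy", 0),
]

frequency, zero_results = summarize_search_log(sample_log)
top_query, top_count = frequency.most_common(1)[0]
```

Frequent queries show what content matters most to users, while frequent zero-result queries point to gaps in content or vocabulary mismatches.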
Undergraduate Students’ Evaluation Criteria When Using Web Resources for Class Papers
Directory of Open Access Journals (Sweden)
Tsai-Youn Hung
2004-09-01
The growth in popularity of the World Wide Web has dramatically changed the way undergraduate students conduct information searches. The purpose of this study is to investigate what core quality criteria undergraduate students use to evaluate Web resources for their class papers and to what extent they evaluate the Web resources. This study reports on five Web page evaluations and a questionnaire survey of thirty-five undergraduate students in the Information Technology and Informatics Program at Rutgers University. Results show that undergraduate students have become increasingly sophisticated about using Web resources, but not yet sophisticated about searching them. Undergraduate students only used one or two surface quality criteria to evaluate Web resources. They made immediate judgments about the surface features of Web pages and ignored the content of the documents themselves. This research suggests that undergraduate instructors should take responsibility for instructing students in basic Web use or work with librarians to develop undergraduate students' information literacy skills.
[Biomedical information on the internet using search engines. A one-year trial].
Corrao, Salvatore; Leone, Francesco; Arnone, Sabrina
2004-01-01
The internet is a communication medium and content distributor that provides information in a general sense, but it can also be of great use for searching and retrieving biomedical information. Search engines are a valuable means of rapidly finding information on the net. However, it is not known whether general search engines and meta-search engines are reliable for finding useful and validated biomedical information. The aim of our study was to verify the reproducibility of a keyword search (pediatric or evidence) using 9 international search engines and 1 meta-search engine at baseline and after one year. We analysed the first 20 citations returned by each search. We evaluated the formal quality of the Web sites and their domain extensions. Moreover, we compared the output of each search at the start of the study and after one year, taking the number of Web sites cited again as a criterion of reliability. We found some interesting results that are reported throughout the text. Our findings point to the extremely dynamic nature of information on the Web and, for this reason, we advise great caution when using search and meta-search engines as tools for searching and retrieving reliable biomedical information. On the other hand, some search and meta-search engines can be very useful as a first step for better defining a search and also for finding institutional Web sites. This paper supports a more informed approach to the universe of biomedical information on the internet.
Discovering Land Cover Web Map Services from the Deep Web with JavaScript Invocation Rules
Directory of Open Access Journals (Sweden)
Dongyang Hou
2016-06-01
Automatic discovery of isolated land cover web map services (LCWMSs) can potentially help in sharing land cover data. Currently, various search engine-based and crawler-based approaches have been developed for finding services dispersed throughout the surface web. In fact, with the prevalence of geospatial web applications, a considerable number of LCWMSs are hidden in JavaScript code, which belongs to the deep web. However, discovering LCWMSs from JavaScript code remains an open challenge. This paper aims to solve this challenge by proposing a focused deep web crawler for finding more LCWMSs from deep web JavaScript code and the surface web. First, the names of a group of JavaScript links are abstracted as initial judgements. Through name matching, these judgements are utilized to judge whether or not the fetched webpages contain predefined JavaScript links that may prompt JavaScript code to invoke WMSs. Secondly, some JavaScript invocation functions and URL formats for WMS are summarized as JavaScript invocation rules from prior knowledge of how WMSs are employed and coded in JavaScript. These invocation rules are used to identify the JavaScript code for extracting candidate WMSs through rule matching. The above two operations are incorporated into a traditional focused crawling strategy situated between the tasks of fetching webpages and parsing webpages. Thirdly, LCWMSs are selected by matching services with a set of land cover keywords. Moreover, a search engine for LCWMSs is implemented that uses the focused deep web crawler to retrieve and integrate the LCWMSs it discovers. In the first experiment, eight online geospatial web applications serve as seed URLs (Uniform Resource Locators) and crawling scopes; the proposed crawler addresses only the JavaScript code in these eight applications. All 32 available WMSs hidden in JavaScript code were found using the proposed crawler, while not one WMS was discovered through the focused crawler
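The rule-matching and keyword-matching steps described above can be sketched with regular expressions. Everything specific below is an assumption for illustration: the three patterns (an OpenLayers 2 style `OpenLayers.Layer.WMS(...)` constructor, a Leaflet style `L.tileLayer.wms(...)` call, and a URL containing `service=WMS`), the keyword list, and the function names are hypothetical stand-ins, not the invocation rules or keywords used in the paper.

```python
import re

# Hypothetical invocation rules: each captures the service URL as group "url".
INVOCATION_RULES = [
    # new OpenLayers.Layer.WMS("title", "http://.../wms", {...})
    re.compile(r"OpenLayers\.Layer\.WMS\(\s*['\"][^'\"]*['\"]\s*,\s*['\"](?P<url>[^'\"]+)['\"]"),
    # L.tileLayer.wms("http://.../wms", {...})
    re.compile(r"L\.tileLayer\.wms\(\s*['\"](?P<url>[^'\"]+)['\"]"),
    # any quoted URL that carries a service=WMS query parameter
    re.compile(r"['\"](?P<url>https?://[^'\"]*[?&]service=WMS[^'\"]*)['\"]", re.IGNORECASE),
]

# Hypothetical land cover keywords for the final selection step.
LAND_COVER_KEYWORDS = ("land cover", "landcover", "land use", "forest", "cropland")


def extract_candidate_wms(js_code):
    """Return candidate WMS URLs found in JavaScript source, in first-seen order."""
    urls = []
    for rule in INVOCATION_RULES:
        for match in rule.finditer(js_code):
            url = match.group("url")
            if url not in urls:
                urls.append(url)
    return urls


def is_land_cover(description):
    """Keyword match against a service title or abstract."""
    text = description.lower()
    return any(keyword in text for keyword in LAND_COVER_KEYWORDS)


# Invented JavaScript fragment using placeholder example.org URLs.
sample_js = '''
var lc = new OpenLayers.Layer.WMS("Land cover 2010", "http://example.org/geoserver/wms", {layers: "lc2010"});
var roads = L.tileLayer.wms("http://example.org/roads/wms", {layers: "roads"});
'''

candidates = extract_candidate_wms(sample_js)
```

In a crawler, this extraction would sit between fetching a page's scripts and parsing the page, and the keyword check would run on each candidate service's metadata.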
Evaluating search effectiveness of some selected search engines ...
African Journals Online (AJOL)
With advances in technology, many individuals are becoming familiar with the internet, and many users seek information on the World Wide Web (WWW) using a variety of search engines. This research work evaluates the retrieval effectiveness of Google, Yahoo, Bing, AOL and Baidu. Precision, relative recall and response ...
Constructing a web recommender system using web usage mining and user’s profiles
Directory of Open Access Journals (Sweden)
T. Mombeini
2014-12-01
The World Wide Web is a great source of information, which is nowadays widely used due to the availability of useful, dynamically changing information. However, the large number of webpages often confuses users, and it is hard for them to find information matching their interests. Therefore, it is necessary to provide a system capable of guiding users towards their desired choices and services. Recommender systems search among a large collection of user interests and recommend those that are most likely to be favored by the user. Web usage mining was designed to function on web server records, which are included in user search results. Therefore, recommender servers use the web usage mining technique to predict users’ browsing patterns and recommend those patterns in the form of a suggestion list. In this article, a recommender system based on web usage mining phases (online and offline) is proposed. In the offline phase, the first step is to analyze user access records to identify user sessions. Next, user profiles are built using data from server records based on the frequency of access to pages, the time spent by the user on each page and the date of page view. Date is important because users are more likely to request new pages than old ones, as they mostly look for new information. Following the creation of user profiles, users are grouped into clusters using the Fuzzy C-means clustering algorithm and the S(c) criterion, based on their similarities. In the online phase, a neural network is used to identify the suggested model, while online suggestions are generated using the suggestion module for the active user. Search engines analyze suggestion lists based on rate of user interest in pages and page rank and finally suggest appropriate pages to the active user. Experiments show that the proposed method of predicting user recent requested pages has more accuracy and
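The first offline step, identifying user sessions from access records, is commonly done by splitting each user's requests wherever the gap between consecutive hits exceeds a timeout. The sketch below assumes that heuristic with a 30-minute timeout and a `(user_id, timestamp_seconds, page)` record layout. The abstract does not state how the authors identify sessions, so both choices and the function name are assumptions.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds; an assumed, commonly used cut-off


def identify_sessions(records, timeout=SESSION_TIMEOUT):
    """Group (user_id, timestamp_seconds, page) records into per-user sessions."""
    hits_by_user = defaultdict(list)
    for user, timestamp, page in sorted(records, key=lambda r: (r[0], r[1])):
        hits_by_user[user].append((timestamp, page))

    sessions = {}
    for user, hits in hits_by_user.items():
        user_sessions = []
        current = [hits[0][1]]
        last_timestamp = hits[0][0]
        for timestamp, page in hits[1:]:
            if timestamp - last_timestamp > timeout:
                user_sessions.append(current)  # gap too long: close the session
                current = []
            current.append(page)
            last_timestamp = timestamp
        user_sessions.append(current)
        sessions[user] = user_sessions
    return sessions


# Invented access records for illustration only.
access_records = [
    ("u1", 0, "/home"),
    ("u1", 600, "/news"),
    ("u1", 4000, "/home"),  # 3400 s after the previous hit: starts a new session
    ("u2", 100, "/about"),
]

sessions = identify_sessions(access_records)
```

Page frequency, time on page and view date for the user profiles can then be aggregated per session before clustering.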
Checklist of accessibility in Web informational environments
Directory of Open Access Journals (Sweden)
Christiane Gomes dos Santos
2017-01-01
This research deals with the process of search, navigation and retrieval of information by people with blindness in the web environment, drawing on knowledge from the areas of information retrieval and information architecture to understand the strategies these people use to access information on the web. It aims to propose the construction of an accessibility verification instrument, a checklist, to be used to analyze the behavior of people with blindness when searching, navigating and retrieving information on sites and pages. It is exploratory and descriptive research of a qualitative nature that uses a case study methodology: a specific study simulating search, navigation and information retrieval with the speech synthesis system NonVisual Desktop Access in an assistive technologies laboratory, to substantiate the construction of the checklist for accessibility verification. The research performed is considered reliable and important for the evaluation of accessibility in the web environment, with the aim of improving access to information for people with limited reading ability; the checklist is intended for use in accessibility analysis of websites and pages.
Use of WebQuest Design for Inservice Teacher Professional Development
Iskeceli-Tunc, Sinem; Oner, Diler
2016-01-01
This study investigated whether a teacher professional development module built around designing WebQuests could improve teachers' technological and pedagogical skills. The technological skills examined included Web searching and Web evaluating skills. The pedagogical skills targeted were developing a working definition for higher-order thinking…
Flexible Web services integration: a novel personalised social approach
Metrouh, Abdelmalek; Mokhati, Farid
2018-05-01
Dynamic composition or integration remains one of the key objectives of Web services technology. This paper aims to propose an innovative approach of dynamic Web services composition based on functional and non-functional attributes and individual preferences. In this approach, social networks of Web services are used to maintain interactions between Web services in order to select and compose Web services that are more tightly related to user's preferences. We use the concept of Web services community in a social network of Web services to reduce considerably their search space. These communities are created by the direct involvement of Web services providers.
Improving PHENIX search with Solr, Nutch and Drupal
International Nuclear Information System (INIS)
Morrison, Dave; Sourikova, Irina
2012-01-01
During its 20 years of R&D, construction and operation, the PHENIX experiment at the Relativistic Heavy Ion Collider (RHIC) has accumulated large amounts of proprietary collaboration data that is hosted on many servers around the world and is not open to commercial search engines for indexing and searching. The legacy search infrastructure did not scale well with the fast-growing PHENIX document base and produced results inadequate in both precision and recall. After considering the possible alternatives that would provide an aggregated, fast, full-text search of a variety of data sources and file formats, we decided to use Nutch [1] as a web crawler and Solr [2] as a search engine. To present XML-based Solr search results in a user-friendly format we use Drupal [3] as a web interface to Solr. We describe the experience of building a federated search for a heterogeneous collection of 10 million PHENIX documents with Nutch, Solr and Drupal.
Improving PHENIX search with Solr, Nutch and Drupal.
Morrison, Dave; Sourikova, Irina
2012-12-01
During its 20 years of R&D, construction and operation the PHENIX experiment at the Relativistic Heavy Ion Collider (RHIC) has accumulated large amounts of proprietary collaboration data that is hosted on many servers around the world and is not open for commercial search engines for indexing and searching. The legacy search infrastructure did not scale well with the fast growing PHENIX document base and produced results inadequate in both precision and recall. After considering the possible alternatives that would provide an aggregated, fast, full text search of a variety of data sources and file formats we decided to use Nutch [1] as a web crawler and Solr [2] as a search engine. To present XML-based Solr search results in a user-friendly format we use Drupal [3] as a web interface to Solr. We describe the experience of building a federated search for a heterogeneous collection of 10 million PHENIX documents with Nutch, Solr and Drupal.
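As a rough illustration of how a front end could query a Solr index like the one described above, the sketch below builds a request URL for Solr's standard `/select` handler with the common `q`, `rows` and `wt` parameters. The host, the core name `phenix_docs` and the query text are hypothetical placeholders, not details taken from the PHENIX setup.

```python
from urllib.parse import urlencode


def build_solr_query(base_url, core, text, rows=10):
    """Return a URL for Solr's /select handler.

    base_url and core are hypothetical; q, rows and wt are standard
    Solr query parameters.
    """
    params = {"q": text, "rows": rows, "wt": "json"}
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"


# Hypothetical local Solr instance and core name, for illustration only.
url = build_solr_query("http://localhost:8983", "phenix_docs", "heavy ion", rows=5)
print(url)
```

A web interface would fetch this URL and render the returned documents; the sketch stops at URL construction so that it runs without a Solr server.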
An open-source, mobile-friendly search engine for public medical knowledge.
Samwald, Matthias; Hanbury, Allan
2014-01-01
The World Wide Web has become an important source of information for medical practitioners. To complement the capabilities of currently available web search engines we developed FindMeEvidence, an open-source, mobile-friendly medical search engine. In a preliminary evaluation, the quality of results from FindMeEvidence proved to be competitive with those from TRIP Database, an established, closed-source search engine for evidence-based medicine.
Evaluation of Federated Searching Options for the School Library
Abercrombie, Sarah E.
2008-01-01
Three hosted federated search tools, Follett One Search, Gale PowerSearch Plus, and WebFeat Express, were configured and implemented in a school library. Databases from five vendors and the OPAC were systematically searched. Federated search results were compared with each other and to the results of the same searches in the database's native…
The internet and intelligent machines: search engines, agents and robots
International Nuclear Information System (INIS)
Achenbach, S.; Alfke, H.
2000-01-01
The internet plays an important role in a growing number of medical applications. Finding relevant information is not always easy as the amount of available information on the Web is rising quickly. Even the best Search Engines can only collect links to a fraction of all existing Web pages. In addition, many of these indexed documents have been changed or deleted. The vast majority of information on the Web is not searchable with conventional methods. New search strategies, technologies and standards are combined in Intelligent Search Agents (ISA) and Robots, which can retrieve desired information in a specific approach. Conclusion: The article describes differences between ISAs and conventional Search Engines and how communication between Agents improves their ability to find information. Examples of existing ISAs are given and the possible influences on the current and future work in radiology are discussed.
Guide to cleaner coal technology-related web sites
Energy Technology Data Exchange (ETDEWEB)
Davidson, R; Jenkins, N; Zhang, X [IEA Coal Research - The Clean Coal Centre, London (United Kingdom)
2001-07-01
The 'Guide to Cleaner Coal Technology-Related Web Sites' is a guide to web sites that contain important information on cleaner coal technologies (CCT). It contains a short introduction to the World Wide Web and gives advice on how to search for information using directories and search engines. The core section of the Guide is a collection of factsheets summarising the information available on over 65 major web sites selected from organizations worldwide (except those promoting companies). These sites contain a wealth of information on CCT research and development, technology transfer, financing and markets. The factsheets are organised in the following categories: Associations, research centres and programmes; Climate change and sustainable development; Cooperative ventures; Electronic journals; Financial institutions; International organizations; National government information; and Statistical information. A full subject index is provided. The Guide concludes with some general comments on the quality of the sites reviewed.
Harnessing the Deep Web: Present and Future
Madhavan, Jayant; Afanasiev, Loredana; Antova, Lyublena; Halevy, Alon
2009-01-01
Over the past few years, we have built a system that has exposed large volumes of Deep-Web content to Google.com users. The content that our system exposes contributes to more than 1000 search queries per-second and spans over 50 languages and hundreds of domains. The Deep Web has long been acknowledged to be a major source of structured data on the web, and hence accessing Deep-Web content has long been a problem of interest in the data management community. In this paper, we report on where...
A reverse engineering approach for automatic annotation of Web pages
R. de Virgilio (Roberto); F. Frasincar (Flavius); W. Hop (Walter); S. Lachner (Stephan)
2013-01-01
The Semantic Web is gaining increasing interest to fulfill the need of sharing, retrieving, and reusing information. Since Web pages are designed to be read by people, not machines, searching and reusing information on the Web is a difficult task without human participation. To this aim
Schafer, Roland
2013-01-01
The World Wide Web constitutes the largest existing source of texts written in a great variety of languages. A feasible and sound way of exploiting this data for linguistic research is to compile a static corpus for a given language. There are several advantages of this approach: (i) Working with such corpora obviates the problems encountered when using Internet search engines in quantitative linguistic research (such as non-transparent ranking algorithms). (ii) Creating a corpus from web data is virtually free. (iii) The size of corpora compiled from the WWW may exceed by several orders of magnitude the size of language resources offered elsewhere. (iv) The data is locally available to the user, and it can be linguistically post-processed and queried with the tools preferred by her/him. This book addresses the main practical tasks in the creation of web corpora up to giga-token size. Among these tasks are the sampling process (i.e., web crawling) and the usual cleanups including boilerplate removal and rem...
Deep web query interface understanding and integration
Dragut, Eduard C; Yu, Clement T
2012-01-01
There are millions of searchable data sources on the Web and to a large extent their contents can only be reached through their own query interfaces. There is an enormous interest in making the data in these sources easily accessible. There are primarily two general approaches to achieve this objective. The first is to surface the contents of these sources from the deep Web and add the contents to the index of regular search engines. The second is to integrate the searching capabilities of these sources and support integrated access to them. In this book, we introduce the state-of-the-art tech
Analysis Tool Web Services from the EMBL-EBI.
McWilliam, Hamish; Li, Weizhong; Uludag, Mahmut; Squizzato, Silvano; Park, Young Mi; Buso, Nicola; Cowley, Andrew Peter; Lopez, Rodrigo
2013-07-01
Since 2004 the European Bioinformatics Institute (EMBL-EBI) has provided access to a wide range of databases and analysis tools via Web Services interfaces. This comprises services to search across the databases available from the EMBL-EBI and to explore the network of cross-references present in the data (e.g. EB-eye), services to retrieve entry data in various data formats and to access the data in specific fields (e.g. dbfetch), and analysis tool services, for example, sequence similarity search (e.g. FASTA and NCBI BLAST), multiple sequence alignment (e.g. Clustal Omega and MUSCLE), pairwise sequence alignment and protein functional analysis (e.g. InterProScan and Phobius). The REST/SOAP Web Services (http://www.ebi.ac.uk/Tools/webservices/) interfaces to these databases and tools allow their integration into other tools, applications, web sites, pipeline processes and analytical workflows. To get users started using the Web Services, sample clients are provided covering a range of programming languages and popular Web Service tool kits, and a brief guide to Web Services technologies, including a set of tutorials, is available for those wishing to learn more and develop their own clients. Users of the Web Services are informed of improvements and updates via a range of methods.
Materializing the web of linked data
Konstantinou, Nikolaos
2015-01-01
This book explains the Linked Data domain by adopting a bottom-up approach: it introduces the fundamental Semantic Web technologies and building blocks, which are then combined into methodologies and end-to-end examples for publishing datasets as Linked Data, and use cases that harness scholarly information and sensor data. It presents how Linked Data is used for web-scale data integration, information management and search. Special emphasis is given to the publication of Linked Data from relational databases as well as from real-time sensor data streams. The authors also trace the transformation from the document-based World Wide Web into a Web of Data. Materializing the Web of Linked Data is addressed to researchers and professionals studying software technologies, tools and approaches that drive the Linked Data ecosystem, and the Web in general.
What and how children search on the web
Duarte Torres, Sergio; Weber, Ingmar
2011-01-01
The Internet has become an important part of the daily life of children as a source of information and leisure activities. Nonetheless, given that most of the content available on the web is aimed at the general public, children are constantly exposed to inappropriate content, either because the
Focused Crawling of the Deep Web Using Service Class Descriptions
Energy Technology Data Exchange (ETDEWEB)
Rocco, D; Liu, L; Critchlow, T
2004-06-21
Dynamic Web data sources, sometimes known collectively as the Deep Web, increase the utility of the Web by providing intuitive access to data repositories anywhere that Web access is available. Deep Web services provide access to real-time information, like entertainment event listings, or present a Web interface to large databases or other data repositories. Recent studies suggest that the size and growth rate of the dynamic Web greatly exceed that of the static Web, yet dynamic content is often ignored by existing search engine indexers owing to the technical challenges that arise when attempting to search the Deep Web. To address these challenges, we present DynaBot, a service-centric crawler for discovering and clustering Deep Web sources offering dynamic content. DynaBot has three unique characteristics. First, DynaBot utilizes a service class model of the Web implemented through the construction of service class descriptions (SCDs). Second, DynaBot employs a modular, self-tuning system architecture for focused crawling of the Deep Web using service class descriptions. Third, DynaBot incorporates methods and algorithms for efficient probing of the Deep Web and for discovering and clustering Deep Web sources and services through SCD-based service matching analysis. Our experimental results demonstrate the effectiveness of the service class discovery, probing, and matching algorithms and suggest techniques for efficiently managing service discovery in the face of the immense scale of the Deep Web.
Noesis: Ontology based Scoped Search Engine and Resource Aggregator for Atmospheric Science
Ramachandran, R.; Movva, S.; Li, X.; Cherukuri, P.; Graves, S.
2006-12-01
The goal for search engines is to return results that are both accurate and complete. The search engines should find only what you really want and find everything you really want. Search engines (even meta search engines) lack semantics. Search is based simply on string matching between the user's query term and the resource database, and the semantics associated with the search string are not captured. For example, if an atmospheric scientist is searching for "pressure" related web resources, most search engines return inaccurate results such as web resources related to blood pressure. This presentation describes Noesis, a meta-search engine and resource aggregator that uses domain ontologies to provide scoped search capabilities. Noesis uses domain ontologies to help the user scope the search query to ensure that the search results are both accurate and complete. The domain ontologies guide users in refining their search query and thereby reduce the burden of experimenting with different search strings. Semantics are captured by refining the query terms to cover synonyms, specializations, generalizations and related concepts. Noesis also serves as a resource aggregator. It categorizes the search results from different online resources, such as education materials, publications, datasets and web search engines, that might be of interest to the user.
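The ontology-guided query refinement described above can be sketched as a simple term expansion. This is only an illustration under stated assumptions: the toy ontology, its terms and the function names are invented here and do not reflect the actual Noesis ontologies or code.

```python
# Toy ontology: every term and relation below is invented for illustration.
TOY_ONTOLOGY = {
    "pressure": {
        "synonyms": ["atmospheric pressure", "barometric pressure"],
        "specializations": ["sea level pressure", "vapor pressure"],
        "generalizations": ["atmospheric state variable"],
        "related": ["wind", "temperature"],
    },
}


def expand_query(term, ontology, relations=("synonyms", "specializations")):
    """Return the original term followed by ontology terms for the chosen relations."""
    entry = ontology.get(term.lower(), {})
    expanded = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def to_boolean_query(terms):
    """Join terms into a quoted OR query that a keyword search engine could accept."""
    return " OR ".join(f'"{t}"' for t in terms)


terms = expand_query("pressure", TOY_ONTOLOGY)
print(to_boolean_query(terms))
```

Scoping the query to domain terms such as "barometric pressure" is what would steer results away from unrelated senses like blood pressure; a term missing from the ontology is simply passed through unchanged.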
Web party effect: a cocktail party effect in the web environment.
Rigutti, Sara; Fantoni, Carlo; Gerbino, Walter
2015-01-01
In goal-directed web navigation, labels compete for selection: this process often involves knowledge integration and requires selective attention to manage the dizziness of web layouts. Here we ask whether the competition for selection depends on all web navigation options or only on those options that are more likely to be useful for information seeking, and provide evidence in favor of the latter alternative. Participants in our experiment navigated a representative set of real websites of variable complexity, in order to reach an information goal located two clicks away from the starting home page. The time needed to reach the goal was accounted for by a novel measure of home page complexity based on a part of (not all) web options: the number of links embedded within web navigation elements weighted by the number and type of embedding elements. Our measure fully mediated the effect of several standard complexity metrics (the overall number of links, words, images, graphical regions, the JPEG file size of home page screenshots) on information seeking time and usability ratings. Furthermore, it predicted the cognitive demand of web navigation, as revealed by the duration judgment ratio (i.e., the ratio of subjective to objective duration of information search). Results demonstrate that focusing on relevant links while ignoring other web objects optimizes the deployment of attentional resources necessary to navigation. This is in line with a web party effect (i.e., a cocktail party effect in the web environment): users tune into web elements that are relevant for the achievement of their navigation goals and tune out all others.
Introduction to Webometrics Quantitative Web Research for the Social Sciences
Thelwall, Michael
2009-01-01
Webometrics is concerned with measuring aspects of the web: web sites, web pages, parts of web pages, words in web pages, hyperlinks, web search engine results. The importance of the web itself as a communication medium and for hosting an increasingly wide array of documents, from journal articles to holiday brochures, needs no introduction. Given this huge and easily accessible source of information, there are limitless possibilities for measuring or counting on a huge scale (e.g., the number of web sites, the number of web pages, the number of blogs) or on a smaller scale (e.g., the number o
Choi, Okkyung; Han, SangYong
2007-01-01
Ubiquitous Computing makes it possible to determine in real time the location and situations of service requesters in a web service environment as it enables access to computers at any time and in any place. Though research on various aspects of ubiquitous commerce is progressing at enterprises and research centers, both domestically and overseas, analysis of a customer's personal preferences based on semantic web and rule based services using semantics is not currently being conducted. This paper proposes a Ubiquitous Computing Services System that enables a rule based search as well as semantics based search to support the fact that the electronic space and the physical space can be combined into one and the real time search for web services and the construction of efficient web services thus become possible.
Start Your Engines: Surfing with Search Engines for Kids.
Byerly, Greg; Brodie, Carolyn S.
1999-01-01
Suggests that to be an effective educator and user of the Web it is essential to know the basics about search engines. Presents tips for using search engines. Describes several search engines for children and young adults, as well as some general filtered search engines for children. (AEF)
Quality of Web-Based Information on Cannabis Addiction
Khazaal, Yasser; Chatton, Anne; Cochand, Sophie; Zullino, Daniele
2008-01-01
This study evaluated the quality of Web-based information on cannabis use and addiction and investigated particular content quality indicators. Three keywords ("cannabis addiction," "cannabis dependence," and "cannabis abuse") were entered into two popular World Wide Web search engines. Websites were assessed with a standardized proforma designed…
TECHNIQUES USED IN SEARCH ENGINE MARKETING
Assoc. Prof. Liviu Ion Ciora Ph. D; Lect. Ion Buligiu Ph. D
2010-01-01
Search engine marketing (SEM) is a generic term covering a variety of marketing techniques intended for attracting web traffic in search engines and directories. SEM is a popular tool since it has the potential of substantial gains with minimum investment. On the one side, most search engines and directories offer free or extremely cheap listing. On the other side, the traffic coming from search engines and directories tends to be motivated for acquisitions, making these visitors some of the ...
Web page sorting algorithm based on query keyword distance relation
Yang, Han; Cui, Hong Gang; Tang, Hao
2017-08-01
To improve page ranking, we propose a query-keyword clustering idea based on the relationships among the search keywords within a web page, and convert it into a measure of the degree of aggregation of the search keywords in the page. Building on the PageRank algorithm, a clustering degree factor for the query keywords is added so that it can take part in the quantitative calculation. This paper proposes an improved PageRank algorithm based on the distance relation between search keywords. The experimental results show the feasibility and effectiveness of the method.
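The abstract above does not give the formula for the clustering degree factor, so the following is only a sketch of one way such a factor could work. It assumes that aggregation is measured as the number of distinct query keywords divided by the length of the shortest token window containing all of them, and that this factor scales a precomputed PageRank score via a hypothetical weight `alpha`. None of these choices is taken from the paper.

```python
def min_span(tokens, keywords):
    """Length of the shortest token window containing every keyword, or None."""
    needed = set(keywords)
    if not needed:
        return None
    counts = {}
    best = None
    left = 0
    for right, tok in enumerate(tokens):
        if tok in needed:
            counts[tok] = counts.get(tok, 0) + 1
        # Shrink the window from the left while it still covers all keywords.
        while len(counts) == len(needed):
            width = right - left + 1
            if best is None or width < best:
                best = width
            left_tok = tokens[left]
            if left_tok in counts:
                counts[left_tok] -= 1
                if counts[left_tok] == 0:
                    del counts[left_tok]
            left += 1
    return best


def rerank(pages, keywords, alpha=1.0):
    """Order URLs by pagerank * (1 + alpha * aggregation); an assumed combination."""
    scored = []
    for url, (pagerank, tokens) in pages.items():
        span = min_span(tokens, keywords)
        aggregation = 0.0 if span is None else len(set(keywords)) / span
        scored.append((pagerank * (1.0 + alpha * aggregation), url))
    return [url for _, url in sorted(scored, reverse=True)]


# Hypothetical pages: (precomputed PageRank score, tokenized page text).
pages = {
    "a": (0.5, ["keyword", "x", "x", "x", "distance"]),
    "b": (0.4, ["keyword", "distance"]),
}
print(rerank(pages, ["keyword", "distance"]))
```

In this toy data, page "b" has a lower PageRank score but its keywords are adjacent, so the aggregation factor lifts it above page "a".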
Evaluating company growth potential using AI and web media data
DEFF Research Database (Denmark)
Droll, Andrew; Khan, Shahzad; Tanev, Stoyan
2017-01-01
The article focuses on adapting and validating the use of an existing web search and analytics engine to evaluate the growth and competitive potential of new technology start-ups and existing firms in the newly emerging precision medicine sector. The results are based on two different search...... includes new technology firms in the same sector. The firms in the second sample were used as test cases in examining if their growth related web search scores would relate to the degree of their innovativeness. The second part of the study applied the same methodology to the real time monitoring of firms...
Can keyword length indicate Web Users' readiness to purchase
Ramlall, Shalini; Sanders, David; Tewkesbury, Giles; Ndzi, David
2011-01-01
Over the last ten years, the internet has become an important marketing tool and a profitable selling channel. The biggest challenge for most online business is converting Web users into customers effectively and at a high rate. Understanding the audience of a website is essential for achieving high conversion rates. This paper describes the research carried out in online search behaviour. The research looks at whether the length of a Web user’s search keyword can provide insight into their i...
Ramakrishnan: Semantics on the Web
National Aeronautics and Space Administration — It is becoming increasingly clear that the next generation of web search and advertising will rely on a deeper understanding of user intent and task modeling, and a...
An evaluation of the quality of Turkish community pharmacy web sites concerning HON principles.
Yegenoglu, Selen; Sozen, Bilge; Aslan, Dilek; Calgan, Zeynep; Cagirci, Simge
2008-05-01
The objective of this study was to find all the existing Web sites of Turkish community pharmacies and evaluate their "quality" in terms of Health on the Net (HON) Code of conduct principles. Multiple Internet search engines were used (google.com, yahoo.com, altavista.com, msn.com). While searching on the Internet, "eczane (pharmacy)" and "eczanesi (pharmacy of)" key words were used. The Internet search lasted for 2 months starting from March 1, 2007 until May 1, 2007. SPSS ver. 11.5 statistical program (SPSS, Inc., Chicago, IL) was used for data entry and analysis. At the end of the Internet search via all the indicated search engines, a total of 203 (all different from each other) community pharmacy Web sites were determined; of these, 14 were under construction and 6 were not accessible. As a result, 183 community pharmacy Web sites were included in the study. All of the Web sites could be accessed (100%). However, the availability of some characteristics of the pharmacies was quite poor. None of the pharmacies met all of the HON principles. Only 11 Web sites were appropriate in terms of complementarity (6.0%). The confidentiality criterion was met by only 14 pharmacies (7.7%). Nine pharmacies (4.9%) met the "attribution" criterion. Among 183 pharmacy Web sites, the most frequently met HON principle was the "transparency of authorship" (69 pharmacy Web sites; 37.7%). Based on the results of our study, the Turkish Pharmacists Association can take a pioneering role in applying principles such as the HON code of conduct in order to increase the quality of Turkish community pharmacists' Web sites.
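The counts and percentages reported above can be cross-checked with a few lines of arithmetic. The sketch uses only the figures stated in the abstract; the variable and dictionary names are chosen here for illustration.

```python
# Figures as stated in the abstract.
total_found = 203
under_construction = 14
not_accessible = 6

# Web sites actually included in the evaluation.
included = total_found - under_construction - not_accessible

# Number of included sites meeting each HON principle, as reported.
met = {
    "complementarity": 11,
    "confidentiality": 14,
    "attribution": 9,
    "transparency of authorship": 69,
}

# Share of included sites per principle, rounded to one decimal place.
percentages = {name: round(100 * count / included, 1) for name, count in met.items()}

print(included)
print(percentages)
```

Running the sketch reproduces the 183 included sites and the 6.0%, 7.7%, 4.9% and 37.7% shares given in the abstract.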
Comparison of PubMed, Scopus, Web of Science, and Google Scholar: strengths and weaknesses.
Falagas, Matthew E; Pitsouni, Eleni I; Malietzis, George A; Pappas, Georgios
2008-02-01
The evolution of the electronic age has led to the development of numerous medical databases on the World Wide Web, offering search facilities on a particular subject and the ability to perform citation analysis. We compared the content coverage and practical utility of PubMed, Scopus, Web of Science, and Google Scholar. The official Web pages of the databases were used to extract information on the range of journals covered, search facilities and restrictions, and update frequency. We used the example of a keyword search to evaluate the usefulness of these databases in biomedical information retrieval and a specific published article to evaluate their utility in performing citation analysis. All databases were practical in use and offered numerous search facilities. PubMed and Google Scholar are accessed for free. The keyword search with PubMed offers optimal update frequency and includes online early articles; other databases can rate articles by number of citations, as an index of importance. For citation analysis, Scopus offers about 20% more coverage than Web of Science, whereas Google Scholar offers results of inconsistent accuracy. PubMed remains an optimal tool in biomedical electronic research. Scopus covers a wider journal range, of help both in keyword searching and citation analysis, but it is currently limited to recent articles (published after 1995) compared with Web of Science. Google Scholar, as for the Web in general, can help in the retrieval of even the most obscure information but its use is marred by inadequate, less often updated, citation information.
Analysis of Web Spam for Non-English Content: Toward More Effective Language-Based Classifiers.
Directory of Open Access Journals (Sweden)
Mansour Alsaleh
Web spammers aim to obtain higher ranks for their web pages by including spam contents that deceive search engines in order to include their pages in search results even when they are not related to the search terms. Search engines continue to develop new web spam detection mechanisms, but spammers also aim to improve their tools to evade detection. In this study, we first explore the effect of the page language on spam detection features and we demonstrate how the best set of detection features varies according to the page language. We also study the performance of Google Penguin, a newly developed anti-web spamming technique for their search engine. Using spam pages in Arabic as a case study, we show that unlike similar English pages, Google anti-spamming techniques are ineffective against a high proportion of Arabic spam pages. We then explore multiple detection features for spam pages to identify an appropriate set of features that yields a high detection accuracy compared with the integrated Google Penguin technique. In order to build and evaluate our classifier, as well as to help researchers to conduct consistent measurement studies, we collected and manually labeled a corpus of Arabic web pages, including both benign and spam pages. Furthermore, we developed a browser plug-in that utilizes our classifier to warn users about spam pages after clicking on a URL and by filtering out search engine results. Using Google Penguin as a benchmark, we provide an illustrative example to show that language-based web spam classifiers are more effective for capturing spam contents.
Variability of patient spine education by Internet search engine.
Ghobrial, George M; Mehdi, Angud; Maltenfort, Mitchell; Sharan, Ashwini D; Harrop, James S
2014-03-01
Patients are increasingly reliant upon the Internet as a primary source of medical information. The educational experience varies by search engine, search term, and changes daily. There are no tools for critical evaluation of spinal surgery websites. To highlight the variability between common search engines for the same search terms. To detect bias, by prevalence of specific kinds of websites for certain spinal disorders. Demonstrate a simple scoring system of spinal disorder website for patient use, to maximize the quality of information exposed to the patient. Ten common search terms were used to query three of the most common search engines. The top fifty results of each query were tabulated. A negative binomial regression was performed to highlight the variation across each search engine. Google was more likely than Bing and Yahoo search engines to return hospital ads (P=0.002) and more likely to return scholarly sites of peer-reviewed lite (P=0.003). Educational web sites, surgical group sites, and online web communities had a significantly higher likelihood of returning on any search, regardless of search engine, or search string (P=0.007). Likewise, professional websites, including hospital run, industry sponsored, legal, and peer-reviewed web pages were less likely to be found on a search overall, regardless of engine and search string (P=0.078). The Internet is a rapidly growing body of medical information which can serve as a useful tool for patient education. High quality information is readily available, provided that the patient uses a consistent, focused metric for evaluating online spine surgery information, as there is a clear variability in the way search engines present information to the patient. Published by Elsevier B.V.
GASS-WEB: a web server for identifying enzyme active sites based on genetic algorithms.
Moraes, João P A; Pappa, Gisele L; Pires, Douglas E V; Izidoro, Sandro C
2017-07-03
Enzyme active sites are important and conserved functional regions of proteins whose identification can be an invaluable step toward protein function prediction. Most of the existing methods for this task are based on active site similarity and present limitations, including performing only exact matches on template residues, template size restraints, and an inability to find inter-domain active sites. To fill this gap, we proposed GASS-WEB, a user-friendly web server that uses GASS (Genetic Active Site Search), a method based on an evolutionary algorithm to search for similar active sites in proteins. GASS-WEB can be used under two different scenarios: (i) given a protein of interest, to match a set of specific active site templates; or (ii) given an active site template, to look for it in a database of protein structures. The method has been shown to be very effective on a range of experiments and was able to correctly identify >90% of the catalogued active sites from the Catalytic Site Atlas. It also managed to achieve a Matthews correlation coefficient of 0.63 using the Critical Assessment of protein Structure Prediction (CASP 10) dataset. In our analysis, GASS ranked fourth among 18 methods. GASS-WEB is freely available at http://gass.unifei.edu.br/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
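For readers unfamiliar with the Matthews correlation coefficient quoted above, the sketch below computes it from a binary confusion matrix using the standard definition. The example counts are hypothetical and are not taken from the GASS-WEB evaluation.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, for illustration only.
print(matthews_corrcoef(tp=5, tn=5, fp=0, fn=0))  # perfect agreement
print(matthews_corrcoef(tp=1, tn=1, fp=1, fn=1))  # no better than chance
```

The coefficient ranges from -1 (total disagreement) through 0 (chance level) to 1 (perfect prediction), which makes it more informative than accuracy when the two classes are imbalanced.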
EquiX-A Search and Query Language for XML.
Cohen, Sara; Kanza, Yaron; Kogan, Yakov; Sagiv, Yehoshua; Nutt, Werner; Serebrenik, Alexander
2002-01-01
Describes EquiX, a search language for XML that combines querying with searching to query the data and the meta-data content of Web pages. Topics include search engines; a data model for XML documents; search query syntax; search query semantics; an algorithm for evaluating a query on a document; and indexing EquiX queries. (LRW)
WebQuest and semantic annotations
Directory of Open Access Journals (Sweden)
Santiago Blanco Suárez
2007-03-01
This article presents a system for searching and retrieving metadata about educational activities that follow the WebQuest model. It is a relational database, accessible through the web, complemented by a module for making semantic annotations. The module aims to capture and enrich knowledge about how these exercises are used by the community of teachers who experiment with them, and to document resources or websites of didactic interest in order to build a repository of quality educational links.
Comparing image search behaviour in the ARRS GoldMiner search engine and a clinical PACS/RIS.
De-Arteaga, Maria; Eggel, Ivan; Do, Bao; Rubin, Daniel; Kahn, Charles E; Müller, Henning
2015-08-01
Information search has changed the way we manage knowledge and the ubiquity of information access has made search a frequent activity, whether via Internet search engines or increasingly via mobile devices. Medical information search is in this respect no different and much research has been devoted to analyzing the way in which physicians aim to access information. Medical image search is a much smaller domain but has gained much attention as it has different characteristics than search for text documents. While web search log files have been analysed many times to better understand user behaviour, the log files of hospital internal systems for search in a PACS/RIS (Picture Archival and Communication System, Radiology Information System) have rarely been analysed. Such a comparison between a hospital PACS/RIS search and a web system for searching images of the biomedical literature is the goal of this paper. Objectives are to identify similarities and differences in search behaviour of the two systems, which could then be used to optimize existing systems and build new search engines. Log files of the ARRS GoldMiner medical image search engine (freely accessible on the Internet) containing 222,005 queries, and log files of Stanford's internal PACS/RIS search called radTF containing 18,068 queries were analysed. Each query was preprocessed and all query terms were mapped to the RadLex (Radiology Lexicon) terminology, a comprehensive lexicon of radiology terms created and maintained by the Radiological Society of North America, so the semantic content in the queries and the links between terms could be analysed, and synonyms for the same concept could be detected. RadLex was mainly created for use in radiology reports, to aid structured reporting and the preparation of educational material (Langlotz, 2006) [1]. In standard medical vocabularies such as MeSH (Medical Subject Headings) and UMLS (Unified Medical Language System) specific terms of radiology are often
The Number of Scholarly Documents on the Public Web
Khabsa, Madian; Giles, C. Lee
2014-01-01
The number of scholarly documents available on the web is estimated using capture/recapture methods by studying the coverage of two major academic search engines: Google Scholar and Microsoft Academic Search. Our estimates show that at least 114 million English-language scholarly documents are accessible on the web, of which Google Scholar has nearly 100 million. Of these, we estimate that at least 27 million (24%) are freely available since they do not require a subscription or payment of any kind. In addition, at a finer scale, we also estimate the number of scholarly documents on the web for fifteen fields: Agricultural Science, Arts and Humanities, Biology, Chemistry, Computer Science, Economics and Business, Engineering, Environmental Sciences, Geosciences, Material Science, Mathematics, Medicine, Physics, Social Sciences, and Multidisciplinary, as defined by Microsoft Academic Search. In addition, we show that among these fields the percentage of documents defined as freely available varies significantly, i.e., from 12 to 50%. PMID:24817403
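The capture/recapture idea mentioned above can be illustrated with the classic Lincoln-Petersen estimator, treating each search engine's index as one "capture" of the same population. This is a sketch of the general technique only: the abstract does not state which estimator the authors used, and the function name and the sample numbers are hypothetical.

```python
def lincoln_petersen(n_first: int, n_second: int, n_overlap: int) -> float:
    """Estimate a population size from two overlapping samples.

    n_first   -- items found by the first source
    n_second  -- items found by the second source
    n_overlap -- items found by both sources
    """
    if n_overlap <= 0:
        raise ValueError("the two samples must overlap to estimate a total")
    return n_first * n_second / n_overlap


# Hypothetical illustration: two indexes covering 100 and 80 documents
# with 40 in common suggest a population of about 200 documents.
estimate = lincoln_petersen(100, 80, 40)
```

The estimate assumes the two samples are independent, which is only approximately true for search engine indexes, so results of this kind are best read as rough bounds.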
University of Glasgow at WebCLEF 2005
DEFF Research Database (Denmark)
Macdonald, C.; Plachouras, V.; He, B.
2006-01-01
We participated in the WebCLEF 2005 monolingual task. In this task, a search system aims to retrieve relevant documents from a multilingual corpus of Web documents from Web sites of European governments. Both the documents and the queries are written in a wide range of European languages......, namely content, title, and anchor text of incoming hyperlinks. We use a technique called per-field normalisation, which extends the Divergence From Randomness (DFR) framework, to normalise the term frequencies, and to combine them across the three fields. We also employ the length of the URL path of Web...
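As a rough illustration of per-field normalisation, the sketch below normalises each field's term frequency by that field's length relative to its average, then combines the fields with weights. The formula follows one commonly cited form (sometimes called Normalisation 2F); the field names, weights, and the `c` hyperparameters are assumptions, not values taken from this record.

```python
import math
from typing import Dict


def per_field_tfn(
    tf: Dict[str, int],
    length: Dict[str, int],
    avg_length: Dict[str, float],
    weight: Dict[str, float],
    c: Dict[str, float],
) -> float:
    """Combine per-field term frequencies into one normalised frequency.

    For each field f: w_f * tf_f * log2(1 + c_f * avg_len_f / len_f).
    """
    total = 0.0
    for field, freq in tf.items():
        if freq == 0 or length[field] == 0:
            continue  # the term does not occur in this field
        scale = math.log2(1.0 + c[field] * avg_length[field] / length[field])
        total += weight[field] * freq * scale
    return total
```

The combined value would then be passed to a DFR weighting model in place of the raw term frequency, so that a match in a short field such as the title is not swamped by the much longer content field.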
Frontiers in ICT towards web 3.0
Levnajic, Zoran
2014-01-01
Life without the World Wide Web has become unthinkable, much like life without electricity or water supply. We rely on the web to check public transport schedules, buy a ticket for a concert or exchange photos with friends. However, many everyday tasks cannot be accomplished by the computer itself, since the websites are designed to be read by people, not machines. In addition, the online information is often unstructured and poorly organized, leaving the user with tedious work of searching and filtering. This book takes us to the frontiers of the emerging Web 3.0 or Semantic Web - a new gener
Selecting a Free Web-Hosted Survey Tool for Student Use
Elbeck, Matt
2014-01-01
This study provides marketing educators a review of free web-based survey services and guidance for student use. A mixed methods approach started with online searches and metrics identifying 13 free web-hosted survey services, described as demonstration or project tools, and ranked using popularity and importance web-based metrics. For each…
Traitor: associating concepts using the world wide web
Drijfhout, Wanno; Oliver, J.; Oliver, Jundt; Wevers, L.; Hiemstra, Djoerd
We use Common Crawl's 25TB data set of web pages to construct a database of associated concepts using Hadoop. The database can be queried through a web application with two query interfaces. A textual interface allows searching for similarities and differences between multiple concepts using a query
Semantic Web Without SPARQL
Szekely, Pedro
2016-01-01
Discusses the creation of large Semantic Web applications with billions of triples. Instead of using a traditional SPARQL endpoint, our toolchain is a pure JSON toolchain using JSON-LD and ElasticSearch to support queries. The toolchain is familiar to all developers, does not require knowledge of Semantic Web technologies, and performance is 10X better than using SPARQL endpoints. The presentation illustrates the approach in the context of an application to fight human trafficking, using data f...
Dodge, Timothy
1998-01-01
Evaluates 15 criminal justice Web sites that have been selected according to the following criteria: authority, currency, purpose, objectivity, and potential usefulness to researchers. The sites provide narrative and statistical information concerning crime, law enforcement, the judicial system, and corrections. Searching techniques are also…
The Invisible Web: Uncovering Information Sources Search Engines Can't See.
Sherman, Chris; Price, Gary
This book takes a detailed look at the nature and extent of the Invisible Web, and offers pathfinders for accessing the valuable information it contains. It is designed to fit the needs of both novice and advanced Web searchers. Chapter One traces the development of the Internet and many of the early tools used to locate and share information via…
EVALUATION OF WEB SEARCHING METHOD USING A NOVEL WPRR ALGORITHM FOR TWO DIFFERENT CASE STUDIES
Directory of Open Access Journals (Sweden)
V. Lakshmi Praba
2012-04-01
The World-Wide Web provides every internet citizen with access to an abundance of information, but it becomes increasingly difficult to identify the relevant pieces of information. Research in web mining tries to address this problem by applying techniques from data mining and machine learning to web data and documents. Web content mining and web structure mining have important roles in identifying the relevant web page. Relevancy of a web page denotes how well a retrieved web page or set of web pages meets the information need of the user. Page Rank, Weighted Page Rank and Hypertext Induced Topic Selection (HITS) are existing algorithms that consider only web structure mining. Vector Space Model (VSM), Cover Density Ranking (CDR), Okapi similarity measurement (Okapi) and Three-Level Scoring method (TLS) are some of the existing relevancy score methods that consider only web content mining. In this paper, we propose a new algorithm, Weighted Page with Relevant Rank (WPRR), which is a blend of both web content mining and web structure mining and demonstrates the relevancy of the page with respect to a given query for two different case scenarios. It is shown that WPRR's performance is better than the existing algorithms.
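To make the structure-mining baseline concrete, here is a minimal sketch of plain PageRank by power iteration. It is not the WPRR algorithm proposed in the paper, whose details are not given in this abstract; the damping factor, iteration count, and the handling of pages without outgoing links are conventional assumptions.

```python
from typing import Dict, List


def pagerank(
    links: Dict[str, List[str]], damping: float = 0.85, iterations: int = 50
) -> Dict[str, float]:
    """Plain PageRank over a dict mapping each page to its outgoing links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank
```

A content-aware method such as the one described would additionally weight pages by how well their text matches the query, which this structure-only sketch ignores.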
Personal health records: retrieving contextual information with Google Custom Search.
Ahsan, Mahmud; Seldon, H Lee; Sayeed, Shohel
2012-01-01
Ubiquitous personal health records, which can accompany a person everywhere, are a necessary requirement for ubiquitous healthcare. Contextual information related to health events is important for the diagnosis and treatment of disease and for the maintenance of good health, yet it is seldom recorded in a health record. We describe a dual cellphone-and-Web-based personal health record system which can include 'external' contextual information. Much contextual information is available on the Internet and we can use ontologies to help identify relevant sites and information. But a search engine is required to retrieve information from the Web and developing a customized search engine is beyond our scope, so we can use Google Custom Search API Web service to get contextual data. In this paper we describe a framework which combines a health-and-environment 'knowledge base' or ontology with the Google Custom Search API to retrieve relevant contextual information related to entries in a ubiquitous personal health record.
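A request to the Google Custom Search JSON API is an HTTPS GET with an API key (`key`), a search engine identifier (`cx`), and the query (`q`). The sketch below only builds such a URL from a health event and context terms; the function name and the way terms are joined are assumptions for illustration, not the framework described in the paper, and no request is actually sent.

```python
from typing import List
from urllib.parse import urlencode

CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_context_query(
    event: str, context_terms: List[str], api_key: str, engine_id: str
) -> str:
    """Build a Custom Search request URL for a health event plus context."""
    # Hypothetical query construction: event text followed by context terms,
    # which in the described framework would come from the ontology.
    query = " ".join([event, *context_terms])
    params = {"key": api_key, "cx": engine_id, "q": query}
    return f"{CSE_ENDPOINT}?{urlencode(params)}"
```

In practice the returned URL would be fetched with an HTTP client and the JSON response filtered before anything is attached to a health record entry.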
Distributed Web Service Repository
Directory of Open Access Journals (Sweden)
Piotr Nawrocki
2015-01-01
The increasing availability and popularity of computer systems has resulted in a demand for new, language- and platform-independent ways of data exchange. That demand has in turn led to a significant growth in the importance of systems based on Web services. Alongside the growing number of systems accessible via Web services came the need for specialized data repositories that could offer effective means of searching for available services. The development of mobile systems and wireless data transmission technologies has allowed the use of distributed devices and computer systems on a greater scale. The accelerating growth of distributed systems might be a good reason to consider the development of distributed Web service repositories with built-in mechanisms for data migration and synchronization.
Nguyen, Sonia Kim Anh; Ingledew, Paris-Ann
2013-12-01
This study describes Internet use by breast cancer patients highlighting search patterns and examining the impact of web-based information on the clinical encounter. From September 2011 to January 2012, breast cancer patients at a cancer center completed a survey. Answers were closed and open-ended. Eighty-one patients were approached and 56 completed the survey. Forty-five (80 %) respondents used the Internet and 32 (71 %) searched for breast cancer information. All used Google as their principal search engine. To evaluate quality, 47 % referred to author credentials and 41 % examined references. Most sought information with respect to treatment or prognosis. Eighty percent felt that the information increased their knowledge and influenced treatment decision making for 53 %. This study highlights search patterns and factors used by breast cancer patients in seeking web-based information. Physicians must appreciate that patients use the Internet and address discrepancies between information sought and that which is available.
What Major Search Engines Like Google, Yahoo and Bing Need to Know about Teachers in the UK?
Seyedarabi, Faezeh
2014-01-01
This article briefly outlines the current major search engines' approach to teachers' web searching. The aim of this article is to make Web searching easier for teachers when searching for relevant online teaching materials, in general, and UK teacher practitioners at primary, secondary and post-compulsory levels, in particular. Therefore, major…
Classification of Automated Search Traffic
Buehrer, Greg; Stokes, Jack W.; Chellapilla, Kumar; Platt, John C.
As web search providers seek to improve both relevance and response times, they are challenged by the ever-increasing tax of automated search query traffic. Third party systems interact with search engines for a variety of reasons, such as monitoring a web site's rank, augmenting online games, or possibly to maliciously alter click-through rates. In this paper, we investigate automated traffic (sometimes referred to as bot traffic) in the query stream of a large search engine provider. We define automated traffic as any search query not generated by a human in real time. We first provide examples of different categories of query logs generated by automated means. We then develop many different features that distinguish between queries generated by people searching for information, and those generated by automated processes. We categorize these features into two classes, either an interpretation of the physical model of human interactions, or as behavioral patterns of automated interactions. Using these detection features, we next classify the query stream using multiple binary classifiers. In addition, a multiclass classifier is then developed to identify subclasses of both normal and automated traffic. An active learning algorithm is used to suggest which user sessions to label to improve the accuracy of the multiclass classifier, while also seeking to discover new classes of automated traffic. A performance analysis is then provided. Finally, the multiclass classifier is used to predict the subclass distribution for the search query stream.
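The kind of session-level features described above can be sketched as follows. The two features (query rate and click-through rate), the thresholds, and the simple rule standing in for the trained classifiers are all hypothetical; the abstract does not list the paper's actual features or models.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Session:
    timestamps: List[float]  # query times in seconds
    clicks: int              # result clicks in the session


def session_features(session: Session) -> Dict[str, float]:
    """Compute two illustrative features for one user session."""
    n = len(session.timestamps)
    duration = max(session.timestamps) - min(session.timestamps) if n > 1 else 0.0
    rate = n / (duration / 60.0) if duration > 0 else float(n)
    ctr = session.clicks / n if n else 0.0
    return {"queries_per_minute": rate, "click_through_rate": ctr}


def looks_automated(
    session: Session, max_rate: float = 30.0, min_ctr: float = 0.01
) -> bool:
    """Threshold rule used here in place of a trained binary classifier."""
    features = session_features(session)
    return (
        features["queries_per_minute"] > max_rate
        and features["click_through_rate"] < min_ctr
    )
```

A query rate far above what a person can type reflects the "physical model" class of features, while a near-zero click rate is closer to a behavioral pattern; a real system would feed many such features to learned classifiers.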
Web party effect: a cocktail party effect in the web environment
Directory of Open Access Journals (Sweden)
Sara Rigutti
2015-03-01
In goal-directed web navigation, labels compete for selection: this process often involves knowledge integration and requires selective attention to manage the dizziness of web layouts. Here we ask whether the competition for selection depends on all web navigation options or only on those options that are more likely to be useful for information seeking, and provide evidence in favor of the latter alternative. Participants in our experiment navigated a representative set of real websites of variable complexity, in order to reach an information goal located two clicks away from the starting home page. The time needed to reach the goal was accounted for by a novel measure of home page complexity based on a part of (not all) web options: the number of links embedded within web navigation elements weighted by the number and type of embedding elements. Our measure fully mediated the effect of several standard complexity metrics (the overall number of links, words, images, graphical regions, the JPEG file size of home page screenshots) on information seeking time and usability ratings. Furthermore, it predicted the cognitive demand of web navigation, as revealed by the duration judgment ratio (i.e., the ratio of subjective to objective duration of information search). Results demonstrate that focusing on relevant links while ignoring other web objects optimizes the deployment of attentional resources necessary to navigation. This is in line with a web party effect (i.e., a cocktail party effect in the web environment): users tune into web elements that are relevant for the achievement of their navigation goals and tune out all others.
Web party effect: a cocktail party effect in the web environment
Gerbino, Walter
2015-01-01
In goal-directed web navigation, labels compete for selection: this process often involves knowledge integration and requires selective attention to manage the dizziness of web layouts. Here we ask whether the competition for selection depends on all web navigation options or only on those options that are more likely to be useful for information seeking, and provide evidence in favor of the latter alternative. Participants in our experiment navigated a representative set of real websites of variable complexity, in order to reach an information goal located two clicks away from the starting home page. The time needed to reach the goal was accounted for by a novel measure of home page complexity based on a part of (not all) web options: the number of links embedded within web navigation elements weighted by the number and type of embedding elements. Our measure fully mediated the effect of several standard complexity metrics (the overall number of links, words, images, graphical regions, the JPEG file size of home page screenshots) on information seeking time and usability ratings. Furthermore, it predicted the cognitive demand of web navigation, as revealed by the duration judgment ratio (i.e., the ratio of subjective to objective duration of information search). Results demonstrate that focusing on relevant links while ignoring other web objects optimizes the deployment of attentional resources necessary to navigation. This is in line with a web party effect (i.e., a cocktail party effect in the web environment): users tune into web elements that are relevant for the achievement of their navigation goals and tune out all others. PMID:25802803
Knowing How Good Our Searches Are: An Approach Derived from Search Filter Development Methodology
Directory of Open Access Journals (Sweden)
Sarah Hayman
2015-12-01
Objective – Effective literature searching is of paramount importance in supporting evidence based practice, research, and policy. Missed references can have adverse effects on outcomes. This paper reports on the development and evaluation of an online learning resource, designed for librarians and other interested searchers, presenting an evidence based approach to enhancing and testing literature searches. Methods – We developed and evaluated the set of free online learning modules for librarians called Smart Searching, suggesting the use of techniques derived from search filter development undertaken by the CareSearch Palliative Care Knowledge Network and its associated project Flinders Filters. The searching module content has been informed by the processes and principles used in search filter development. The self-paced modules are intended to help librarians and other interested searchers test the effectiveness of their literature searches, provide evidence of search performance that can be used to improve searches, as well as to evaluate and promote searching expertise. Each module covers one of four techniques, or core principles, employed in search filter development: (1) collaboration with subject experts; (2) use of a reference sample set; (3) term identification through frequency analysis; and (4) iterative testing. Evaluation of the resource comprised ongoing monitoring of web analytics to determine factors such as numbers of users and geographic origin; a user survey conducted online elicited qualitative information about the usefulness of the resource. Results – The resource was launched in May 2014. Web analytics show over 6,000 unique users from 101 countries (at 9 August 2015). Responses to the survey (n=50) indicated that 80% would recommend the resource to a colleague. Conclusions – An evidence based approach to searching, derived from search filter development methodology, has been shown to have value as an online learning
'Sciencenet'--towards a global search and share engine for all scientific knowledge.
Lütjohann, Dominic S; Shah, Asmi H; Christen, Michael P; Richter, Florian; Knese, Karsten; Liebel, Urban
2011-06-15
Modern biological experiments create vast amounts of data which are geographically distributed. These datasets consist of petabytes of raw data and billions of documents. Yet to the best of our knowledge, a search engine technology that searches and cross-links all different data types in life sciences does not exist. We have developed a prototype distributed scientific search engine technology, 'Sciencenet', which facilitates rapid searching over this large data space. By 'bringing the search engine to the data', we do not require server farms. This platform also allows users to contribute to the search index and publish their large-scale data to support e-Science. Furthermore, a community-driven method guarantees that only scientific content is crawled and presented. Our peer-to-peer approach is sufficiently scalable for the science web without performance or capacity tradeoff. The free to use search portal web page and the downloadable client are accessible at: http://sciencenet.kit.edu. The web portal for index administration is implemented in ASP.NET, the 'AskMe' experiment publisher is written in Python 2.7, and the backend 'YaCy' search engine is based on Java 1.6.
Quality of Web-based information on obsessive compulsive disorder.
Klila, Hedi; Chatton, Anne; Zermatten, Ariane; Khan, Riaz; Preisig, Martin; Khazaal, Yasser
2013-01-01
The Internet is increasingly used as a source of information for mental health issues. The burden of obsessive compulsive disorder (OCD) may lead persons with diagnosed or undiagnosed OCD, and their relatives, to search for good quality information on the Web. This study aimed to evaluate the quality of Web-based information on English-language sites dealing with OCD and to compare the quality of websites found through a general and a medically specialized search engine. Keywords related to OCD were entered into Google and OmniMedicalSearch. Websites were assessed on the basis of accountability, interactivity, readability, and content quality. The "Health on the Net" (HON) quality label and the Brief DISCERN scale score were used as possible content quality indicators. Of the 235 links identified, 53 websites were analyzed. The content quality of the OCD websites examined was relatively good. The use of a specialized search engine did not offer an advantage in finding websites with better content quality. A score ≥16 on the Brief DISCERN scale is associated with better content quality. This study shows the acceptability of the content quality of OCD websites. There is no advantage in searching for information with a specialized search engine rather than a general one. The Internet offers a number of high quality OCD websites. It remains critical, however, to have a provider-patient talk about the information found on the Web.
Conceptual Web Users' Actions Prediction for Ontology-Based Browsing Recommendations
Robal, Tarmo; Kalja, Ahto
The Internet consists of thousands of web sites with different kinds of structures. However, users are browsing the web according to their informational expectations towards the web site searched, having an implicit conceptual model of the domain in their minds. Nevertheless, people tend to repeat themselves and have partially shared conceptual views while surfing the web, finding some areas of web sites more interesting than others. Herein, we take advantage of the latter and provide a model and a study on predicting users' actions based on the web ontology concepts and their relations.
MuZeeker - Adapting a music search engine for mobile phones
DEFF Research Database (Denmark)
Larsen, Jakob Eg; Halling, Søren Christian; Sigurdsson, Magnus Kristinn
2010-01-01
We describe MuZeeker, a search engine with domain knowledge based on Wikipedia. MuZeeker enables the user to refine a search in multiple steps by means of category selection. In the present version we focus on multimedia search related to music and we present two prototype search applications (web-based and mobile) and discuss the issues involved in adapting the search engine for mobile phones. A category based filtering approach enables the user to refine a search through relevance feedback by category selection instead of typing additional text, which is hypothesized to be an advantage in the mobile MuZeeker application. We report from two usability experiments using the think aloud protocol, in which N=20 participants performed tasks using MuZeeker and a customized Google search engine. In both experiments web-based and mobile user interfaces were used. The experiment shows that participants are capable...
Global polar geospatial information service retrieval based on search engine and ontology reasoning
Chen, Nengcheng; E, Dongcheng; Di, Liping; Gong, Jianya; Chen, Zeqiang
2007-01-01
In order to improve the access precision of polar geospatial information services on the web, a new methodology for retrieving global spatial information services based on geospatial service search and ontology reasoning is proposed. The geospatial service search is implemented to find coarse candidate services on the web, and the ontology reasoning is designed to refine those candidates. The proposed framework includes standardized distributed geospatial web services, a geospatial service search engine, an extended UDDI registry, and a multi-protocol geospatial information service client. Key technologies addressed include service discovery based on a search engine, and service ontology modeling and reasoning in the Antarctic geospatial context. Finally, an Antarctica multi-protocol OWS portal prototype based on the proposed methodology is introduced.
How Will Online Affiliate Marketing Networks Impact Search Engine Rankings?
Janssen, David; Heck, Eric
2007-01-01
In online affiliate marketing networks advertising web sites offer their affiliates revenues based on provided web site traffic and associated leads and sales. Advertising web sites can have a network of thousands of affiliates providing them with web site traffic through hyperlinks on their web sites. Search engines such as Google, MSN, and Yahoo, consider hyperlinks as a proof of quality and/or reliability of the linked web sites, and therefore use them to determine the relevanc...
Sources of Militaria on the World Wide Web | Walker | Scientia ...
African Journals Online (AJOL)
Having an interest in military-type topics is one thing, finding information on the web to quench your thirst for knowledge is another. The World Wide Web (WWW) is a universal electronic library that contains millions of web pages. As well as being fun, it is an addictive tool on which to search for information. To prevent hours ...
Increasing efficiency of information dissemination and collection through the World Wide Web
Daniel P. Huebner; Malchus B. Baker; Peter F. Ffolliott
2000-01-01
Researchers, managers, and educators have access to revolutionary technology for information transfer through the World Wide Web (Web). Using the Web to effectively gather and distribute information is addressed in this paper. Tools, tips, and strategies are discussed. Companion Web sites are provided to guide users in selecting the most appropriate tool for searching...
The Imperative Of Literature Search For Research In Nigeria | Madu ...
African Journals Online (AJOL)
The paper while advancing reasons for literature search described how the library can assist in literature search. It finally discussed the various approaches and levels of search especially on the web and the problems researchers are most likely to encounter. Keywords: Research, Literature Search, Nigeria. The Information ...
Finding people, papers, and posts: Vertical search algorithms and evaluation
Berendsen, R.W.
2015-01-01
There is a growing diversity of information access applications. While general web search has been dominant in the past few decades, a wide variety of so-called vertical search tasks and applications have come to the fore. Vertical search is an often used term for search that targets specific
Search as Learning (Dagstuhl Seminar 17092)
Collins-Thompson, Kevyn; Hansen, Preben; Hauff, Claudia
2017-01-01
This report describes the program and the results of Dagstuhl Seminar 17092 "Search as Learning", which brought together 26 researchers from diverse research backgrounds. The motivation for the seminar stems from the fact that modern Web search engines are largely engineered and optimized to fulfill lookup tasks instead of complex search tasks. The latter though are an essential component of information discovery and learning. The 3-day seminar started with four perspective talks, providing f...
Subject Gateway Sites and Search Engine Ranking.
Thelwall, Mike
2002-01-01
Discusses subject gateway sites and commercial search engines for the Web and presents an explanation of Google's PageRank algorithm. The principal question addressed is the conditions under which a gateway site will increase the likelihood that a target page is found in search engines. (LRW)
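As a rough illustration of the PageRank idea mentioned in the record above, the following is a minimal sketch of the textbook power-iteration form. It is not taken from the article: the damping factor of 0.85, the iteration count, and the page names are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    The damping factor and iteration count are illustrative assumptions.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives the same small "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without out-links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: a gateway page linking to a target page.
toy_web = {
    "gateway": ["target", "other"],
    "other": ["gateway"],
    "target": ["gateway"],
    "orphan": ["gateway"],
}
for page, score in sorted(pagerank(toy_web).items()):
    print(page, round(score, 3))
```

In this toy graph the target page, which receives a link from the well-connected gateway, ends up ranked above the orphan page that nobody links to.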
Regulating Search Engines: Taking Stock And Looking Ahead
Gasser, Urs
2006-01-01
Since the creation of the first pre-Web Internet search engines in the early 1990s, search engines have become almost as important as email as a primary online activity. Arguably, search engines are among the most important gatekeepers in today's digitally networked environment. Thus, it does not come as a surprise that the evolution of search technology and the diffusion of search engines have been accompanied by a series of conflicts among stakeholders such as search operators, content crea...
Semantic Service Discovery Techniques for the composable web
Fernández Villamor, José Ignacio
2013-01-01
This PhD thesis contributes to the problem of resource and service discovery in the context of the composable web. In the current web, mashup technologies allow developers reusing services and contents to build new web applications. However, developers face a problem of information flood when searching for appropriate services or resources for their combination. To contribute to overcoming this problem, a framework is defined for the discovery of services and resources. In this framework, thr...
Directory of Open Access Journals (Sweden)
J. Prasanna Kumar
2013-02-01
Full Text Available Duplicate and near-duplicate web pages are chief concerns for web search engines. They require enormous space to store the indexes, ultimately slowing down and increasing the cost of serving results. A variety of techniques have been developed to identify pairs of web pages that are “similar” to each other. The problem of finding near-duplicate web pages has been a subject of research in the database and web-search communities for some years. In order to identify near-duplicate web pages, we make use of sentence-level features along with a fingerprinting method. When a large number of web documents are under consideration, we first use K-mode clustering and subsequently compare sentence features and fingerprints. Using these steps, we accurately identify near-duplicate web pages in an efficient manner. The experimentation is carried out on web page collections, and the results confirm the efficiency of the proposed approach in detecting near-duplicate web pages.
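The sentence-level fingerprinting step described in the record above can be sketched roughly as follows. This is an assumption-laden illustration, not the authors' method: the sentence splitting rule, the SHA-1 hash, the 16-character fingerprint length, and the Jaccard overlap measure are all choices made for the example, and the K-mode clustering stage is omitted.

```python
import hashlib
import re


def sentence_fingerprints(text):
    """Return a set of short hashes, one per normalized sentence."""
    fingerprints = set()
    for sentence in re.split(r"[.!?]+", text):
        # Normalize case and punctuation so trivial edits do not change the hash.
        normalized = " ".join(re.findall(r"[a-z0-9]+", sentence.lower()))
        if normalized:
            digest = hashlib.sha1(normalized.encode("utf-8")).hexdigest()
            fingerprints.add(digest[:16])
    return fingerprints


def similarity(page_a, page_b):
    """Jaccard overlap of the two pages' sentence fingerprints (0.0 to 1.0)."""
    prints_a = sentence_fingerprints(page_a)
    prints_b = sentence_fingerprints(page_b)
    if not prints_a and not prints_b:
        return 1.0
    return len(prints_a & prints_b) / len(prints_a | prints_b)


# Hypothetical page texts that share two of their three sentences.
page_1 = "Search engines index pages. Duplicates waste space. They slow results."
page_2 = "Search engines index pages! Duplicates waste space. They also add cost."
print(similarity(page_1, page_2))
```

A caller would then flag page pairs whose overlap exceeds some threshold; the threshold itself would need tuning on real data.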
Stopping Web Plagiarists from Stealing Your Content
Goldsborough, Reid
2004-01-01
This article gives tips on how to avoid having content stolen by plagiarists. Suggestions include: using a Web search service such as Google to search for unique strings of text from the individual's site to uncover other sites with the same content; buying an infringement-detection program; or hiring a public relations firm to do the work. There are…
Graph Structure in Three National Academic Webs: Power Laws with Anomalies.
Thelwall, Mike; Wilkinson, David
2003-01-01
Explains how the Web can be modeled as a mathematical graph and analyzes the graph structures of three national university publicly indexable Web sites from Australia, New Zealand, and the United Kingdom. Topics include commercial search engines and academic Web link research; method-analysis environment and data sets; and power laws. (LRW)
A novel architecture for information retrieval system based on semantic web
Zhang, Hui
2011-12-01
Nowadays, the web has enabled an explosive growth of information sharing (there are currently over 4 billion pages covering most areas of human endeavor), so that the web faces a new challenge of information overload. The challenge now before us is not only to help people locate relevant information precisely but also to access and aggregate a variety of information from different resources automatically. Current web documents are in human-oriented formats that are suitable for presentation, but machines cannot understand the meaning of a document. To address this issue, Berners-Lee proposed the concept of the semantic web. With semantic web technology, web information can be understood and processed by machines. It provides new possibilities for automatic web information processing. A main problem of semantic web information retrieval is that when there is not enough knowledge available to such an information retrieval system, the system will return a large number of meaningless results to users because of the huge amount of information. In this paper, we present the architecture of an information retrieval system based on the semantic web. In addition, our system employs an inference engine to check whether the query should be posed to the keyword-based search engine or to the semantic search engine.
Web Use for Symptom Appraisal of Physical Health Conditions: A Systematic Review.
Mueller, Julia; Jay, Caroline; Harper, Simon; Davies, Alan; Vega, Julio; Todd, Chris
2017-06-13
The Web has become an important information source for appraising symptoms. We need to understand the role it currently plays in help seeking and symptom evaluation to leverage its potential to support health care delivery. The aim was to systematically review the literature currently available on Web use for symptom appraisal. We searched PubMed, EMBASE, PsycINFO, ACM Digital Library, SCOPUS, and Web of Science for any empirical studies that addressed the use of the Web by lay people to evaluate symptoms for physical conditions. Articles were excluded if they did not meet minimum quality criteria. Study findings were synthesized using a thematic approach. A total of 32 studies were included. Study designs included cross-sectional surveys, qualitative studies, experimental studies, and studies involving website/search engine usage data. Approximately 35% of adults engage in Web use for symptom appraisal, but this proportion varies between 23% and 75% depending on sociodemographic and disease-related factors. Most searches were symptom-based rather than condition-based. Users viewed only the top search results and interacted more with results that mentioned serious conditions. Web use for symptom appraisal appears to impact on the decision to present to health services, communication with health professionals, and anxiety. Web use for symptom appraisal has the potential to influence the timing of help seeking for symptoms and the communication between patients and health care professionals during consultations. However, studies lack suitable comparison groups as well as follow-up of participants over time to determine whether Web use results in health care utilization and diagnosis. Future research should involve longitudinal follow-up so that we can weigh the benefits of Web use for symptom appraisal (eg, reductions in delays to diagnosis) against the disadvantages (eg, unnecessary anxiety and health care use) and relate these to health care costs. ©Julia Mueller
Visual Communication in Web Design
DEFF Research Database (Denmark)
Thorlacius, Lisbeth
2010-01-01
Web sites are rapidly becoming the preferred media choice for information search, company presentation, shopping, entertainment, education, and social contacts. And along with the various forms of communication that the Web offers the aesthetic aspects have begun to play an increasingly important ... role. However, studies in the design and the relevance of focusing on the aesthetic aspects in planning and using Web sites have only to a smaller degree been subject of theoretical reflection. For example, Miller in 2001, Thorlacius in 2001, 2002, 2005, Engholm in 2002, 2003 and Beaird in 2007 have ... to introduce a model for analysis of the visual communication in Web design figure 2. This new model is based on Roman Jakobson's communication model, which focuses on the linguistic aspects of the communication. Jakobson's model has been expanded and adapted so that it is applicable to visual communication...
Project Lefty: More Bang for the Search Query
Varnum, Ken
2010-01-01
This article describes the Project Lefty, a search system that, at a minimum, adds a layer on top of traditional federated search tools that will make the wait for results more worthwhile for researchers. At best, Project Lefty improves search queries and relevance rankings for web-scale discovery tools to make the results themselves more relevant…
Visual Communication in Web Design - Analyzing Visual Communication in Web Design
Thorlacius, Lisbeth
Web sites are rapidly becoming the preferred media choice for information search, company presentation, shopping, entertainment, education, and social contacts. And along with the various forms of communication that the Web offers the aesthetic aspects have begun to play an increasingly important role. However, studies in the design and the relevance of focusing on the aesthetic aspects in planning and using Web sites have only to a smaller degree been subject of theoretical reflection. For example, Miller (2000), Thorlacius (2001, 2002, 2005), Engholm (2002, 2003), and Beaird (2007) have contributed to setting an initial agenda that addresses the aesthetic aspects. On the other hand, there is a considerable amount of literature addressing the theoretical and methodological aspects focusing on the technical and functional aspects. In this context it is the aim of this article to introduce a model for analysis of visual communication on websites.
Allen, David G; Mahto, Raj V; Otondo, Robert F
2007-11-01
Recruitment theory and research show that objective characteristics, subjective considerations, and critical contact send signals to prospective applicants about the organization and available opportunities. In the generating applicants phase of recruitment, critical contact may consist largely of interactions with recruitment sources (e.g., newspaper ads, job fairs, organization Web sites); however, research has yet to fully address how all 3 types of signaling mechanisms influence early job pursuit decisions in the context of organizational recruitment Web sites. Results based on data from 814 student participants searching actual organization Web sites support and extend signaling and brand equity theories by showing that job information (directly) and organization information (indirectly) are related to intentions to pursue employment when a priori perceptions of image are controlled. A priori organization image is related to pursuit intentions when subsequent information search is controlled, but organization familiarity is not, and attitudes about a recruitment source also influence attraction and partially mediate the effects of organization information. Theoretical and practical implications for recruitment are discussed. (c) 2007 APA
Intelligent Information Systems for Web Product Search
D. Vandic (Damir)
2017-01-01
Over the last few years, we have experienced an increase in online shopping. Consequently, there is a need for efficient and effective product search engines. The rapid growth of e-commerce, however, has also introduced some challenges. Studies show that users can get overwhelmed by
TX-Kw: An Effective Temporal XML Keyword Search
Rasha Bin-Thalab; Neamat El-Tazi; Mohamed E.El-Sharkawi
2013-01-01
Inspired by the great success of information retrieval (IR) style keyword search on the web, keyword search on XML has emerged recently. Existing methods cannot resolve challenges addressed by using keyword search in Temporal XML documents. We propose a way to evaluate temporal keyword search queries over Temporal XML documents. Moreover, we propose a new ranking method based on the time-aware IR ranking methods to rank temporal keyword search queries results. Extensive experiments have been ...
A Web-Based Learning System for Software Test Professionals
Wang, Minhong; Jia, Haiyang; Sugumaran, V.; Ran, Weijia; Liao, Jian
2011-01-01
Fierce competition, globalization, and technology innovation have forced software companies to search for new ways to improve competitive advantage. Web-based learning is increasingly being used by software companies as an emergent approach for enhancing the skills of knowledge workers. However, the current practice of Web-based learning is…
Quality Search Content: A Reality With Next Generation Browsers
Digital Repository Service at National Institute of Oceanography (India)
Lakshminarayana, S.
The Internet has become the destination for getting information or transacting business. Most work, including recent articles, demands quality search content from the Web. Interactions on the Internet are performed through a web browser. A...
Building maps to search the web: the method Sewcom
Directory of Open Access Journals (Sweden)
Corrado Petrucco
2002-01-01
Full Text Available Seeking information on the Internet is becoming a necessity at school, at work and in every social sphere. Unfortunately, the difficulties inherent in the use of search engines and the unconscious use of inefficient cognitive approaches limit their effectiveness. In this respect, a method called SEWCOM is presented that lets you create conceptual maps through interaction with search engines.
Bohne-Lang, Andreas; Lang, Elke; Taube, Anke
2005-06-27
Web-based searching is the accepted contemporary mode of retrieving relevant literature, and retrieving as many full text articles as possible is a typical prerequisite for research success. In most cases only a proportion of references will be directly accessible as digital reprints through displayed links. A large number of references, however, have to be verified in library catalogues and, depending on their availability, are accessible as print holdings or by interlibrary loan request. The problem of verifying local print holdings from an initial retrieval set of citations can be solved using Z39.50, an ANSI protocol for interactively querying library information systems. Numerous systems include Z39.50 interfaces and therefore can process Z39.50 interactive requests. However, the programmed query interaction command structure is non-intuitive and inaccessible to the average biomedical researcher. For the typical user, it is necessary to implement the protocol within a tool that hides and handles Z39.50 syntax, presenting a comfortable user interface. PMD2HD is a web tool implementing Z39.50 to provide an appropriately functional and usable interface to integrate into the typical workflow that follows an initial PubMed literature search, providing users with an immediate asset to assist in the most tedious step in literature retrieval, checking for subscription holdings against a local online catalogue. PMD2HD can facilitate literature access considerably with respect to the time and cost of manual comparisons of search results with local catalogue holdings. The example presented in this article is related to the library system and collections of the German Cancer Research Centre. However, the PMD2HD software architecture and use of common Z39.50 protocol commands allow for transfer to a broad range of scientific libraries using Z39.50-compatible library information systems.
Fabricant, Peter D; Dy, Christopher J; Patel, Ronak M; Blanco, John S; Doyle, Shevaun M
2013-06-01
The recent emphasis on shared decision-making has increased the role of the Internet as a readily accessible medical reference source for patients and families. However, the lack of professional review creates concern over the quality, accuracy, and readability of medical information available to patients on the Internet. Three Internet search engines (Google, Yahoo, and Bing) were evaluated prospectively using 3 different search terms of varying sophistication ("congenital hip dislocation," "developmental dysplasia of the hip," and "hip dysplasia in children"). Sixty-three unique Web sites were evaluated by each of 3 surgeons (2 fellowship-trained pediatric orthopaedic attendings and 1 orthopaedic chief resident) for quality and accuracy using a set of scoring criteria based on the AAOS/POSNA patient education Web site. The readability (literacy grade level) of each Web site was assessed using the Flesch-Kincaid score. There were significant differences noted in quality, accuracy, and readability of information depending on the search term used. The search term "developmental dysplasia of the hip" provided higher quality and accuracy compared with the search term "congenital hip dislocation." Of the 63 total Web sites, 1 (1.6%) was below the sixth grade reading level recommended by the NIH for health education materials and 8 (12.7%) Web sites were below the average American reading level (eighth grade). The quality and accuracy of information available on the Internet regarding developmental hip dysplasia significantly varied with the search term used. Patients seeking information about DDH on the Internet may not understand the materials found because nearly all of the Web sites are written at a level above that recommended for publicly distributed health information. Physicians should advise their patients to search for information using the term "developmental dysplasia of the hip" or, better yet, should refer patients to Web sites that they have
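The readability measure named in the record above is commonly computed with the standard Flesch-Kincaid grade-level formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below applies that formula; the syllable counter is a crude vowel-group heuristic assumed for illustration (real readability tools count syllables more carefully), and the sample sentences are invented rather than taken from the study.

```python
import re


def count_syllables(word):
    """Rough heuristic (an assumption): one syllable per run of vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level of a text using the standard formula."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(word) for word in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# Invented sample sentences for comparison.
plain = "The cat sat. The dog ran."
dense = "Developmental dysplasia necessitates comprehensive orthopaedic evaluation."
print(flesch_kincaid_grade(plain), flesch_kincaid_grade(dense))
```

Short sentences of one-syllable words score far below grade 6, while a single sentence of long clinical terms scores well above grade 8, which is the kind of gap the abstract reports.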
Implementation of SEO Web Design Methodology on the Official Homepage of Pondok Pesantren Qodratullah
Ependi, Usman
2013-01-01
A homepage or website is a way for an organization to deliver information to the public. The number of homepages and websites, both personal and organizational, increases every day. To communicate and disseminate information, the homepage/website of the Qodratullah Islamic Boarding School needs a reliable approach, namely the Search Engine Optimization Web Design Methodology. Conducted with the implementation of the Search Engine Optimization Web Design Methodology on the homepage/website...
Second Workshop on Supporting Complex Search Tasks
Belkin, Nicholas J.; Bogers, Toine; Kamps, Jaap; Kelly, Diane; Koolen, Marijn; Yilmaz, Emine
2017-01-01
There is broad consensus in the field of IR that search is complex in many use cases and applications, both on the Web and in domain specific collections, and both professionally and in our daily life. Yet our understanding of complex search tasks, in comparison to simple look up tasks, is
Search features of digital libraries
Directory of Open Access Journals (Sweden)
Alastair G. Smith
2000-01-01
Full Text Available Traditional on-line search services such as Dialog, DataStar and Lexis provide a wide range of search features (boolean and proximity operators, truncation, etc.). This paper discusses the use of these features for effective searching, and argues that these features are required, regardless of advances in search engine technology. The literature on on-line searching is reviewed, identifying features that searchers find desirable for effective searching. A selective survey of current digital libraries available on the Web was undertaken, identifying which search features are present. The survey indicates that current digital libraries do not implement a wide range of search features. For instance: under half of the examples included controlled vocabulary, under half had proximity searching, only one enabled browsing of term indexes, and none of the digital libraries enable searchers to refine an initial search. Suggestions are made for enhancing the search effectiveness of digital libraries, for instance by: providing a full range of search operators, enabling browsing of search terms, enhancement of records with controlled vocabulary, enabling the refining of initial searches, etc.
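To make the search features named in the record above concrete, here is a minimal sketch of truncation, boolean, and proximity matching over a single text record. The function names, the `*` truncation symbol, and the sample record are assumptions for illustration and do not reproduce the syntax of Dialog, DataStar, or Lexis.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def has_term(tokens, term):
    """Exact term match, or prefix match when the term ends with '*' (truncation)."""
    if term.endswith("*"):
        prefix = term[:-1]
        return any(token.startswith(prefix) for token in tokens)
    return term in tokens


def near(tokens, first, second, max_gap):
    """Proximity operator: the two terms occur within max_gap words of each other."""
    first_positions = [i for i, token in enumerate(tokens) if token == first]
    second_positions = [i for i, token in enumerate(tokens) if token == second]
    return any(abs(i - j) <= max_gap for i in first_positions for j in second_positions)


# Invented sample record.
record = tokenize("Digital libraries rarely support proximity searching or controlled vocabulary.")

# Truncation: 'librar*' matches 'libraries'.
print(has_term(record, "librar*"))
# Boolean AND / NOT composed with ordinary Python operators.
print(has_term(record, "digital") and not has_term(record, "catalogue"))
# Proximity: 'proximity' directly next to 'searching'.
print(near(record, "proximity", "searching", 1))
```

Boolean operators are left to Python's own `and`, `or`, and `not` here; a real search service would parse them from a query string.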
Hofmeister, Erik H; Watson, Victoria; Snyder, Lindsey B C; Love, Emma J
2008-12-15
To determine the validity of the information on the World Wide Web concerning veterinary anesthesia in dogs and to determine the methods dog owners use to obtain that information. Web-based search and client survey. 73 Web sites and 92 clients. Web sites were scored on a 5-point scale for completeness and accuracy of information about veterinary anesthesia by 3 board-certified anesthesiologists. A search for anesthetic information regarding 49 specific breeds of dogs was also performed. A survey was distributed to the clients who visited the University of Georgia Veterinary Teaching Hospital during a 4-month period to solicit data about sources used by clients to obtain veterinary medical information and the manner in which information obtained from Web sites was used. The general search identified 73 Web sites that included information on veterinary anesthesia; these sites received a mean score of 3.4 for accuracy and 2.5 for completeness. Of 178 Web sites identified through the breed-specific search, 57 (32%) indicated that a particular breed was sensitive to anesthesia. Of 83 usable, completed surveys, 72 (87%) indicated the client used the Web for veterinary medical information. Fifteen clients (18%) indicated they believed their animal was sensitive to anesthesia because of its breed. Information available on the internet regarding anesthesia in dogs is generally not complete and may be misleading with respect to risks to specific breeds. Consequently, veterinarians should appropriately educate clients regarding anesthetic risk to their particular dog.
Usability Testing Of Web Mapping Portals
Directory of Open Access Journals (Sweden)
Petr Voldán
2011-05-01
Full Text Available This study presents usability testing as a method that can be used to improve the usability of web map sites. The study refers to the basic principles of this method and describes particular usability tests of mapping sites. Potential usability problems are identified for the web sites Amapy.cz, Google Maps and Mapy.cz. The usability testing focused on problems related to the user interfaces, address searching and route planning of the map sites.
Network dynamics: The World Wide Web
Adamic, Lada Ariana
Despite its rapidly growing and dynamic nature, the Web displays a number of strong regularities which can be understood by drawing on methods of statistical physics. This thesis finds power-law distributions in website sizes, traffic, and links, and more importantly, develops a stochastic theory which explains them. Power-law link distributions are shown to lead to network characteristics which are especially suitable for scalable localized search. It is also demonstrated that the Web is a "small world": to reach one site from any other takes an average of only 4 hops, while most related sites cluster together. Additional dynamical properties of the Web graph are extracted from diffusion processes.
Teaching with technology: automatically receiving information from the internet and web.
Wink, Diane M
2010-01-01
In this bimonthly series, the author examines how nurse educators can use the Internet and Web-based computer technologies such as search, communication, and collaborative writing tools, social networking and social bookmarking sites, virtual worlds, and Web-based teaching and learning programs. This article presents information and tools related to automatically receiving information from the Internet and Web.
SiteGuide: An example-based approach to web site development assistance
Hollink, V.; de Boer, V.; van Someren, M.; Filipe, J.; Cordeiro, J.
2009-01-01
We present ‘SiteGuide’, a tool that helps web designers to decide which information will be included in a new web site and how the information will be organized. SiteGuide takes as input URLs of web sites from the same domain as the site the user wants to create. It automatically searches the pages
Aguillo, I
2000-01-01
Although the Internet is already a valuable information resource in medicine, there are important challenges to be faced before physicians and general users will have extensive access to this information. As a result of a research effort to compile a health-related Internet directory, new tools and strategies have been developed to solve key problems derived from the explosive growth of medical information on the Net and the great concern over the quality of such critical information. The current Internet search engines lack some important capabilities. We suggest using second generation tools (client-side based) able to deal with large quantities of data and to increase the usability of the records recovered. We tested the capabilities of these programs to solve health-related information problems, recognising six groups according to the kind of topics addressed: Z39.50 clients, downloaders, multisearchers, tracing agents, indexers and mappers. The evaluation of the quality of health information available on the Internet could require a large amount of human effort. A possible solution may be to use quantitative indicators based on the hypertext visibility of the Web sites. The cybermetric measures are valid for quality evaluation if they are derived from indirect peer review by experts with Web pages citing the site. The hypertext links acting as citations need to be extracted from a controlled sample of quality super-sites.
Web Enabled DROLS Verity TopicSets
National Research Council Canada - National Science Library
Tong, Richard
1999-01-01
The focus of this effort has been the design and development of automatically generated TopicSets and HTML pages that provide the basis of the required search and browsing capability for DTIC's Web Enabled DROLS System...
Web analytics as tool for improvement of website taxonomies
DEFF Research Database (Denmark)
Jonasen, Tanja Svarre; Ådland, Marit Kristine; Lykke, Marianne
The poster examines how web analytics can be used to provide information about users and inform design and redesign of taxonomies. It uses a case study of the website Cancer.dk by the Danish Cancer Society. The society is a private organization with an overall goal to prevent the development ... provides information about e.g. subjects of interest, searching behaviour, browsing patterns in website structure as well as tag clouds, page views. The poster discusses benefits and challenges of the two web metrics, with a model of how to use search and tag data for the design of taxonomies, e.g. choice...
A study on the personalization methods of the web | Hajighorbani ...
African Journals Online (AJOL)
... methods of correct patterns and analyze them. Here we will discuss the basic concepts of web personalization and consider the three approaches of web personalization and we evaluated the methods belonging to each of them. Keywords: personalization, search engine, user preferences, data mining methods ...
Teaching with technology: free Web resources for teaching and learning.
Wink, Diane M; Smith-Stoner, Marilyn
2011-01-01
In this bimonthly series, the department editor examines how nurse educators can use Internet and Web-based computer technologies such as search, communication, collaborative writing tools; social networking, and social bookmarking sites; virtual worlds; and Web-based teaching and learning programs. In this article, the department editor and her coauthor describe free Web-based resources that can be used to support teaching and learning.
A critical evaluation of Web sites offering patient information on tinnitus.
LENUS (Irish Health Repository)
Kieran, Stephen M
2012-02-01
The Internet is a vast information resource for both patients and healthcare professionals. However, the quality and content often lack formal scrutiny, so we examined the quality of patient information regarding tinnitus on the Internet. Using the three most popular search engines (google.com, yahoo.com, and msn.com), we found pertinent Web sites using the search term tinnitus. Web sites' accountability and authorship were evaluated using previously published criteria. The quality of patient information about tinnitus was assessed using a new 10-point scale, the Tinnitus Information Value (TIV). Statistical analysis was performed using the independent sample t-test (p
Finding research information on the web: how to make the most of Google and other free search tools.
Blakeman, Karen
2013-01-01
The Internet and the World Wide Web have had a major impact on the accessibility of research information. The move towards open access and development of institutional repositories has resulted in increasing amounts of information being made available free of charge. Many of these resources are not included in conventional subscription databases and Google is not always the best way to ensure that one is picking up all relevant material on a topic. This article will look at how Google's search engine works, how to use Google more effectively for identifying research information, alternatives to Google and will review some of the specialist tools that have evolved to cope with the diverse forms of information that now exist in electronic form.
EIIS: An Educational Information Intelligent Search Engine Supported by Semantic Services
Huang, Chang-Qin; Duan, Ru-Lin; Tang, Yong; Zhu, Zhi-Ting; Yan, Yong-Jian; Guo, Yu-Qing
2011-01-01
The semantic web brings a new opportunity for efficient information organization and search. To meet the special requirements of the educational field, this paper proposes an intelligent search engine enabled by educational semantic support service, where three kinds of searches are integrated into Educational Information Intelligent Search (EIIS)…
Development of a Computerized Visual Search Test
Reid, Denise; Babani, Harsha; Jon, Eugenia
2009-01-01
Visual attention and visual search are the features of visual perception, essential for attending and scanning one's environment while engaging in daily occupations. This study describes the development of a novel web-based test of visual search. The development information including the format of the test will be described. The test was designed…
Web document clustering using hyperlink structures
Energy Technology Data Exchange (ETDEWEB)
He, Xiaofeng; Zha, Hongyuan; Ding, Chris H.Q; Simon, Horst D.
2001-05-07
With the exponential growth of information on the World Wide Web there is great demand for developing efficient and effective methods for organizing and retrieving the information available. Document clustering plays an important role in information retrieval and taxonomy management for the World Wide Web and remains an interesting and challenging problem in the field of web computing. In this paper we consider document clustering methods exploring textual information, hyperlink structure, and co-citation relations. In particular we apply the normalized cut clustering method developed in computer vision to the task of hyperdocument clustering. We also explore some theoretical connections of the normalized-cut method to the K-means method. We then experiment with the normalized-cut method in the context of clustering query result sets for web search engines.
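The normalized cut criterion referred to in the record above scores a two-way partition (A, B) of a weighted graph as Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V), where cut(A, B) is the total weight of edges crossing the partition and assoc(A, V) is the total weight attached to the nodes of A. The sketch below only evaluates this score for a given partition; it does not perform the spectral optimization, and the toy link graph is an assumption made for the example.

```python
def normalized_cut(weights, part_a):
    """Normalized cut value of the partition (part_a, remaining nodes).

    weights maps undirected edges (u, v) to non-negative weights.
    """
    nodes = {node for edge in weights for node in edge}
    part_a = set(part_a) & nodes
    cut = 0.0
    assoc_a = 0.0
    assoc_b = 0.0
    for (u, v), weight in weights.items():
        # Each undirected edge adds its weight to the degree of both endpoints.
        for node in (u, v):
            if node in part_a:
                assoc_a += weight
            else:
                assoc_b += weight
        # The edge crosses the partition when exactly one endpoint is in part_a.
        if (u in part_a) != (v in part_a):
            cut += weight
    if assoc_a == 0 or assoc_b == 0:
        raise ValueError("both sides of the partition need at least one weighted edge")
    return cut / assoc_a + cut / assoc_b


# Hypothetical link graph: two tightly linked groups of pages joined by one link.
toy_links = {
    ("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 1.0,
    ("d", "e"): 1.0, ("e", "f"): 1.0, ("d", "f"): 1.0,
    ("c", "d"): 1.0,
}
print(normalized_cut(toy_links, {"a", "b", "c"}))
print(normalized_cut(toy_links, {"a"}))
```

Splitting the graph between the two groups gives a much lower score than slicing off a single page, which is why minimizing this value tends to recover balanced clusters.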
REVIEW PAPER ON THE DEEP WEB DATA EXTRACTION
Prof. V. S. Patil; Miss Sneha Sitafale; Miss Priyanka Kale; Miss Poonam Bhujbal; Miss Mohini Dandge
2018-01-01
Deep web data extraction is the process of extracting a set of data records and the items that they contain from a query result page. Such structured data can be later integrated into results from other data sources and given to the user in a single, cohesive view. Domain identification is used to identify the query interfaces related to the domain from the forms obtained in the search process. The surface web contains a large amount of unfiltered information, whereas the deep web includes hi...
Exploring default mode and information flow on the web.
Oka, Mizuki; Ikegami, Takashi
2013-01-01
Social networking services (e.g., Twitter, Facebook) are now major sources of World Wide Web (called "Web") dynamics, together with Web search services (e.g., Google). These two types of Web services mutually influence each other but generate different dynamics. In this paper, we distinguish two modes of Web dynamics: the reactive mode and the default mode. It is assumed that Twitter messages (called "tweets") and Google search queries react to significant social movements and events, but they also demonstrate signs of becoming self-activated, thereby forming a baseline Web activity. We define the former as the reactive mode and the latter as the default mode of the Web. In this paper, we investigate these reactive and default modes of the Web's dynamics using transfer entropy (TE). The amount of information transferred between a time series of 1,000 frequent keywords in Twitter and the same keywords in Google queries is investigated across an 11-month time period. Study of the information flow on Google and Twitter revealed that information is generally transferred from Twitter to Google, indicating that Twitter time series have some preceding information about Google time series. We also studied the information flow among different Twitter keywords time series by taking keywords as nodes and flow directions as edges of a network. An analysis of this network revealed that frequent keywords tend to become an information source and infrequent keywords tend to become a sink for other keywords. Based on these findings, we hypothesize that frequent keywords form the Web's default mode, which becomes an information source for infrequent keywords that generally form the Web's reactive mode. We also found that the Web consists of different time resolutions with respect to TE among Twitter keywords, which will be another focal point of this paper.
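Transfer entropy, the measure used in the record above, quantifies how much knowing the source series at time t improves prediction of the target series at time t+1 beyond what the target's own value at time t already provides. The sketch below is a minimal plug-in estimator for discrete series with a history length of one step; the history length, the estimator, and the two toy series are assumptions for illustration and say nothing about how the authors processed their Twitter and Google data.

```python
from collections import Counter
from math import log2


def transfer_entropy(source, target):
    """Plug-in estimate of transfer entropy (in bits) from source to target.

    Uses one step of history for both series:
    TE = sum over (x_next, x_now, y_now) of
         p(x_next, x_now, y_now) * log2(p(x_next | x_now, y_now) / p(x_next | x_now)).
    """
    if len(source) != len(target) or len(target) < 2:
        raise ValueError("series must have equal length of at least 2")
    triples = Counter(
        (target[t + 1], target[t], source[t]) for t in range(len(target) - 1)
    )
    total = sum(triples.values())
    now_pairs = Counter()     # counts of (x_now, y_now)
    target_pairs = Counter()  # counts of (x_next, x_now)
    target_now = Counter()    # counts of x_now
    for (x_next, x_now, y_now), count in triples.items():
        now_pairs[(x_now, y_now)] += count
        target_pairs[(x_next, x_now)] += count
        target_now[x_now] += count
    entropy = 0.0
    for (x_next, x_now, y_now), count in triples.items():
        with_source = count / now_pairs[(x_now, y_now)]
        without_source = target_pairs[(x_next, x_now)] / target_now[x_now]
        entropy += (count / total) * log2(with_source / without_source)
    return entropy


# Toy binary series (invented): the follower repeats the leader one step later.
leader = [0, 0, 0, 1, 0, 1, 1, 1, 0]
follower = [1, 0, 0, 0, 1, 0, 1, 1, 1]
print(transfer_entropy(leader, follower))
print(transfer_entropy(follower, leader))
```

In this toy case the estimate is one bit from leader to follower and zero in the opposite direction, the same kind of asymmetry the abstract describes between Twitter and Google series. With short or noisy real series the plug-in estimator is biased, so results on real data need more care than this sketch suggests.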
How Adolescents Search for and Appraise Online Health Information: A Systematic Review.
Freeman, Jaimie L; Caldwell, Patrina H Y; Bennett, Patricia A; Scott, Karen M
2018-04-01
To conduct a systematic review of the evidence concerning whether and how adolescents search for online health information and the extent to which they appraise the credibility of information they retrieve. A systematic search of online databases (MEDLINE, EMBASE, PsycINFO, ERIC) was performed. Reference lists of included papers were searched manually for additional articles. Included were studies on whether and how adolescents searched for and appraised online health information, where adolescent participants were aged 13-18 years. Thematic analysis was used to synthesize the findings. Thirty-four studies met the inclusion criteria. In line with the research questions, 2 key concepts were identified within the papers: whether and how adolescents search for online health information, and the extent to which adolescents appraise online health information. Four themes were identified regarding whether and how adolescents search for online health information: use of search engines, difficulties in selecting appropriate search strings, barriers to searching, and absence of searching. Four themes emerged concerning the extent to which adolescents appraise the credibility of online health information: evaluation based on Web site name and reputation, evaluation based on first impression of Web site, evaluation of Web site content, and absence of a sophisticated appraisal strategy. Adolescents are aware of the varying quality of online health information. Strategies used by individuals for searching and appraising online health information differ in their sophistication. It is important to develop resources to enhance search and appraisal skills and to collaborate with adolescents to ensure that such resources are appropriate for them.
Search strategies on the Internet: general and specific.
Bottrill, Krys
2004-06-01
Some of the most up-to-date information on scientific activity is to be found on the Internet; for example, on the websites of academic and other research institutions and in databases of currently funded research studies provided on the websites of funding bodies. Such information can be valuable in suggesting new approaches and techniques that could be applicable in a Three Rs context. However, the Internet is a chaotic medium, not subject to the meticulous classification and organisation of classical information resources. At the same time, Internet search engines do not match the sophistication of search systems used by database hosts. Also, although some offer relatively advanced features, user awareness of these tends to be low. Furthermore, much of the information on the Internet is not accessible to conventional search engines, giving rise to the concept of the "Invisible Web". General strategies and techniques for Internet searching are presented, together with a comparative survey of selected search engines. The question of how the Invisible Web can be accessed is discussed, as well as how to keep up-to-date with Internet content and improve searching skills.
Quality of Web-based information on obsessive compulsive disorder
Directory of Open Access Journals (Sweden)
Klila H
2013-11-01
Hedi Klila,1 Anne Chatton,2 Ariane Zermatten,2 Riaz Khan,2 Martin Preisig,1,3 Yasser Khazaal2,4 1Department of Psychiatry, Lausanne University Hospital, Lausanne, Switzerland; 2Department of Mental Health and Psychiatry, Geneva University Hospitals, Geneva, Switzerland; 3Lausanne University, Lausanne, Switzerland; 4Geneva University, Geneva, Switzerland. Background: The Internet is increasingly used as a source of information for mental health issues. The burden of obsessive compulsive disorder (OCD) may lead persons with diagnosed or undiagnosed OCD, and their relatives, to search for good quality information on the Web. This study aimed to evaluate the quality of Web-based information on English-language sites dealing with OCD and to compare the quality of websites found through a general and a medically specialized search engine. Methods: Keywords related to OCD were entered into Google and OmniMedicalSearch. Websites were assessed on the basis of accountability, interactivity, readability, and content quality. The "Health on the Net" (HON) quality label and the Brief DISCERN scale score were used as possible content quality indicators. Of the 235 links identified, 53 websites were analyzed. Results: The content quality of the OCD websites examined was relatively good. The use of a specialized search engine did not offer an advantage in finding websites with better content quality. A score ≥16 on the Brief DISCERN scale is associated with better content quality. Conclusion: This study shows the acceptability of the content quality of OCD websites. There is no advantage in searching for information with a specialized search engine rather than a general one. Practical implications: The Internet offers a number of high quality OCD websites. It remains critical, however, to have a provider–patient talk about the information found on the Web. Keywords: Internet, quality indicators, anxiety disorders, OCD, search engine
Dy, Christopher J; Taylor, Samuel A; Patel, Ronak M; Kitay, Alison; Roberts, Timothy R; Daluiski, Aaron
2012-09-01
Recent emphasis on shared decision making and patient-centered research has increased the importance of patient education and health literacy. The internet is rapidly growing as a source of self-education for patients. However, concern exists over the quality, accuracy, and readability of the information. Our objective was to determine whether the quality, accuracy, and readability of information online about distal radius fractures vary with the search term. This was a prospective evaluation of 3 search engines using 3 different search terms of varying sophistication ("distal radius fracture," "wrist fracture," and "broken wrist"). We evaluated 70 unique Web sites for quality, accuracy, and readability. We used comparative statistics to determine whether the search term affected the quality, accuracy, and readability of the Web sites found. Three orthopedic surgeons independently gauged quality and accuracy of information using a set of predetermined scoring criteria. We evaluated the readability of each Web site using the Flesch-Kincaid score for reading grade level. There were significant differences in the quality, accuracy, and readability of information found, depending on the search term. We found higher quality and accuracy resulted from the search term "distal radius fracture," particularly compared with Web sites resulting from the term "broken wrist." The reading level was higher than recommended in 65 of the 70 Web sites and was significantly higher when searching with "distal radius fracture" than "wrist fracture" or "broken wrist." There was no correlation between Web site reading level and quality or accuracy. The readability of information about distal radius fractures in most Web sites was higher than the recommended reading level for the general public. The quality and accuracy of the information found significantly varied with the sophistication of the search term used. Physicians, professional societies, and search engines should consider
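For readers unfamiliar with the readability measure used in the study above, the Flesch-Kincaid grade level is computed as 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below applies that formula; the function names and the vowel-group syllable counter are illustrative assumptions (the heuristic is crude), and nothing here reflects the tooling the authors actually used.

```python
import re


def fk_grade_from_counts(words, sentences, syllables):
    """Flesch-Kincaid grade level from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def count_syllables(word):
    """Very rough heuristic: one syllable per run of vowels (assumption)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fk_grade(text):
    """Estimate the grade level of a text using the heuristic counter."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    syllables = sum(count_syllables(w) for w in words)
    return fk_grade_from_counts(len(words), len(sentences), syllables)


plain = "The cat sat."
technical = "Distal radius fractures necessitate individualized orthopedic evaluation."
print(round(fk_grade(plain), 2), round(fk_grade(technical), 2))
```

Longer sentences and longer words both raise the score, which is consistent with the abstract's finding that the more technical search term led to pages with a higher reading level.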
Gender-specific information search behavior
Parinaz Maghferat; Wolfgang G. Stock
2010-01-01
This paper presents an empirical gender study in the context of information science. It discusses an exploratory investigation, which provides empirical data about differences of information seeking activities by female and male students. The research focus was on whether there are gender-specific differences when people perform searches with the aid of general search engines and specialized Deep Web information services. It has been observed how the participants behaved in getting informatio...
Best Practices for Building Web Data Portals
Anderson, R. A.; Drew, L.
2013-12-01
With a data archive of more than 1.5 petabytes and a key role as the NASA Distributed Active Archive Center (DAAC) for synthetic aperture radar (SAR) data, the Alaska Satellite Facility (ASF) has an imperative to develop effective Web data portals. As part of continuous enhancement and expansion of its website, ASF recently created two data portals for distribution of SAR data: one for the archiving and distribution of NASA's MEaSUREs Wetlands project and one for newly digitally processed data from NASA's 1978 Seasat satellite. These case studies informed ASF's development of the following set of best practices for developing Web data portals. 1) Maintain well-organized, quality data. This is fundamental. If data are poorly organized or contain errors, credibility is lost and the data will not be used. 2) Match data to likely data uses. 3) Identify audiences in as much detail as possible. ASF DAAC's Seasat and Wetlands portals target three groups of users: a) scientists already familiar with ASF DAAC's SAR archive and our data download tool, Vertex; b) scientists not familiar with SAR or ASF, but who can use the data for their research of oceans, sea ice, volcanoes, land deformation and other Earth sciences; c) audiences wishing to learn more about SAR and its use in Earth sciences. 4) Identify the heaviest data uses and the terms scientists search for online when trying to find data for those uses. 5) Create search engine optimized (SEO) Web content that corresponds to those searches. Because search engines do not yet search raw data, Web data portals must include content that ties the data to its likely uses. 6) Create Web designs that best serve data users (user-centered design), not designs that reflect how the organization views itself or its data. Usability testing was conducted for the ASF DAAC Wetlands portal to improve the user experience. 7) Use SEO tips and techniques. The ASF DAAC Seasat portal used numerous SEO techniques, including social media, blogging
Web-Scale Search-Based Data Extraction and Integration
2011-10-17
[Extracted text is garbled figure and table residue. Recoverable fragments: "Figure 107: An apartment feature from OLX"; an infobox excerpt listing elevation 574 m, postal code 6010-6080, and mayor Hilde Zach.]
The FOLDALIGN web server for pairwise structural RNA alignment and mutual motif search
DEFF Research Database (Denmark)
Havgaard, Jakob Hull; Lyngsø, Rune B.; Gorodkin, Jan
2005-01-01
FOLDALIGN is a Sankoff-based algorithm for making structural alignments of RNA sequences. Here, we present a web server for making pairwise alignments between two RNA sequences, using the recently updated version of FOLDALIGN. The server can be used to scan two sequences for a common structural RNA motif of limited size, or the entire sequences can be aligned locally or globally. The web server offers a graphical interface, which makes it simple to make alignments and manually browse the results. The web server can be accessed at http://foldalign.kvl.dk...
Modeling User Behavior and Attention in Search
Huang, Jeff
2013-01-01
In Web search, query and click log data are easy to collect but they fail to capture user behaviors that do not lead to clicks. As search engines reach the limits inherent in click data and are hungry for more data in a competitive environment, mining cursor movements, hovering, and scrolling becomes important. This dissertation investigates how…
Info.cern.ch returns to the Web
2006-01-01
The first web address is reincarnated as a historical reference on the birth of the Web. Tim Berners-Lee, inventor of the Web, with one of the first Web pages on his computer. CERN invites you to take a virtual trip back in time and have a look at what the very first URL, which led to a revolution in the way we communicate and share information, was all about. The original web server, whose address was info.cern.ch, centred on information regarding the WorldWideWeb (WWW) project. Visitors could learn more about hypertext, technical details for creating one's own webpage, and even an explanation of how to search the Web for information, something the 5-year-olds of today have mastered since it all started 17 years ago. Now info.cern.ch has been re-launched with a much brighter façade and a focus on the ideas that inspired this new wave of technology. The first browser created by Tim Berners-Lee, inventor of the Web, contained just about everything we see today on a web browser, including graphics, menus, layouts and...
Pedagogy for teaching and learning cooperatively on the Web: a Web-based pharmacology course.
Tse, Mimi M Y; Pun, Sandra P Y; Chan, Moon Fai
2007-02-01
The Internet is becoming a preferred place to find information. Millions of people go online in search of health and medical information. Likewise, the demand for Web-based courses is growing. This article presents the development, utilization and evaluation of a Web-based pharmacology course for nursing students. The course was developed based on 150 commonly used drugs. A total of 110 year 1 nursing students took part in the course. After attending six hours of face-to-face pharmacology lectures over three weeks, students were invited to complete a questionnaire (pre-test) about learning pharmacology. The course materials were then uploaded to WebCT for students' self-directed learning, and students attempted two scheduled online quizzes. At the end of the semester, students were given the same questionnaire (post-test). There was a significant increase in understanding, as opposed to memorizing, the subject content, in the development of problem-solving ability in learning pharmacology, and in becoming an independent learner (p < 0.05). Online quizzes yielded satisfactory results. In the focus group interview, students appreciated the time flexibility and convenience associated with Web-based learning, and they made good suggestions for enhancing it. A Web-based approach is promising for teaching and learning pharmacology for nurses and other health-care professionals.
WCSTools 3.0: More Tools for Image Astrometry and Catalog Searching
Mink, Douglas J.
For five years, WCSTools has provided image astrometry for astronomers who need accurate positions for objects they wish to observe. Other functions have been added and improved since the package was first released. Support has been added for new catalogs, such as the GSC-ACT, 2MASS Point Source Catalog, and GSC II, as they have been published. A simple command line interface can search any supported catalog, returning information in several standard formats, whether the catalog is on a local disk or searchable over the World Wide Web. The catalog searching routine can be located on either end (or both ends!) of such a web connection, and the output from one catalog search can be used as the input to another search.
Hill, Paul; MacArthur, Stacey; Read, Nick
2014-01-01
Effective Internet search skills are essential with the continually increasing amount of information available on the Web. Extension personnel are required to find information to answer client questions and to conduct research on programs. Unfortunately, many lack the skills necessary to effectively navigate the Internet and locate needed…
de Leeuw, Joshua R; Motz, Benjamin A
2016-03-01
Behavioral researchers are increasingly using Web-based software such as JavaScript to conduct response time experiments. Although there has been some research on the accuracy and reliability of response time measurements collected using JavaScript, it remains unclear how well this method performs relative to standard laboratory software in psychologically relevant experimental manipulations. Here we present results from a visual search experiment in which we measured response time distributions with both Psychophysics Toolbox (PTB) and JavaScript. We developed a methodology that allowed us to simultaneously run the visual search experiment with both systems, interleaving trials between two independent computers, thus minimizing the effects of factors other than the experimental software. The response times measured by JavaScript were approximately 25 ms longer than those measured by PTB. However, we found no reliable difference in the variability of the distributions related to the software, and both software packages were equally sensitive to changes in the response times as a result of the experimental manipulations. We concluded that JavaScript is a suitable tool for measuring response times in behavioral research.
What and how children search on the web
Duarte Torres, Sergio; Weber, Ingmar
2011-01-01
The Internet has become an important part of the daily life of children as a source of information and leisure activities. Nonetheless, given that most of the content available on the web is aimed at the general public, children are constantly exposed to inappropriate content, either because the language goes beyond their reading skills, because their attention span differs from that of grown-ups, or simply because the content is not targeted at children, as is the case with ads and adult content. In this work w...
Jochmann-Mannak, Hanna; Huibers, Theo W.C.; Lentz, Leo; Sanders, Ted
2010-01-01
Children frequently make use of the Internet to search for information. However, research shows that children experience many problems with searching and browsing the web. In the last decade, numerous search environments have been developed especially for children. Do these search interfaces support
An application of TOPSIS for ranking internet web browsers
Directory of Open Access Journals (Sweden)
Shahram Rostampour
2012-07-01
A web browser is one of the most important tools for surfing the Internet. A good web browser must incorporate literally tens of features, such as an integrated search engine, automatic updates, etc. Each year, ten web browsers are formally named the top browsers by some reviewing organizations. In this paper, we propose using the TOPSIS technique to rank ten web browsers. The proposed model uses five criteria: speed, features, security, technical support and supported configurations. In terms of speed, Safari is the best web browser, followed by Google Chrome and Internet Explorer, while Opera is the best web browser when we look at 20 different features. We have also ranked these web browsers using all five criteria together, and the results indicate that Opera, Internet Explorer, Firefox and Google Chrome are the best web browsers to choose.
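To make the TOPSIS ranking procedure mentioned above more concrete, here is a minimal sketch of the standard method: vector-normalize the decision matrix, apply criterion weights, find the ideal and anti-ideal alternatives, and score each alternative by its relative closeness to the ideal. The browser labels, scores and equal weights are hypothetical placeholders, not values from the paper, and the paper's exact normalization and weighting are not stated in the abstract.

```python
from math import sqrt


def topsis(matrix, weights, benefit):
    """Return TOPSIS closeness scores (0..1, higher is better).

    matrix  -- one row per alternative, one column per criterion
    weights -- criterion weights
    benefit -- True where larger is better, False where smaller is better
    Assumes no column is all zeros and the alternatives are not all identical.
    """
    n_crit = len(weights)
    # Step 1: vector normalization of each column.
    norms = [sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    # Step 2: weighted normalized matrix.
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
                for row in matrix]
    # Step 3: ideal and anti-ideal value per criterion.
    columns = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    # Step 4: distances to both and relative closeness.
    scores = []
    for row in weighted:
        d_best = sqrt(sum((v - b) ** 2 for v, b in zip(row, ideal)))
        d_worst = sqrt(sum((v - w) ** 2 for v, w in zip(row, anti)))
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Hypothetical scores on the five criteria named in the abstract:
# speed, features, security, technical support, supported configurations.
browsers = ["browser_a", "browser_b", "browser_c"]
decision_matrix = [
    [8, 6, 7, 5, 9],
    [6, 9, 8, 7, 6],
    [7, 7, 6, 8, 7],
]
weights = [0.2] * 5          # equal weights (assumption)
benefit = [True] * 5         # all criteria treated as "higher is better"
scores = topsis(decision_matrix, weights, benefit)
for name, score in sorted(zip(browsers, scores), key=lambda p: -p[1]):
    print(f"{name}: {score:.3f}")
```

Ranking on a single criterion, as the abstract does for speed, corresponds to giving that criterion all of the weight; the combined ranking depends on how the five weights are chosen.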
A comparative study of six European databases of medically oriented Web resources.
Abad García, Francisca; González Teruel, Aurora; Bayo Calduch, Patricia; de Ramón Frias, Rosa; Castillo Blasco, Lourdes
2005-10-01
The paper describes six European medically oriented databases of Web resources, pertaining to five quality-controlled subject gateways, and compares their performance. The characteristics, coverage, procedure for selecting Web resources, record structure, searching possibilities, and existence of user assistance were described for each database. Performance indicators for each database were obtained by means of searches carried out using the key words "myocardial infarction." Most of the databases originated in the 1990s in an academic or library context and include all types of Web resources of an international nature. Five databases use Medical Subject Headings. The number of fields per record varies between three and nineteen. The language of the search interfaces is mostly English, and some of them allow searches in other languages. In some databases, the search can be extended to PubMed. Organizing Medical Networked Information, Catalogue et Index des Sites Médicaux Francophones, and Diseases, Disorders and Related Topics produced the best results. The usefulness of these databases as quick reference resources is clear. In addition, their lack of content overlap means that, for the user, they complement each other. Their continued survival faces three challenges: the instability of the Internet, maintenance costs, and lack of use in spite of their potential usefulness.
Directory of Open Access Journals (Sweden)
Rony Baskoro Lukito
2014-12-01
The purpose of this research is to determine how to optimize a web design so that it increases the number of visitors. The number of Internet users in the world continues to grow in line with advances in information technology. Marketing of products and services no longer relies only on print and electronic media. Moreover, the cost of using the Internet as a marketing medium is relatively low compared to the use of television. The Internet as a marketing medium reaches different parts of the world 24 hours a day. But for a website to be visited by many Internet users, it is not enough for the site to look good from the outside. Websites that serve as a marketing medium must be built according to the correct rules so that they become an optimal marketing medium. One such rule is that the content of the website should be indexed well by search engines such as Google. The search engine optimization discussed here focuses on Google, because 83% of Internet users across the world use Google as their search engine. Search engine optimization, commonly known as SEO, is an important practice that makes a website easier for users to find with the desired keywords.
A fuzzy method for improving the functionality of search engines based on user's web interactions
Directory of Open Access Journals (Sweden)
Farzaneh Kabirbeyk
2015-04-01
Web mining has been widely used to discover knowledge from various sources on the web. One of the important tools in web mining is the mining of web users' behavior, which is considered a way to discover potential knowledge from web users' interactions. Nowadays, website personalization is popular among web users; it plays an important role in facilitating user access and provides information matching users' requirements based on their own interests. Extracting important features of web user behavior plays a significant role in web usage mining. Such features include page visit frequency in each session, visit duration, and the dates on which certain pages are visited. This paper presents a method that predicts a user's interests and proposes a list of pages based on those interests by identifying the user's behavior with a fuzzy clustering method. Because users have different interests and may pursue one or more interests at a time, a user's interest may belong to several clusters, and fuzzy clustering allows for this overlap. The resulting clusters are used to extract fuzzy rules. These rules help detect the user's movement pattern, and a neural network is then used to provide a list of suggested pages to the user.
Grokker, KartOO, Addict-o-Matic and More: Really Different Search Engines
Descy, Don E.
2009-01-01
There are hundreds of unique search engines in the United States and thousands of unique search engines around the world. If people get into search engines designed just to search particular web sites, the number is in the hundreds of thousands. This article looks at: (1) clustering search engines, such as KartOO (www.kartoo.com) and Grokker…
An Innovative Approach for online Meta Search Engine Optimization
Manral, Jai; Hossain, Mohammed Alamgir
2015-01-01
This paper presents an approach to identify efficient techniques used in Web Search Engine Optimization (SEO). Understanding the SEO factors that can influence page ranking in search engines is important for webmasters who wish to attract a large number of users to their websites. Unlike previous related research, in this study we developed an intelligent meta search engine which aggregates results from various search engines and ranks them based on several important SEO parameters. The r...
Analysis and visualization of Arabidopsis thaliana GWAS using web 2.0 technologies.
Huang, Yu S; Horton, Matthew; Vilhjálmsson, Bjarni J; Seren, Umit; Meng, Dazhe; Meyer, Christopher; Ali Amer, Muhammad; Borevitz, Justin O; Bergelson, Joy; Nordborg, Magnus
2011-01-01
With large-scale genomic data becoming the norm in biological studies, the storing, integrating, viewing and searching of such data have become a major challenge. In this article, we describe the development of an Arabidopsis thaliana database that hosts the geographic information and genetic polymorphism data for over 6000 accessions and genome-wide association study (GWAS) results for 107 phenotypes representing the largest collection of Arabidopsis polymorphism data and GWAS results to date. Taking advantage of a series of the latest web 2.0 technologies, such as Ajax (Asynchronous JavaScript and XML), GWT (Google-Web-Toolkit), MVC (Model-View-Controller) web framework and Object Relationship Mapper, we have created a web-based application (web app) for the database, that offers an integrated and dynamic view of geographic information, genetic polymorphism and GWAS results. Essential search functionalities are incorporated into the web app to aid reverse genetics research. The database and its web app have proven to be a valuable resource to the Arabidopsis community. The whole framework serves as an example of how biological data, especially GWAS, can be presented and accessed through the web. In the end, we illustrate the potential to gain new insights through the web app by two examples, showcasing how it can be used to facilitate forward and reverse genetics research. Database URL: http://arabidopsis.usc.edu/
Searching Online Chemical Data Repositories via the ChemAgora Portal.
Zanzi, Antonella; Wittwehr, Clemens
2017-12-26
ChemAgora, a web application designed and developed in the context of the "Data Infrastructure for Chemical Safety Assessment" (diXa) project, provides search capabilities to chemical data from resources available online, enabling users to cross-reference their search results with both regulatory chemical information and public chemical databases. ChemAgora, through an on-the-fly search, reports whether or not a chemical is known in each of the external data sources and provides clickable links leading to the third-party web site pages containing the information. The original purpose of the ChemAgora application was to correlate studies stored in the diXa data warehouse with available chemical data. Since the end of the diXa project, ChemAgora has evolved into an independent portal, currently accessible directly through the ChemAgora home page, with improved search capabilities of online data sources.
The Role of Aesthetics in Web Design
DEFF Research Database (Denmark)
Thorlacius, Lisbeth
2007-01-01
Web sites are rapidly becoming the preferred media choice for information search, company presentation, shopping, entertainment, education, and social contacts. At the same time we live in a period where visual symbols play an increasingly important role in our daily lives. The aim of this article is to present and discuss the four main areas in which aesthetics play an important role in the design of successful Web sites: aesthetics play an important role in supporting the content and the functionality, in appealing to the taste of the target audience, in creating the desired image for the sender, and in addressing the requirements of the Web site genre.
Multi-objective Search-based Mobile Testing
Mao, K.
2017-01-01
Despite the tremendous popularity of mobile applications, mobile testing still relies heavily on manual testing. This thesis presents mobile test automation approaches based on multi-objective search. We introduce three approaches: Sapienz (for native Android app testing), Octopuz (for hybrid/web JavaScript app testing) and Polariz (for using crowdsourcing to support search-based mobile testing). These three approaches represent the primary scientific and technical contributions of the thesis...
A Statistical Ontology-Based Approach to Ranking for Multiword Search
Kim, Jinwoo
2013-01-01
Keyword search is a prominent data retrieval method for the Web, largely because the simple and efficient nature of keyword processing allows a large amount of information to be searched with fast response. However, keyword search approaches do not formally capture the clear meaning of a keyword query and fail to address the semantic relationships…
Qin, Jian; Jurisica, Igor; Liddy, Elizabeth D.; Jansen, Bernard J; Spink, Amanda; Priss, Uta; Norton, Melanie J.
2000-01-01
These six articles discuss knowledge discovery in databases (KDD). Topics include data mining; knowledge management systems; applications of knowledge discovery; text and Web mining; text mining and information retrieval; user search patterns through Web log analysis; concept analysis; data collection; and data structure inconsistency. (LRW)
Reflections on New Search Engine 新型搜索引擎畅想
Huang, Jiannian
2007-01-01
[English abstract] The rapidly growing demand for Internet information resources has led to a surge of search engines. This article introduces some new types of search engines that are appearing or will appear. These include: grey document search engines, invisible web search engines, knowledge discovery search engines, clustering meta search engines, academic clustering search engines, concept comparison and concept analogy search engines, consultation search engines, teachi...
PIE the search: searching PubMed literature for protein interaction information.
Kim, Sun; Kwon, Dongseop; Shin, Soo-Yong; Wilbur, W John
2012-02-15
Finding protein-protein interaction (PPI) information in the literature is a challenging but important task. Keyword search in PubMed® is often time consuming because it requires a series of actions that refine keywords and browse search results until a goal is reached. Due to the rapid growth of biomedical literature, it has become more difficult for biologists and curators to locate PPI information quickly. Therefore, a tool for prioritizing PPI-informative articles can be a useful assistant for finding this PPI-relevant information. PIE (Protein Interaction information Extraction) the search is a web service implementing a competition-winning approach that applies word and syntactic analyses through machine learning techniques. For easy user access, PIE the search provides a PubMed-like search environment, but the output is a list of articles prioritized by PPI confidence scores. By obtaining PPI-related articles at high rank, researchers can more easily find up-to-date PPI information that cannot be found in manually curated PPI databases. http://www.ncbi.nlm.nih.gov/IRET/PIE/.
Search Engines: Gateway to a New ``Panopticon''?
Kosta, Eleni; Kalloniatis, Christos; Mitrou, Lilian; Kavakli, Evangelia
Nowadays, Internet users depend on various search engines to find requested information on the Web. Although most users feel that they are and remain anonymous when they place their search queries, reality proves otherwise. The increasing importance of search engines for locating desired information on the Internet usually leads to considerable inroads into the privacy of users. The scope of this paper is to study the main privacy issues with regard to search engines, such as the anonymisation of search logs and their retention period, and to examine the applicability of the European data protection legislation to non-EU search engine providers. Ixquick, a privacy-friendly meta search engine, will be presented as an alternative to the privacy-intrusive existing practices of search engines.
A systematic framework to discover pattern for web spam classification
Jelodar, Hamed; Wang, Yongli; Yuan, Chi; Jiang, Xiaohui
2017-01-01
Web spam is a big problem for search engine users on the World Wide Web. Spam websites use deceptive techniques to achieve high rankings. Although many researchers have presented different approaches to web spam classification and detection, it remains an open issue in computer science. Analyzing and evaluating these websites can be an effective step for discovering and categorizing their features. There are several methods and algorithms for detecting those websites, such as decision t...
A framework for automatic annotation of web pages using the Google Rich Snippets vocabulary
Meer, van der J.; Boon, F.; Hogenboom, F.P.; Frasincar, F.; Kaymak, U.
2011-01-01
One of the latest developments for the Semantic Web is Google Rich Snippets, a service that uses Web page annotations for displaying search results in a visually appealing manner. In this paper we propose the Automatic Review Recognition and annOtation of Web pages (ARROW) framework, which is able
GGRNA: an ultrafast, transcript-oriented search engine for genes and transcripts.
Naito, Yuki; Bono, Hidemasa
2012-07-01
GGRNA (http://GGRNA.dbcls.jp/) is a Google-like, ultrafast search engine for genes and transcripts. The web server accepts arbitrary words and phrases, such as gene names, IDs, gene descriptions, gene annotations and even nucleotide/amino acid sequences through one simple search box, and quickly returns relevant RefSeq transcripts. A typical search takes just a few seconds, which dramatically enhances the usability of routine searching. In particular, GGRNA can search sequences as short as 10 nt or 4 amino acids, which cannot be handled easily by popular sequence analysis tools. Nucleotide sequences can be searched allowing up to three mismatches, or the query sequences may contain degenerate nucleotide codes (e.g. N, R, Y, S). Furthermore, Gene Ontology annotations, Enzyme Commission numbers and probe sequences of catalog microarrays are also incorporated into GGRNA, which may help users to conduct searches by various types of keywords. The GGRNA web server will provide a simple and powerful interface for finding genes and transcripts for a wide range of users. All services at GGRNA are provided free of charge to all users.
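The two sequence-search features mentioned above (a bounded number of mismatches, and degenerate nucleotide codes in the query) can be illustrated with a minimal Python sketch. This is a brute-force sliding-window scan written only for illustration; it is not GGRNA's implementation, and the function name `find_matches` and its parameters are hypothetical. The code table follows the standard IUPAC nucleotide codes.

```python
# IUPAC degenerate nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_matches(query, target, max_mismatches=3):
    """Return (start, mismatches) for each window of target that matches
    query with at most max_mismatches mismatching positions.

    Hypothetical helper: a naive scan, not an indexed search.
    """
    query = query.upper()
    target = target.upper()
    hits = []
    for start in range(len(target) - len(query) + 1):
        window = target[start:start + len(query)]
        mismatches = 0
        for q, t in zip(query, window):
            # A query symbol matches if the target base is in its code set;
            # unknown query symbols never match.
            if t not in IUPAC.get(q, ""):
                mismatches += 1
                if mismatches > max_mismatches:
                    break
        else:
            # The inner loop finished without exceeding the mismatch budget.
            hits.append((start, mismatches))
    return hits
```

A scan like this costs time proportional to the product of query and target lengths, so a service that answers in seconds over a whole transcript collection would presumably rely on a precomputed index rather than this approach.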
Survey of formal and informal citation in Google search engine
Directory of Open Access Journals (Sweden)
Afsaneh Teymourikhani
2016-03-01
Aim: Informal citations are bibliographic information (a title or Internet address) citing information resources in informal scholarly communication, and they are always neglected in traditional citation databases. This study asks whether informal citations in the web environment are traceable. It aims to determine what proportion of the web citations found by the Google search engine are formal and informal citations. Research method: Webometrics is the method used. The study covers 1344 research articles from 98 open access journals, and the method used to extract web citations from the Google search engine is "Web/URL citation extraction". Findings: The findings showed that ten percent of the web citations of the Google search engine are formal and informal citations. The highest share of formal citations in the Google search engine, 19.27%, is in the field of library and information science, and the lowest, 1.54%, is in civil engineering. The highest percentage of informal citations, 3.57%, is in sociology, and the lowest, 0.39%, is in civil engineering. Journal citation is highest in surgery, at 94.12%, and lowest in philosophy, at 5.26%. Result: Because formal and informal citations make up only about 10 percent of web citations in the Google search engine, a lower share than in previous research, citation tracking with this engine should be treated with more caution. The amount of formal citation varies across disciplines. Journal citation is highest in surgery and lowest in philosophy, which indicates that in philosophy, a subset of the social sciences, journals do not play a significant role in scientific communication. On the other hand, books have a key role in this field.
Using anchor text, spam filtering and Wikipedia for web search and entity ranking
Kamps, J.; Kaptein, R.; Koolen, M.; Voorhees, E.M.; Buckland, L.P.
2010-01-01
In this paper, we document our efforts in participating in the TREC 2010 Entity Ranking and Web Tracks. We had multiple aims: For the Web Track we wanted to compare the effectiveness of anchor text of the category A and B collections and the impact of global document quality measures such as
Sentiment Analysis of Web Sites Related to Vaginal Mesh Use in Pelvic Reconstructive Surgery.
Hobson, Deslyn T G; Meriwether, Kate V; Francis, Sean L; Kinman, Casey L; Stewart, J Ryan
2018-05-02
The purpose of this study was to utilize sentiment analysis to describe online opinions toward vaginal mesh. We hypothesized that sentiment in legal Web sites would be more negative than that in medical and reference Web sites. We generated a list of relevant key words related to vaginal mesh and searched Web sites using the Google search engine. Each unique uniform resource locator (URL) was sorted into 1 of 6 categories: "medical", "legal", "news/media", "patient generated", "reference", or "unrelated". Sentiment of relevant Web sites, the primary outcome, was scored on a scale of -1 to +1, and mean sentiment was compared across all categories using 1-way analysis of variance. Tukey test evaluated differences between category pairs. Google searches of 464 unique key words resulted in 11,405 URLs. Sentiment analysis was performed on 8029 relevant URLs (3472 legal, 1625 "medical", 1774 "reference", 666 "news media", 492 "patient generated"). The mean sentiment for all relevant Web sites was +0.01 ± 0.16; analysis of variance revealed significant differences between categories (P Web sites categorized as "legal" and "news/media" had a slightly negative mean sentiment, whereas those categorized as "medical," "reference," and "patient generated" had slightly positive mean sentiments. Tukey test showed differences between all category pairs except the "medical" versus "reference" in comparison with the largest mean difference (-0.13) seen in the "legal" versus "reference" comparison. Web sites related to vaginal mesh have an overall mean neutral sentiment, and Web sites categorized as "medical," "reference," and "patient generated" have significantly higher sentiment scores than related Web sites in "legal" and "news/media" categories.
Technologies for information skills in web
Directory of Open Access Journals (Sweden)
Isa Maria Freire
2012-12-01
Presents and discusses the results of the project Information Skills: Tutorials on Intellectual Technologies for Disseminating Information on the Web, developed in the Intellectual Technologies Laboratory at the Department of Information Science, Federal University of Paraíba. Discusses the proposed extension action, in partnership with the university's Library Science and Archival Science programs, to develop skills in searching for, organizing, producing and disseminating information on the Web. Reports the development of tutorials for transferring intellectual technologies for the Web to the interested community, as well as the experience with face-to-face workshops held during the I International Book Exhibition of Paraíba, in 2010. Discusses results and activities for information skills, reflecting on the experience of the project's first year.
Semantic web in the e-learning
Directory of Open Access Journals (Sweden)
Andrenizia Aquino Eluan
2008-01-01
With the evolution of information and communication technology, the Web is adding a diversity of resources that can facilitate the development of some areas of knowledge, because it promotes access to and use of information that is globalised, accessible and without borders. The article discusses the Semantic Web as a means of sharing information, adopting standards for interoperability in network communication. Among the concerns that surround the education area are strategies for relevant and effective search and information retrieval in support of knowledge construction and learning. In this context, Distance Education is an area that can benefit from the resources of the Semantic Web and the advantages of using ontologies, which are presented in this article.
2016-07-21
important to you, to listening to your favorite music online. Because encryption is essential in protecting the authenticity of personal information...technological trends. Darknet tactics vary, making it hard to identify darknet users or data hosts. For example, Dr. Gareth Owen, University of...When you do a simple Web search on a topic, the results that pop up aren’t the whole story. The Internet contains a vast trove of information
Ullman, Richard; Bane, Bob; Yang, Jingli
2008-01-01
A shell script has been written as a means of automatically making HDF-EOS-formatted data sets available via the World Wide Web. ("HDF-EOS" and variants thereof are defined in the first of the two immediately preceding articles.) The shell script chains together some software tools developed by the Data Usability Group at Goddard Space Flight Center to perform the following actions: Extract metadata in Object Definition Language (ODL) from an HDF-EOS file, Convert the metadata from ODL to Extensible Markup Language (XML), Reformat the XML metadata into human-readable Hypertext Markup Language (HTML), Publish the HTML metadata and the original HDF-EOS file to a Web server and an Open-source Project for a Network Data Access Protocol (OPeN-DAP) server computer, and Reformat the XML metadata and submit the resulting file to the EOS Clearinghouse, which is a Web-based metadata clearinghouse that facilitates searching for, and exchange of, Earth-Science data.
Semantic web model for universities
Directory of Open Access Journals (Sweden)
Karla Abad
2015-12-01
A study of the current state of the microsites and repositories at the Universidad Estatal Península de Santa Elena (UPSE) found that their information lacked optimal and adequate semantics. Under these circumstances, the need arose to create a semantic web structure model for universities, which was subsequently applied to the microsites and digital repository of UPSE as a test case. Part of this project includes the installation of software modules with their respective configurations and the use of metadata standards such as Dublin Core to improve SEO (search engine optimization); this has made it possible to generate standardized metadata and to create policies for uploading information. The use of metadata transforms simple data into well-organized structures that provide information and knowledge for generating results in web search engines. With the implementation of the semantic web model complete, it is possible to say that the university has improved its presence and visibility on the web through the indexing of information in different search engines and its position in the Webometrics categorization of universities and repositories (a ranking that classifies universities from around the world).
A Hybrid Model Ranking Search Result for Research Paper Searching on Social Bookmarking
Directory of Open Access Journals (Sweden)
pijitra jomsri
2015-11-01
Social bookmarking and publication sharing systems are essential tools for web resource discovery. The performance and capabilities of search results from research paper bookmarking systems are vital. Many researchers use social bookmarking to search for papers related to their topics of interest. This paper proposes a combination of similarity-based indexing over "tag, title and abstract" and static ranking to improve search results. In this particular study, the year of publication and the type of research paper publication are combined with similarity ranking in a scheme called HybridRank. Different weighting scores are employed. The retrieval performance of these weighted combination rankings is evaluated using mean values of NDCG. The results suggest that HybridRank and similarity rank with a 75:25 weighting have the highest NDCG scores. According to the preliminary experimental results, the combination ranking technique provides more relevant research paper search results. Furthermore, the chosen heuristic ranking can improve the efficiency of research paper searching on social bookmarking websites.
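The weighted combination and the NDCG evaluation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes the 75:25 weighting means 75% similarity score and 25% static score, that both scores are already scaled to [0, 1], and that the static score has been precomputed from publication year and type. The names `hybrid_score`, `rank_papers`, `dcg` and `ndcg` are hypothetical.

```python
import math


def hybrid_score(similarity, static, w_similarity=0.75):
    """Linear combination of a similarity score and a static score.

    Assumes both inputs lie in [0, 1]; w_similarity=0.75 gives a 75:25 split.
    """
    return w_similarity * similarity + (1.0 - w_similarity) * static


def rank_papers(papers, w_similarity=0.75):
    """Sort paper records (dicts with 'similarity' and 'static') best first."""
    return sorted(
        papers,
        key=lambda p: hybrid_score(p["similarity"], p["static"], w_similarity),
        reverse=True,
    )


def dcg(relevances):
    """Discounted cumulative gain of relevance grades listed in rank order."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

To compare weightings in the spirit of the study, one would rank the results of each query with several values of `w_similarity`, map the ranked papers to their judged relevance grades, and average `ndcg` over the queries.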
Development of intelligent semantic search system for rubber research data in Thailand
Kaewboonma, Nattapong; Panawong, Jirapong; Pianhanuruk, Ekkawit; Buranarach, Marut
2017-10-01
The rubber production of Thailand increased not only because of strong demand from the world market, but was also stimulated strongly through the replanting program of the Thai Government from 1961 onwards. With the continuous growth of rubber research data volume on the Web, the search for information has become a challenging task. Ontologies are used to improve the accuracy of information retrieval from the web by incorporating a degree of semantic analysis during the search. In this context, we propose an intelligent semantic search system for rubber research data in Thailand. The research methods included 1) analyzing domain knowledge, 2) developing ontologies, and 3) developing the intelligent semantic search system, so that research data curated in trusted digital repositories may be shared among the wider Thailand rubber research community.
Web information retrieval for health professionals.
Ting, S L; See-To, Eric W K; Tse, Y K
2013-06-01
This paper presents a Web Information Retrieval System (WebIRS), which is designed to assist healthcare professionals in obtaining up-to-date medical knowledge and information via the World Wide Web (WWW). The system leverages document classification and text summarization techniques to deliver highly correlated medical information to physicians. The system architecture of the proposed WebIRS is first discussed. A case study on an application of the proposed system in a Hong Kong medical organization is then presented to illustrate the adoption process, and a questionnaire is administered to collect feedback on the operation and performance of WebIRS in comparison with conventional information retrieval on the WWW. A prototype system has been constructed and implemented on a trial basis in a medical organization. It has proven to be of benefit to healthcare professionals through its automatic classification and summarization of the medical information that physicians need and are interested in. The results of the case study show that with the use of the proposed WebIRS, a significant reduction in searching time and effort, together with retrieval of highly relevant materials, can be attained.
Effects of Diacritics on Web Search Engines’ Performance for Retrieval of Yoruba Documents
Directory of Open Access Journals (Sweden)
Toluwase Victor Asubiaro
2014-06-01
This paper aims to find out the possible effect of the use or nonuse of diacritics in Yoruba search queries on the performance of major search engines, AOL, Bing, Google and Yahoo!, in retrieving documents. 30 Yoruba queries created from the most searched keywords from Nigeria on Google search logs were submitted to the search engines. The search queries were posed to the search engines without diacritics and then with diacritics. All of the search engines retrieved more sites in response to the queries without diacritics. Also, they all retrieved more precise results for queries without diacritics. The search engines also answered more queries without diacritics. There was no significant difference in the precision values of any two of the four search engines for diacritized and undiacritized queries. There was a significant difference in the effectiveness of AOL and Yahoo when diacritics were applied and when they were not applied. The findings of the study indicate that the search engines do not find a relationship between the diacritized Yoruba words and the undiacritized versions. Therefore, there is a need for search engines to add normalization steps to pre-process Yoruba queries and indexes. This study concentrates on a problem with search engines that has not been previously investigated.
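One possible form of the normalization step recommended above is to fold diacritized and undiacritized spellings onto a common key before indexing and querying. The Python sketch below shows this with the standard `unicodedata` module; it is an illustrative assumption about what such a step could look like, not a description of what any of the evaluated search engines do, and the function names are hypothetical.

```python
import unicodedata


def strip_diacritics(text):
    """Remove combining marks (tone marks, under-dots) from text."""
    # NFD splits precomposed letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # combining() is non-zero for combining marks, which are dropped here.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def normalize_query(text):
    """Build a case-insensitive, diacritic-insensitive matching key."""
    return strip_diacritics(text).casefold()
```

Applying `normalize_query` to both indexed terms and incoming queries lets "Yorùbá" and "Yoruba" retrieve the same documents. The trade-off is that Yoruba words distinguished only by tone marks or under-dots collapse to the same key, so a system might keep the original form alongside the folded key and use it for ranking.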
Eysenbach, Gunther; Powell, John; Kuss, Oliver; Sa, Eun-Ryoung
The quality of consumer health information on the World Wide Web is an important issue for medicine, but to date no systematic and comprehensive synthesis of the methods and evidence has been performed. To establish a methodological framework on how quality on the Web is evaluated in practice, to determine the heterogeneity of the results and conclusions, and to compare the methodological rigor of these studies, to determine to what extent the conclusions depend on the methodology used, and to suggest future directions for research. We searched MEDLINE and PREMEDLINE (1966 through September 2001), Science Citation Index (1997 through September 2001), Social Sciences Citation Index (1997 through September 2001), Arts and Humanities Citation Index (1997 through September 2001), LISA (1969 through July 2001), CINAHL (1982 through July 2001), PsycINFO (1988 through September 2001), EMBASE (1988 through June 2001), and SIGLE (1980 through June 2001). We also conducted hand searches, general Internet searches, and a personal bibliographic database search. We included published and unpublished empirical studies in any language in which investigators searched the Web systematically for specific health information, evaluated the quality of Web sites or pages, and reported quantitative results. We screened 7830 citations and retrieved 170 potentially eligible full articles. A total of 79 distinct studies met the inclusion criteria, evaluating 5941 health Web sites and 1329 Web pages, and reporting 408 evaluation results for 86 different quality criteria. Two reviewers independently extracted study characteristics, medical domains, search strategies used, methods and criteria of quality assessment, results (percentage of sites or pages rated as inadequate pertaining to a quality criterion), and quality and rigor of study methods and reporting. The most frequently used quality criteria include accuracy, completeness, readability, design, disclosures, and references provided.
Analyzing Web Behavior in Indoor Retail Spaces
Ren, Yongli; Tomko, Martin; Salim, Flora; Ong, Kevin; Sanderson, Mark
2015-01-01
We analyze 18 million rows of Wi-Fi access logs collected over a one year period from over 120,000 anonymized users at an inner-city shopping mall. The anonymized dataset gathered from an opt-in system provides users' approximate physical location, as well as Web browsing and some search history. Such data provides a unique opportunity to analyze the interaction between people's behavior in physical retail spaces and their Web behavior, serving as a proxy to their information needs. We find: ...
Searching with Experience - A Search Engine for Product Information that Learns from its Users
Leeuwen, van J.P.; Jessurun, A.J.; Jansen, G.; Martens, B.; Brown, A.
2005-01-01
This paper describes the motivation and development of a new algorithm for ranking web pages. This development aims to enable the implementation of a search engine that can provide highly personalised results to queries. It was initiated by a request from the Dutch CAD industry, but has generic
Enhancing UCSF Chimera through web services.
Huang, Conrad C; Meng, Elaine C; Morris, John H; Pettersen, Eric F; Ferrin, Thomas E
2014-07-01
Integrating access to web services with desktop applications allows for an expanded set of application features, including performing computationally intensive tasks and convenient searches of databases. We describe how we have enhanced UCSF Chimera (http://www.rbvi.ucsf.edu/chimera/), a program for the interactive visualization and analysis of molecular structures and related data, through the addition of several web services (http://www.rbvi.ucsf.edu/chimera/docs/webservices.html). By streamlining access to web services, including the entire job submission, monitoring and retrieval process, Chimera makes it simpler for users to focus on their science projects rather than data manipulation. Chimera uses Opal, a toolkit for wrapping scientific applications as web services, to provide scalable and transparent access to several popular software packages. We illustrate Chimera's use of web services with an example workflow that interleaves use of these services with interactive manipulation of molecular sequences and structures, and we provide an example Python program to demonstrate how easily Opal-based web services can be accessed from within an application. Web server availability: http://webservices.rbvi.ucsf.edu/opal2/dashboard?command=serviceList. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Karagiannis, P.; Markelis, I.; Paparrizos, K.; Samaras, N.; Sifaleras, A.
2006-01-01
This paper presents new web-based educational software (webNetPro) for "Linear Network Programming." It includes many algorithms for "Network Optimization" problems, such as shortest path problems, minimum spanning tree problems, maximum flow problems and other search algorithms. Therefore, webNetPro can assist the teaching process of courses such…
Web Search Services in 1998: Trends and Challenges.
Feldman, Susan
1998-01-01
Charts the trends and challenges that 1998 has brought to popular search engines such as AltaVista, Excite, HotBot, Infoseek, Lycos, and Northern Light. Highlights testing strategies used, use of real (not artificial) intelligence, innovations, online market pressures, barriers to use, and tips and recommendations. (AEF)
Kuiper, E.; Volman, M.L.L.; Terwel, J.
2008-01-01
Although the Web is almost omnipresent in many children's lives, most children lack adequate Web searching skills as well as skills to process and critically evaluate Web information. In this article, we describe and evaluate an educational program that aimed at acquiring Web skills in the context
Directory of Open Access Journals (Sweden)
Francisco Javier Martínez Méndez
2003-01-01
A considerable number of proposals for measuring the effectiveness of information retrieval systems have been made since the early days of such systems. The consolidation of the World Wide Web as the paradigmatic method for developing the Information Society, and the continuous multiplication of the number of documents published in this environment, have led to the implementation of the most advanced and extensive information retrieval systems, in the shape of web search engines. Nevertheless, there is an underlying concern about the effectiveness of these systems, especially since they usually present, in response to a question, many documents with little relevance to the users' information needs. The evaluation of these systems has been, up to now, dispersed and varied. The scattering is due to the lack of uniformity in the criteria used in evaluation, and this disparity derives from their aperiodicity and variable coverage. In this review, we identify three groups of studies: explicit evaluations, experimental evaluations and, more recently, several proposals for the establishment of a global framework to evaluate these systems.
A Study of HTML Title Tag Creation Behavior of Academic Web Sites
Noruzi, Alireza
2007-01-01
The HTML title tag information should identify and describe exactly what a Web page contains. This paper analyzes the "Title element" and raises a significant question: "Why is the title tag important?" Search engines base search results and page rankings on certain criteria. Among the most important criteria is the presence of the search keywords…
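As a small illustration of the check implied above (whether a page's title element contains the search keywords), the Python sketch below extracts the title text with the standard library's `html.parser` and tests for keyword presence. It is a simplified sketch, not the study's method and not how any search engine scores titles; the class and function names are hypothetical, and any `title` element in the markup (including one nested in inline SVG) would be collected.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collect the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def extract_title(html):
    """Return the page title with runs of whitespace collapsed."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.title_parts).split())


def title_contains_keywords(html, keywords):
    """True if every keyword occurs in the title, ignoring case."""
    title = extract_title(html).casefold()
    return all(keyword.casefold() in title for keyword in keywords)
```

Run over a set of academic home pages, a helper like `extract_title` would also make it easy to spot empty or generic titles such as "Home" or "Untitled Document".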
Exploring the Relevance of Search Engines: An Overview of Google as a Case Study
Directory of Open Access Journals (Sweden)
Ricardo Beltrán-Alfonso
2017-08-01
The huge amount of data on the Internet and the diverse strategies used to link this information with relevant searches through Linked Data have generated a revolution in data treatment and representation. Nevertheless, conventional search engines such as Google remain well-received tools for carrying out searches. This article presents a study of the development and evolution of search engines and, more specifically, analyzes the relevance of findings based on the number of results displayed in paging systems, with Google as a case study. Finally, it is intended to contribute to indexing criteria in search results, based on an approach to the Semantic Web as a stage in the evolution of the Web.
Analyzing Web Server Logs to Improve a Site's Usage. The Systems Librarian
Breeding, Marshall
2005-01-01
This column describes ways to streamline and optimize how a Web site works in order to improve both its usability and its visibility. The author explains how to analyze logs and other system data to measure the effectiveness of the Web site design and search engine.
Social Media Marketing: Challenges and Opportunities in the Web 2.0 Marketplace
Constantinides, Efthymios; Lin, A.; Foster, J.; Scifleet, P.
2013-01-01
The present stage in the evolution of the Internet, commonly called Web 2.0, has revolutionized the way people communicate, interact, and share information and has radically changed the way customers search for and buy products. The increasing adoption of Web 2.0 applications and technologies has
Intelligent Search on XML Data
Blanken, Henk; Grabs, T.; Schek, H-J.; Schenkel, R.; Weikum, G.
2003-01-01
Recently, we have seen a steep increase in the popularity and adoption of XML, in areas such as traditional databases, e-business, the scientific environment, and on the web. Querying XML documents and data efficiently is a challenging issue; this book approaches search on XML data by combining
Darrah, Brenda
Researchers for small businesses, which may have no access to expensive databases or market research reports, must often rely on information found on the Internet, which can be difficult to find. Although current conventional Internet search engines are now able to index over one billion documents, there are many more documents existing in…
Applying Web Analytics to Online Finding Aids: Page Views, Pathways, and Learning about Users
Directory of Open Access Journals (Sweden)
Mark R. O'English
2011-05-01
Online finding aids, Internet search tools, and increased access to the World Wide Web have greatly changed how patrons find archival collections. Through analyzing eighteen months of access data collected via Web analytics tools, this article examines how patrons discover archival materials. Contrasts are drawn between access from library catalogs and from online search engines, with the latter outweighing the former by an overwhelming margin, and the article considers whether archival description practices should change accordingly.
An Exploratory Study on the Re-finding Behavior on the Web
Directory of Open Access Journals (Sweden)
Hsiao-Tieh Pu
2010-06-01
It is common for users to relocate information previously found on the web. However, their search behaviors in initial finding and subsequent re-finding may differ due to the dynamic nature and contextual diversity of the web. This study used experiments, observation, interviews, and questionnaires to investigate the characteristics of re-finding behavior and to compare users’ performance in finding and re-finding. Though the differences were not significant, the study participants used more search tools, combined various strategies to obtain contextual clues about the finding process, utilized more complex search tactics, and had more interactions with the search engines used. Findings also show that participants spent less time in re-finding than in finding, yet cognitive load and difficulties increased in re-finding. Participants were satisfied with the results obtained in re-finding, but they also claimed that search performance would be better if the system offered more functions to support recall of previous search results. Participants’ satisfaction with search performance also varied by task type. Based on the findings, this study recommends that re-finding efficiency may be improved by enhancing recall functionalities in browsers and by using personal information management tools. [Article content in Chinese; Extended abstract in English]
Literature search for intermittent rivers research using ISI Web of Science
U.S. Environmental Protection Agency — The dataset is the bibliometric information included in the ISI Web of Science database of scientific literature. Table S2 accessible from the dataset link provides...
Rucio WebUI - The Web Interface for the ATLAS Distributed Data Management
Beermann, Thomas; The ATLAS collaboration; Barisits, Martin-Stefan; Serfon, Cedric; Garonne, Vincent
2016-01-01
With the current distributed data management system for ATLAS, called Rucio, all user interactions, e.g. the Rucio command line tools or the ATLAS workload management system, communicate with Rucio through the same REST API. This common interface makes it possible to interact with Rucio using many different programming languages, including JavaScript. Using common web application frameworks like jQuery and web.py, a web application for Rucio was built. The main component is R2D2 - the Rucio Rule Definition Droid - which gives users a simple way to manage their data on the grid. They can search for particular datasets, get details about their metadata and available replicas, and easily create rules to create new replicas and delete them when they are no longer needed. On the other hand, it is possible for site admins to restrict transfers to their site by setting quotas and to manually approve transfers. Besides R2D2, additional features include transfer backlog monitoring for shifters, group space monitoring for gr...
PcapDB: Search Optimized Packet Capture, Version 0.1.0.0
Energy Technology Data Exchange (ETDEWEB)
2016-11-04
PcapDB is a packet capture system designed to optimize the captured data for fast search in the typical (network incident response) use case. The technology involved in this software has been submitted via the IDEAS system and has been filed as a provisional patent. It includes the following primary components: capture: The capture component utilizes existing capture libraries to retrieve packets from network interfaces. Once retrieved, the packets are passed to additional threads for sorting into flows and indexing. The sorted flows and indexes are passed to other threads so that they can be written to disk. These components are written in the C programming language. search: The search components provide a means to find relevant flows and the associated packets. A search query is parsed and represented as a search tree. Various search commands, written in C, are then used to resolve this tree into a set of search results. The tree generation and search execution management components are written in Python. interface: The PcapDB web interface is written in Python on the Django framework. It provides a series of pages, APIs, and asynchronous tasks that allow the user to manage the capture system, perform searches, and retrieve results. Web page components are written in HTML, CSS and JavaScript.
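The idea of representing a query as a search tree and resolving it into a result set can be sketched generically in Python. Everything in this example is an assumption made for illustration: the tuple-based node format, the index layout (term to set of flow ids) and the function name `resolve` are hypothetical and do not reflect PcapDB's actual query language, index files or C search commands.

```python
def resolve(node, index):
    """Resolve a search tree into a set of matching flow ids.

    A node is one of (hypothetical format):
      ("term", key)        -> flow ids indexed under key
      ("and", left, right) -> flows matching both subtrees
      ("or", left, right)  -> flows matching either subtree
    """
    kind = node[0]
    if kind == "term":
        # Leaf: look the term up in the index; unknown terms match nothing.
        return set(index.get(node[1], ()))
    left = resolve(node[1], index)
    right = resolve(node[2], index)
    if kind == "and":
        return left & right
    if kind == "or":
        return left | right
    raise ValueError(f"unknown node type: {kind}")
```

For example, with an index mapping `"src=10.0.0.1"` to flows `{1, 2, 3}`, `"dport=443"` to `{2, 3, 4}` and `"dport=22"` to `{5}`, the tree `("or", ("and", ("term", "src=10.0.0.1"), ("term", "dport=443")), ("term", "dport=22"))` resolves to the flow set `{2, 3, 5}`.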
Developing a Grid-based search and categorization tool
Haya, Glenn; Vigen, Jens
2003-01-01
Grid technology has the potential to improve the accessibility of digital libraries. The participants in Project GRACE (Grid Search And Categorization Engine) are in the process of developing a search engine that will allow users to search through heterogeneous resources stored in geographically distributed digital collections. What differentiates this project from current search tools is that GRACE will be run on the European Data Grid, a large distributed network, and will not have a single centralized index as current web search engines do. In some cases, the distributed approach offers advantages over the centralized approach since it is more scalable, can be used on otherwise inaccessible material, and can provide advanced search options customized for each data source.