The World Wide Web is emerging as an all-in-one information source. Tools for searching Web-based information include search engines, subject directories and meta search tools. We take a look at key features of these tools and suggest practical hints for effective Web searching.
Perhaps the most significant tool of our internet age is the web search engine, providing a powerful interface for accessing the vast amount of information available on the world wide web and beyond. While still in its infancy compared to the knowledge tools that precede it - such as the dictionary or encyclopedia - the impact of web search engines on society and culture has already received considerable attention from a variety of academic disciplines and perspectives. This article aims to organize a meta-discipline of “web search studies,” centered around a nucleus of major research on web search engines from five key perspectives: technical foundations and evaluations; transaction log analyses; user studies; political, ethical, and cultural critiques; and legal and policy analyses.
G.Sudeepthi; Anuradha, G.; M.Surendra Prasad Babu
The tremendous growth in the volume of data and with the terrific growth of number of web pages, traditional search engines now a days are not appropriate and not suitable anymore. Search engine is the most important tool to discover any information in World Wide Web. Semantic Search Engine is born of traditional search engine to overcome the above problem. The Semantic Web is an extension of the current web in which information is given well-defined meaning. Semantic web technologies are pla...
Ortega-Binderberger, Michael; Mehrotra, Sharad; Chakrabarti, Kaushik; Porkaew, Kriengkrai
The Web provides a large repository of multimedia data, text, images, etc. Most current search engines focus on textural retrieval. In this paper, we focus on using an integrated textural and visual search engine for Web documents. We support query refinement which proves useful and enables cross-media browsing in addition to regular search.
Spink, Amanda; Koricich, Andrew; Jansen, B J; Cole, Charles
Sexual information seeking is an important element within human information behavior. Seeking sexually related information on the Internet takes many forms and channels, including chat rooms discussions, accessing Websites or searching Web search engines for sexual materials. The study of sexual Web queries provides insight into sexually-related information-seeking behavior, of value to Web users and providers alike. We qualitatively analyzed queries from logs of 1,025,910 Alta Vista and AlltheWeb.com Web user queries from 2001. We compared the differences in sexually-related Web searching between Alta Vista and AlltheWeb.com users. Differences were found in session duration, query outcomes, and search term choices. Implications of the findings for sexual information seeking are discussed.
Goodrum, Abby; Spink, Amanda
Examines visual information needs as expressed in users' Web image queries on the Excite search engine. Discusses metadata; content-based image retrieval; user interaction with images; terms per query; term frequency; and implications for the development of models for visual information retrieval and for the design of Web search engines.…
Wong, Pak Chung
It's widely recognized that all Web search engines today are almost identical in presentation layout and behavior. In fact, the same presentation approach has been applied to depicting search engine results pages (SERPs) since the first Web search engine launched in 1993. In this Visualization Viewpoints article, I propose to add a visualization feature to Web search engines and suggest that the new addition can improve search engines' performance and capabilities, which in turn lead to better Web search technology.
Prof. M.Surendra Prasad Babu; G.Sudeepthi
The Search engines plays an important role in the success of the Web, Search engines helps any Internet user to rapidly find relevant information. But the unsolved problems of current search engines have led to the development of the Semantic Web. In the environment of Semantic Web, the search engines are more useful and efficient in searching the relevant web information., and our work shows how the fundamental elements of the meta search engine can be used in retriving the information resou...
Reviews the literature on the use of Web search engines in information science research, including: ways users interact with Web search engines; social aspects of searching; structure and dynamic nature of the Web; link analysis; other bibliometric applications; characterizing information on the Web; search engine evaluation and improvement; and…
Smith, John R.; Chang, Shih-Fu
We describe a visual information system prototype for searching for images and videos on the World-Wide Web. New visual information in the form of images, graphics, animations and videos is being published on the Web at an incredible rate. However, cataloging this visual data is beyond the capabilities of current text-based Web search engines. In this paper, we describe a complete system by which visual information on the Web is (1) collected by automated agents, (2) processed in both text and visual feature domains, (3) catalogued and (4) indexed for fast search and retrieval. We introduce an image and video search engine which utilizes both text-based navigation and content-based technology for searching visually through the catalogued images and videos. Finally, we provide an initial evaluation based upon the cataloging of over one half million images and videos collected from the Web.
Mardis, Marcia A.
Discussion of searching for information on the Web focuses on resources that are not always found by traditional Web searches. Describes sources on the hidden Web, including full-text databases, clearinghouses, digital libraries, and learning objects; explains how search engines operate; and suggests that traditional print sources are still…
Kompatsiaris, Ioannis; Triantafyllou, Evangelia; Strintzis, Michael G.
information. These features along with additional information such as the URL location and the date of index procedure are stored in a database. The user can access and search this indexed content through the Web with an advanced and user friendly interface. The output of the system is a set of links......In this paper the development of an intelligent image content-based search engine for the World Wide Web is presented. This system will offer a new form of media representation and access of content available in WWW. Information Web Crawlers continuously traverse the Internet and collect images...
Full Text Available Introduction: Recently, suicide prevention has been an important public health issue. However, with the growing access to information in cyberspace, the harmful information is easily accessible online. To investigate the accessibility of potentially harmful suicide-related information on the internet, we discuss the following issue about searching suicide information on the internet to draw attention to it. Methods: We use five search engines (Google, Yahoo, Bing, Yam, and Sina and four suicide-related search queries (suicide, how to suicide, suicide methods, and want to die in traditional Chinese in April 2016. We classified the first thirty linkages of the search results on each search engine by a psychiatric doctor into suicide prevention, pro-suicide, neutral, unrelated to suicide, or error websites. Results: Among the total 352 unique websites generated, the suicide prevention websites were the most frequent among the search results (37.8%, followed by websites unrelated to suicide (25.9% and neutral websites (23.0%. However, pro-suicide websites were still easily accessible (9.7%. Besides, compared with the USA and China, the search engine originating in Taiwan had the lowest accessibility to pro-suicide information. The results of ANOVA showed a significant difference between the groups, F = 8.772, P < 0.001. Conclusions: This study results suggest a need for further restrictions and regulations of pro-suicide information on the internet. Providing more supportive information online may be an effective plan for suicidal prevention.
Kompatsiaris, Ioannis; Triantafyllou, Evangelia; Strintzis, Michael G.
In this paper the development of an intelligent imagecontent-based search engine for the World Wide Web is presented.This system will offer a new form of media representationand access of content available in WWW. InformationWeb Crawlers continuously traverse the Internet and collectimages that are subsequently indexed based on integratedfeature vectors. As a basis for the indexing, the K-Meansalgorithm is used, modified so as to take into account thecoherence of the regions. Based on the ext...
Qiu, Guoping; Palmer, R. D.
This paper describes the development of a prototype of a Web Image Search Engine (WISE), which allows users to search for images on the WWW by image examples, in a similar fashion to current search engines that allow users to find related Web pages using text matching on keywords. The system takes an image specified by the user and finds similar images available on the WWW by comparing the image contents using low level image features. The current version of the WISE system consists of a graphical user interface (GUI), an autonomous Web agent, an image comparison program and a query processing program. The users specify the URL of a target image and the URL of the starting Web page from where the program will 'crawl' the Web, finding images along the way and retrieve those satisfying a certain constraints. The program then computes the visual features of the retrieved images and performs content-based comparison with the target image. The results of the comparison are then sorted according to a certain similarity measure, which along with thumbnails and information associated with the images, such as the URLs; image size, etc. are then written to an HTML page. The resultant page is stored on a Web server and is outputted onto the user's Web browser once the search process is complete. A unique feature of the current version of WISE is its image content comparison algorithm. It is based on the comparison of image palettes and it therefore very efficient in retrieving one of the two universally accepted image formats on the Web, 'gif.' In gif images, the color palette is contained in its header and therefore it is only necessary to retrieve the header information rather than the whole images, thus making it very efficient.
Describes the field search capabilities of selected Web search engines (AltaVista, HotBot, Infoseek, Lycos, Yahoo!) and includes a chart outlining what fields (date, title, URL, images, audio, video, links, page depth) are searchable, where to go on the page to search them, the syntax required (if any), and how field search queries are entered.…
Web Feet, 2001
This guide to search engines for the World Wide Web discusses selecting the right search engine; interpreting search results; major search engines; online tutorials and guides; search engines for kids; specialized search tools for various subjects; and other specialized engines and gateways. (LRW)
Sima Shafi’ie Alavijeh
Full Text Available The present investigation was aimed to study the scope of presence of Dublin Core metadata elements and HTML meta tags in web pages. Ninety web pages were chosen by searching general search engines (Google, Yahoo and MSN. The scope of metadata elements (Dublin Core and HTML Meta tags present in these pages as well as existence of a significant correlation between presence of meta elements and type of search engines were investigated. Findings indicated very low presence of both Dublin Core metadata elements and HTML meta tags in the pages retrieved which in turn illustrates the very low usage of meta data elements in web pages. Furthermore, findings indicated that there are no significant correlation between the type of search engine used and presence of metadata elements. From the standpoint of including metadata in retrieval of web sources, search engines do not significantly differ from one another.
Full Text Available Search Engines are used for retrieving the information from the web. Most of the times, the importance is laid on top 10 results sometimes it may shrink as top 5, because of the time constraint and reliability on the search engines. Users believe that top 10 or 5 of total results are more relevant. Here comes the problem of spamdexing. It is a method to deceive the search result quality. Falsified metrics such as inserting enormous amount of keywords or links in website may take that website to the top 10 or 5 positions. This paper proposes a classifier based on the Reptree (Regression tree representative. As an initial step Link-based features such as neighbors, pagerank, truncated pagerank, trustrank and assortativity related attributes are inferred. Based on this features, tree is constructed. The tree uses the feature inference to differentiate spam sites from legitimate sites. WEBSPAM-UK-2007 dataset is taken as a base. It is preprocessed and converted into five datasets FEATA, FEATB, FEATC, FEATD and FEATE. Only link based features are taken for experiments. This paper focus on link spam alone. Finally a representative tree is created which will more precisely classify the web spam entries. Results are given. Regression tree classification seems to perform well as shown through experiments.
Zhu, Jianhan; Song, Dawei; Eisenstadt, Marc; Barladeanu, Cristi; Rüger, Stefan
The effect of metadata in collection fusion has not been sufficiently studied. In response to this, we present a novel meta-search engine called Dyniqx for metadata based search. Dyniqx integrates search results from search services of documents, images, and videos for generating a unified list of ranked search results. Dyniqx exploits the availability of metadata in search services such as PubMed, Google Scholar, Google Image Search, and Google Video Search etc for fusing search results from...
Alireza Isfandiyari Moghadam; Zohreh Bahari Mova’fagh
The present investigation concerns evaluation, comparison and analysis of search options existing within web-based meta-search engines. 64 meta-search engines were identified. 19 meta-search engines that were free, accessible and compatible with the objectives of the present study were selected. An author’s constructed check list was used for data collection. Findings indicated that all meta-search engines studied used the AND operator, phrase search, number of results displayed setting, pr...
Full Text Available This paper analyzes the results of transaction logs at California State University, Los Angeles (CSULA and studies the effects of implementing a Web-based OPAC along with interface changes. The authors find that user success in subject searching remains problematic. A major increase in the frequency of searches that would have been more successful in resources other than the library catalog is noted over the time period 2000-2002. The authors attribute this increase to the prevalence of Web search engines and suggest that metasearching, relevance-ranked results, and relevance feedback ( "more like this" are now expected in user searching and should be integrated into online catalogs as search options.
Chen, Chi-Huang; Hsieh, Sheau-Ling; Weng, Yung-Ching; Chang, Wen-Yung; Lai, Feipei
Semantic similarity measure plays an essential role in Information Retrieval and Natural Language Processing. In this paper we propose a page-count-based semantic similarity measure and apply it in biomedical domains. Previous researches in semantic web related applications have deployed various semantic similarity measures. Despite the usefulness of the measurements in those applications, measuring semantic similarity between two terms remains a challenge task. The proposed method exploits page counts returned by the Web Search Engine. We define various similarity scores for two given terms P and Q, using the page counts for querying P, Q and P AND Q. Moreover, we propose a novel approach to compute semantic similarity using lexico-syntactic patterns with page counts. These different similarity scores are integrated adapting support vector machines, to leverage the robustness of semantic similarity measures. Experimental results on two datasets achieve correlation coefficients of 0.798 on the dataset provided by A. Hliaoutakis, 0.705 on the dataset provide by T. Pedersen with physician scores and 0.496 on the dataset provided by T. Pedersen et al. with expert scores.
Enormous expanses of the Internet are unreachable with standard web search engines. This book provides the key to finding these hidden resources by identifying how to uncover and use invisible web resources. Mapping the invisible Web, when and how to use it, assessing the validity of the information, and the future of Web searching are topics covered in detail. Only 16 percent of Net-based information can be located using a general search engine. The other 84 percent is what is referred to as the invisible Web-made up of information stored in databases. Unlike pages on the visible Web, informa
The book consists of four sections including (1) the context of web search, (2) how people search the web, (3) subjects of web search and (4) conclusion: trends and future directions. The first section includes three chapters addressing a brief but informative introduction about the main involved elements of web search process and web search research including search engines mechanism, human computer interaction in web searching and research design in web search studies.
Introduction: Investigates how effectively Web search engines index new sites from different countries. The primary interest is whether new sites are indexed equally or whether search engines are biased towards certain countries. If major search engines show biased coverage it can be considered a significant economic and political problem because…
Full Text Available “White hat” search engine optimization refers to the practice of publishing web pages that are useful to humans, while enabling search engines and web applications to better understand the structure and content of your website. This article teaches you to add structured data to your website so that search engines can more easily connect patrons to your library locations, hours, and contact information. A web page for a branch of the Greater Sudbury Public Library retrieved in January 2015 is used as the basis for examples that progressively enhance the page with structured data. Finally, some of the advantages structured data enables beyond search engine optimization are explored
Hennesy, Cody; Bowman, John
Google's first foray onto the web made search simple and results relevant. With its Co-op platform, Google has taken another step toward dramatically increasing the relevancy of search results, further adapting the World Wide Web to local needs. Google Custom Search Engine, a tool on the Co-op platform, puts one in control of his or her own search…
Alireza Isfandiyari Moghadam
Full Text Available The present investigation concerns evaluation, comparison and analysis of search options existing within web-based meta-search engines. 64 meta-search engines were identified. 19 meta-search engines that were free, accessible and compatible with the objectives of the present study were selected. An author’s constructed check list was used for data collection. Findings indicated that all meta-search engines studied used the AND operator, phrase search, number of results displayed setting, previous search query storage and help tutorials. Nevertheless, none of them demonstrated any search options for hypertext searching and displaying the size of the pages searched. 94.7% support features such as truncation, keywords in title and URL search and text summary display. The checklist used in the study could serve as a model for investigating search options in search engines, digital libraries and other internet search tools.
Internet Granthalaya urges world wide advocates and targets at the task of creating a new search engine and dedicated browseer. Internet Granthalaya may be the ultimate search engine exclusively dedicated for every library use to search and organize the world wide web libary resources
Full Text Available Web search engines (WSE are powerful and popular tools in the field of information service management. This study is an attempt to examine the impact and usefulness of web search engines in providing virtual reference services (VRS within academic libraries in Pakistan. The study also attempts to investigate the relevant expertise and skills of library professionals in providing digital reference services (DRS efficiently using web search engines. Methodology used in this study is quantitative in nature. The data was collected from fifty public and private sector universities in Pakistan using a structured questionnaire. Microsoft Excel and SPSS were used for data analysis. The study concludes that web search engines are commonly used by librarians to help users (especially research scholars by providing digital reference services. The study also finds a positive correlation between use of web search engines and quality of digital reference services provided to library users. It is concluded that although search engines have increased the expectations of users and are really big competitors to a library’s reference desk, they are however not an alternative to reference service. Findings reveal that search engines pose numerous challenges for librarians and the study also attempts to bring together possible remedial measures. This study is useful for library professionals to understand the importance of search engines in providing VRS. The study also provides an intellectual comparison among different search engines, their capabilities, limitations, challenges and opportunities to provide VRS effectively in libraries.
The inverted index is the main data structure used by all the major search engines. Search engines build an inverted index on their collection to speed up query processing. As the size of the web grows, the length of the inverted list structures, which can easily grow to hundreds of MBs or even GBs for common terms (roughly linear in the size of…
Spink, Amanda; Yang, Yin; Jansen, Jim; Nykanen, Pirrko; Lorence, Daniel P; Ozmutlu, Seda; Ozmutlu, H Cenk
This paper reports findings from an analysis of medical or health queries to different web search engines. We report results: (i). comparing samples of 10000 web queries taken randomly from 1.2 million query logs from the AlltheWeb.com and Excite.com commercial web search engines in 2001 for medical or health queries, (ii). comparing the 2001 findings from Excite and AlltheWeb.com users with results from a previous analysis of medical and health related queries from the Excite Web search engine for 1997 and 1999, and (iii). medical or health advice-seeking queries beginning with the word 'should'. Findings suggest: (i). a small percentage of web queries are medical or health related, (ii). the top five categories of medical or health queries were: general health, weight issues, reproductive health and puberty, pregnancy/obstetrics, and human relationships, and (iii). over time, the medical and health queries may have declined as a proportion of all web queries, as the use of specialized medical/health websites and e-commerce-related queries has increased. Findings provide insights into medical and health-related web querying and suggests some implications for the use of the general web search engines when seeking medical/health information.
Zweigenbaum, P; Darmoni, S J; Grabar, N; Douyère, M; Benichou, J
Manually indexed Internet health catalogs such as CliniWeb or CISMeF provide resources for retrieving high-quality health information. Users of these quality-controlled subject gateways are most often referred to them by general search engines such as Google, AltaVista, etc. This raises several questions, among which the following: what is the relative visibility of medical Internet catalogs through search engines? This study addresses this issue by measuring and comparing the visibility of six major, MeSH-indexed health catalogs through four different search engines (AltaVista, Google, Lycos, Northern Light) in two languages (English and French). Over half a million queries were sent to the search engines; for most of these search engines, according to our measures at the time the queries were sent, the most visible catalog for English MeSH terms was CliniWeb and the most visible one for French MeSH terms was CISMeF.
Explores the interrelation between Web publishing and information retrieval technologies and lists new approaches to Web indexing and searching. Highlights include Web directories; search engines; portalisation; Internet service providers; browser providers; meta search engines; popularity based analysis; natural language searching; links-based…
Discusses various types of information that can be retrieved from the Web via search engines. Highlights include Web pages; time frames, including historical coverage and currentness; text pages in formats other than HTML; directory sites; news articles; discussion groups; images; and audio and video. (LRW)
Home; Journals; Resonance – Journal of Science Education; Volume 3; Issue 11. Web Search Engines - How to Get What You Want from the World Wide Web. T B Rajashekar. General Article Volume 3 Issue 11 November 1998 pp 40-53. Fulltext. Click here to view fulltext PDF. Permanent link:
Currently, the World Wide Web contains an estimated 7.4 million sites (OCLC, 2001). Yet even the most experienced searcher, using the most robust search engines, can access only about 16% of these pages (Dahn, 2001). The other 84% of the publicly available information on the Web is referred to as the "hidden,""invisible," or…
W. T. Kritzinger
Full Text Available The growth of the World Wide Web has spawned a wide variety of new information sources, which has also left users with the daunting task of determining which sources are valid. Many users rely on the Web as an information source because of the low cost of information retrieval. It is also claimed that the Web has evolved into a powerful business tool. Examples include highly popular business services such as Amazon.com and Kalahari.net. It is estimated that around 80% of users utilize search engines to locate information on the Internet. This, by implication, places emphasis on the underlying importance of Web pages being listed on search engines indices. Empirical evidence that the placement of key words in certain areas of the body text will have an influence on the Web sites' visibility to search engines could not be found in the literature. The result of two experiments indicated that key words should be concentrated towards the top, and diluted towards the bottom of a Web page to increase visibility. However, care should be taken in terms of key word density, to prevent search engine algorithms from raising the spam alarm.
Roth, V.; Pinsdorf, U.; Peters, J
Current search engines crawl the Web, download content, and digest this content locally. For multimedia content, this involves considerable volumes of data. Furthermore, this process covers only publicly available content because content providers are concerned that they otherwise loose control over the distribution of their intellectual property. We present the prototype of our secure and distributed search engine, which dynamically pushes content based feature extraction to image providers....
Deshpande, Yogesh; Murugesan, San; Ginige, Athula; Hansen, Steve; Schwabe, Daniel; Gaedke, Martin; White, Bebo
Web Engineering is the application of systematic, disciplined and quantifiable approaches to development, operation, and maintenance of Web-based applications. It is both a pro-active approach and a growing collection of theoretical and empirical research in Web application development. This paper gives an overview of Web Engineering by addressing the questions: a) why is it needed? b) what is its domain of operation? c) how does it help and what should it do to improve Web application develo...
Weight (PFW) for a web page and grouped for categorization. Using these experimental results we classified the web pages into four different groups i.e. (1) Simple type (2) Axis shifted (3) Fluctuated and (4) Oscillating types. Implication in development...
Martinez, F. J.; Pastor, J. A.; Rodriguez, J. V.; Lopez, Rosana; Rodriguez, J. V., Jr.
Introduction: The Internet is a dynamic environment which is continuously being updated. Search engines have been, currently are and in all probability will continue to be the most popular systems in this information cosmos. Method: In this work, special attention has been paid to the series of changes made to search engines up to this point,…
Wong, Pak C.
Since the first world wide web (WWW) search engine quietly entered our lives in 1994, the “information need” behind web searching has rapidly grown into a multi-billion dollar business that dominates the internet landscape, drives e-commerce traffic, propels global economy, and affects the lives of the whole human race. Today’s search engines are faster, smarter, and more powerful than those released just a few years ago. With the vast investment pouring into research and development by leading web technology providers and the intense emotion behind corporate slogans such as “win the web” or “take back the web,” I can’t help but ask why are we still using the very same “text-only” interface that was used 13 years ago to browse our search engine results pages (SERPs)? Why has the SERP interface technology lagged so far behind in the web evolution when the corresponding search technology has advanced so rapidly? In this article I explore some current SERP interface issues, suggest a simple but practical visual-based interface design approach, and argue why a visual approach can be a strong candidate for tomorrow’s SERP interface.
Jalali, Vahid; Matash Borujerdi, Mohammad Reza
There is a huge growth in the volume of published biomedical research in recent years. Many medical search engines are designed and developed to address the over growing information needs of biomedical experts and curators. Significant progress has been made in utilizing the knowledge embedded in medical ontologies and controlled vocabularies to assist these engines. However, the lack of common architecture for utilized ontologies and overall retrieval process, hampers evaluating different search engines and interoperability between them under unified conditions. In this paper, a unified architecture for medical search engines is introduced. Proposed model contains standard schemas declared in semantic web languages for ontologies and documents used by search engines. Unified models for annotation and retrieval processes are other parts of introduced architecture. A sample search engine is also designed and implemented based on the proposed architecture in this paper. The search engine is evaluated using two test collections and results are reported in terms of precision vs. recall and mean average precision for different approaches used by this search engine.
Adam, Anna; Mowers, Helen
There are a lot of useful tools on the Web, all those social applications, and the like. Still most people go online for one thing--to perform a basic search. For most fact-finding missions, the Web is there. But--as media specialists well know--the sheer wealth of online information can hamper efforts to focus on a few reliable references.…
The World Wide Web contains billions of documents (and counting); hence, it is likely that some document will contain the answer or content you are searching for. While major search engines like Bing and Google often manage to return relevant results to your query, there are plenty of situations in
Ward's Cluster analyses including the Pseudo T² Statistical analyses were used to determine the mental model clusters for the seventeen salient design features of Web search engines at each time point. The cubic clustering criterion (CCC) and the dendogram were conducted for each sample to help determine the number ...
Discusses how to find digital images on the Web. Considers images and copyright; provides an overview of the search capabilities of six search engines, including AltaVista, Google, AllTheWeb.com, Ditto.com, Picsearch, and Lycos; and describes specialized image search engines. (LRW)
Researchers for small businesses, which may have no access to expensive databases or market research reports, must often rely on information found on the Internet, which can be difficult to find. Although current conventional Internet search engines are now able to index over on billion documents, there are many more documents existing in…
Web Engineering is the application of systematic, disciplined and quantifiable approaches to development, operation, and maintenance of Web-based applications. It is both a pro-active approach and a growing collection of theoretical and empirical research in Web application development. This paper gives an overview of Web Engineering by addressing the questions: (a) why is it needed? (b) what is its domain of operation? (c) how does it help and what should it do to improve Web application development? and (d) how should it be incorporated in education and training? The paper discusses the significant differences that exist between Web applications and conventional software, the taxonomy of Web applications, the progress made so far and the research issues and experience of creating a specialization at the master's level. The paper reaches a conclusion that Web Engineering at this stage is a moving target since Web technologies are constantly evolving, making new types of applications possible, which in turn may require innovations in how they are built, deployed and maintained.
Jody Condit Fagan
Academic web search engines have become central to scholarly research. While the fitness of Google Scholar for research purposes has been examined repeatedly, Microsoft Academic and Google Books have not received much attention...
Zhang, Yongqing; Cao, Xiaoyi; Zhong, Sheng
A set of new data types emerged from functional genomic assays, including ChIP-seq, DNase-seq, FAIRE-seq and others. The results are typically stored as genome-wide intensities (WIG/bigWig files) or functional genomic regions (peak/BED files). These data types present new challenges to big data science. Here, we present GeNemo, a web-based search engine for functional genomic data. GeNemo searches user-input data against online functional genomic datasets, including the entire collection of ENCODE and mouse ENCODE datasets. Unlike text-based search engines, GeNemo's searches are based on pattern matching of functional genomic regions. This distinguishes GeNemo from text or DNA sequence searches. The user can input any complete or partial functional genomic dataset, for example, a binding intensity file (bigWig) or a peak file. GeNemo reports any genomic regions, ranging from hundred bases to hundred thousand bases, from any of the online ENCODE datasets that share similar functional (binding, modification, accessibility) patterns. This is enabled by a Markov Chain Monte Carlo-based maximization process, executed on up to 24 parallel computing threads. By clicking on a search result, the user can visually compare her/his data with the found datasets and navigate the identified genomic regions. GeNemo is available at www.genemo.org. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Full Text Available This study provided insight into the significance of the open Web as an information resource and Web search engines as research tools amongst academics. The academic staff establishment of the University of South Africa (Unisa was invited to participate in a questionnaire survey and included 1188 staff members from five colleges. This study culminated in a PhD dissertation in 2008. One hundred and eighty seven respondents participated in the survey which gave a response rate of 15.7%. The results of this study show that academics have indeed accepted the open Web as a useful information resource and Web search engines as retrieval tools when seeking information for academic and research work. The majority of respondents used the open Web and Web search engines on a daily or weekly basis to source academic and research information. The main obstacles presented by using the open Web and Web search engines included lack of time to search and browse the Web, information overload, poor network speed and the slow downloading speed of webpages.
Full Text Available This study provided insight into the significance of the open Web as an information resource and Web search engines as research tools amongst academics. The academic staff establishment of the University of South Africa (Unisa was invited to participate in a questionnaire survey and included 1188 staff members from five colleges. This study culminated in a PhD dissertation in 2008. One hundred and eighty seven respondents participated in the survey which gave a response rate of 15.7%. The results of this study show that academics have indeed accepted the open Web as a useful information resource and Web search engines as retrieval tools when seeking information for academic and research work. The majority of respondents used the open Web and Web search engines on a daily or weekly basis to source academic and research information. The main obstacles presented by using the open Web and Web search engines included lack of time to search and browse the Web, information overload, poor network speed and the slow downloading speed of webpages.
Fu, Linda Y; Zook, Kathleen; Spoehr-Labutta, Zachary; Hu, Pamela; Joseph, Jill G
Online information can influence attitudes toward vaccination. The aim of the present study was to provide a systematic evaluation of the search engine ranking, quality, and content of Web pages that are critical versus noncritical of human papillomavirus (HPV) vaccination. We identified HPV vaccine-related Web pages with the Google search engine by entering 20 terms. We then assessed each Web page for critical versus noncritical bias and for the following quality indicators: authorship disclosure, source disclosure, attribution of at least one reference, currency, exclusion of testimonial accounts, and readability level less than ninth grade. We also determined Web page comprehensiveness in terms of mention of 14 HPV vaccine-relevant topics. Twenty searches yielded 116 unique Web pages. HPV vaccine-critical Web pages comprised roughly a third of the top, top 5- and top 10-ranking Web pages. The prevalence of HPV vaccine-critical Web pages was higher for queries that included term modifiers in addition to root terms. Compared with noncritical Web pages, Web pages critical of HPV vaccine overall had a lower quality score than those with a noncritical bias (p engine queries despite being of lower quality and less comprehensive than noncritical Web pages. Copyright © 2016 Society for Adolescent Health and Medicine. Published by Elsevier Inc. All rights reserved.
Luo, Bo; Wang, Xiaogang; Tang, Xiaoou
Using both text and image content features, a hybrid image retrieval system for Word Wide Web is developed in this paper. We first use a text-based image meta-search engine to retrieve images from the Web based on the text information on the image host pages to provide an initial image set. Because of the high-speed and low cost nature of the text-based approach, we can easily retrieve a broad coverage of images with a high recall rate and a relatively low precision. An image content based ordering is then performed on the initial image set. All the images are clustered into different folders based on the image content features. In addition, the images can be re-ranked by the content features according to the user feedback. Such a design makes it truly practical to use both text and image content for image retrieval over the Internet. Experimental results confirm the efficiency of the system.
Sayama, Hiroki; Akaishi, Jin
Researchers' networks have been subject to active modeling and analysis. Earlier literature mostly focused on citation or co-authorship networks reconstructed from annotated scientific publication databases, which have several limitations. Recently, general-purpose web search engines have also been utilized to collect information about social networks. Here we reconstructed, using web search engines, a network representing the relatedness of researchers to their peers as well as to various research topics. Relatedness between researchers and research topics was characterized by visibility boost-increase of a researcher's visibility by focusing on a particular topic. It was observed that researchers who had high visibility boosts by the same research topic tended to be close to each other in their network. We calculated correlations between visibility boosts by research topics and researchers' interdisciplinarity at the individual level (diversity of topics related to the researcher) and at the social level (his/her centrality in the researchers' network). We found that visibility boosts by certain research topics were positively correlated with researchers' individual-level interdisciplinarity despite their negative correlations with the general popularity of researchers. It was also found that visibility boosts by network-related topics had positive correlations with researchers' social-level interdisciplinarity. Research topics' correlations with researchers' individual- and social-level interdisciplinarities were found to be nearly independent from each other. These findings suggest that the notion of "interdisciplinarity" of a researcher should be understood as a multi-dimensional concept that should be evaluated using multiple assessment means.
Hsieh, Sheau-Ling; Chang, Wen-Yung; Chen, Chi-Huang; Weng, Yung-Ching
Various researches in web related semantic similarity measures have been deployed. However, measuring semantic similarity between two terms remains a challenging task. The traditional ontology-based methodologies have a limitation that both concepts must be resided in the same ontology tree(s). Unfortunately, in practice, the assumption is not always applicable. On the other hand, if the corpus is sufficiently adequate, the corpus-based methodologies can overcome the limitation. Now, the web is a continuous and enormous growth corpus. Therefore, a method of estimating semantic similarity is proposed via exploiting the page counts of two biomedical concepts returned by Google AJAX web search engine. The features are extracted as the co-occurrence patterns of two given terms P and Q, by querying P, Q, as well as P AND Q, and the web search hit counts of the defined lexico-syntactic patterns. These similarity scores of different patterns are evaluated, by adapting support vector machines for classification, to leverage the robustness of semantic similarity measures. Experimental results validating against two datasets: dataset 1 provided by A. Hliaoutakis; dataset 2 provided by T. Pedersen, are presented and discussed. In dataset 1, the proposed approach achieves the best correlation coefficient (0.802) under SNOMED-CT. In dataset 2, the proposed method obtains the best correlation coefficient (SNOMED-CT: 0.705; MeSH: 0.723) with physician scores comparing with measures of other methods. However, the correlation coefficients (SNOMED-CT: 0.496; MeSH: 0.539) with coder scores received opposite outcomes. In conclusion, the semantic similarity findings of the proposed method are close to those of physicians' ratings. Furthermore, the study provides a cornerstone investigation for extracting fully relevant information from digitizing, free-text medical records in the National Taiwan University Hospital database.
Full Text Available Internet search engines are powerful tools to find electronics objects such as addresses of individuals and institutions, documents, statistics of all kinds, dictionaries, catalogs, product information etc. This paper explains how to build and run some very common search engines on Unix platforms, so as to serve documents through the Web.
Ozmutlu, Seda; Ozmutlu, H. C.; Spink, Amanda
Findings from a study of users' multitasking searches on Web search engines include: multitasking searches are a noticeable user behavior; multitasking search sessions are longer than regular search sessions in terms of queries per session and duration; both Excite and AlltheWeb.com users search for about three topics per multitasking session and…
Full Text Available Abstract Background In the big data era, biomedical research continues to generate a large amount of data, and the generated information is often stored in a database and made publicly available. Although combining data from multiple databases should accelerate further studies, the current number of life sciences databases is too large to grasp features and contents of each database. Findings We have developed Sagace, a web-based search engine that enables users to retrieve information from a range of biological databases (such as gene expression profiles and proteomics data and biological resource banks (such as mouse models of disease and cell lines. With Sagace, users can search more than 300 databases in Japan. Sagace offers features tailored to biomedical research, including manually tuned ranking, a faceted navigation to refine search results, and rich snippets constructed with retrieved metadata for each database entry. Conclusions Sagace will be valuable for experts who are involved in biomedical research and drug development in both academia and industry. Sagace is freely available at http://sagace.nibio.go.jp/en/.
U.S. Environmental Protection Agency — The Chemical Search Web Utility is an intuitive web application that allows the public to easily find the chemical that they are interested in using, and which...
Dao, Tien Tuan; Hoang, Tuan Nha; Ta, Xuan Hien; Tho, Marie Christine Ho Ba
Human musculoskeletal system resources of the human body are valuable for the learning and medical purposes. Internet-based information from conventional search engines such as Google or Yahoo cannot response to the need of useful, accurate, reliable and good-quality human musculoskeletal resources related to medical processes, pathological knowledge and practical expertise. In this present work, an advanced knowledge-based personalized search engine was developed. Our search engine was based on a client-server multi-layer multi-agent architecture and the principle of semantic web services to acquire dynamically accurate and reliable HMSR information by a semantic processing and visualization approach. A security-enhanced mechanism was applied to protect the medical information. A multi-agent crawler was implemented to develop a content-based database of HMSR information. A new semantic-based PageRank score with related mathematical formulas were also defined and implemented. As the results, semantic web service descriptions were presented in OWL, WSDL and OWL-S formats. Operational scenarios with related web-based interfaces for personal computers and mobile devices were presented and analyzed. Functional comparison between our knowledge-based search engine, a conventional search engine and a semantic search engine showed the originality and the robustness of our knowledge-based personalized search engine. In fact, our knowledge-based personalized search engine allows different users such as orthopedic patient and experts or healthcare system managers or medical students to access remotely into useful, accurate, reliable and good-quality HMSR information for their learning and medical purposes. Copyright © 2012 Elsevier Inc. All rights reserved.
SEO--short for Search Engine Optimization--is the art, craft, and science of driving web traffic to web sites. Web traffic is food, drink, and oxygen--in short, life itself--to any web-based business. Whether your web site depends on broad, general traffic, or high-quality, targeted traffic, this PDF has the tools and information you need to draw more traffic to your site. You'll learn how to effectively use PageRank (and Google itself); how to get listed, get links, and get syndicated; and much more. The field of SEO is expanding into all the possible ways of promoting web traffic. This
Banning, James H.; Sexton, Julie; Most, David E.; Maier, Shelby
Photographs found in the search for and exploration of 13 university mining engineering department Web sites were studied for their asymmetries of power by analyzing the role (student, instructor, secretarial staff, miner, and honoree) and posture (sitting, standing) of men and women in the photographs. The Web site photographs showed a higher rate of women occupying student roles than men did. Women had a lower rate of occupying instructor and miner roles. No women were portrayed as being honored. Men exhibited a higher rate of occupying the standing posture than did women. Women were more often shown sitting than men were. Implications of portraying a nonequitable power structure between men and women in the search for and exploration of mining engineering Web sites are discussed, including a recommendation that all academic departments should examine the portrayal of gender on their Web sites.
Filistea Naude; Chris Rensleigh; Adeline S.A. du Toit
This study provided insight into the significance of the open Web as an information resource and Web search engines as research tools amongst academics. The academic staff establishment of the University of South Africa (Unisa) was invited to participate in a questionnaire survey and included 1188 staff members from five colleges. This study culminated in a PhD dissertation in 2008. One hundred and eighty seven respondents participated in the survey which gave a response rate of 15.7%. The re...
Although there are many news search engines on the Web, finding the news items one wants can be challenging. Choosing appropriate search terms is one of the biggest challenges. Unless one has seen the article that one is seeking, it is often difficult to select words that were used in the headline or text of the article. The limited archives of…
Gregg, William; Jirjis, Jim; Lorenzi, Nancy M.; Giuse, Dario
This poster details the design and use of the StarTracker clinical search engine. This program is fully integrated within our electronic medical record system and allows users to enter simple rules that direct formatted searches of multiple legacy databases.
Christopher Bone; Alan Ager; Ken Bunzel; Lauren Tierney
The volume of publically available geospatial data on the web is rapidly increasing due to advances in server-based technologies and the ease at which data can now be created. However, challenges remain with connecting individuals searching for geospatial data with servers and websites where such data exist. The objective of this paper is to present a publically...
Filipe Silva; Gabriel David
Currently, information systems are usually supported by databases (DB) and accessed through a Web interface. Pages in such Web sites are not drawn from HTML files but are generated on the fly upon request. Indexing and searching such dynamic pages raises several extra difficulties not solved by most search engines, which were designed for static contents. In this paper we describe the development of a search engine that overcomes most of the problems for a specific Web site, how the limitatio...
Fatmaa El Zahraa Mohamed Abdou
A general study about the internet search engines, the study deals main 7 points; the differance between search engines and search directories, components of search engines, the percentage of sites covered by search engines, cataloging of sites, the needed time for sites appearance in search engines, search capabilities, and types of search engines.
Borisov, A.; Markov, I.; de Rijke, M.; Serdyukov, P.
Understanding user browsing behavior in web search is key to improving web search effectiveness. Many click models have been proposed to explain or predict user clicks on search engine results. They are based on the probabilistic graphical model (PGM) framework, in which user behavior is represented
Allen, J W; Finch, R J; Coleman, M G; Nathanson, L K; O'Rourke, N A; Fielding, G A
This study was undertaken to determine the quality of information on the Internet regarding laparoscopy. Four popular World Wide Web search engines were used with the key word "laparoscopy." Advertisements, patient- or physician-directed information, and controversial material were noted. A total of 14,030 Web pages were found, but only 104 were unique Web sites. The majority of the sites were duplicate pages, subpages within a main Web page, or dead links. Twenty-eight of the 104 pages had a medical product for sale, 26 were patient-directed, 23 were written by a physician or group of physicians, and six represented corporations. The remaining 21 were "miscellaneous." The 46 pages containing educational material were critically reviewed. At least one of the senior authors found that 32 of the pages contained controversial or misleading statements. All of the three senior authors (LKN, NAO, GAF) independently agreed that 17 of the 46 pages contained controversial information. The World Wide Web is not a reliable source for patient or physician information about laparoscopy. Authenticating medical information on the World Wide Web is a difficult task, and no government or surgical society has taken the lead in regulating what is presented as fact on the World Wide Web.
More than half the world’s population uses web search engines, resulting in over half a billion search queries every single day. For many people web search engines are among the first resources they go to when a question arises. Moreover, search engines have for many become the most trusted route to
Discusses the evaluation of search engines and uses neural networks in stochastic simulation of the number of rejected Web pages per search query. Topics include the iterative radial basis functions (RBF) neural network; precision; response time; coverage; Boolean logic; regression models; crawling algorithms; and implications for search engine…
Current search engines--even the constantly surprising Google--seem unable to leap the next big barrier in search: the trillions of bytes of dynamically generated data created by individual web sites around the world, or what some researchers call the "deep web." The challenge now is not information overload, but information overlook.…
Describes common options and features to consider in evaluating which meta search engine will best meet a searcher's needs. Discusses number and names of engines searched; other sources and specialty engines; search queries; other search options; and results options. (AEF)
Sun, Xiaobo; Pittard, William S; Xu, Tianlei; Chen, Li; Zwick, Michael E; Jiang, Xiaoqian; Wang, Fusheng; Qin, Zhaohui S
The development and application of high-throughput genomics technologies has resulted in massive quantities of diverse omics data that continue to accumulate rapidly. These rich datasets offer unprecedented and exciting opportunities to address long standing questions in biomedical research. However, our ability to explore and query the content of diverse omics data is very limited. Existing dataset search tools rely almost exclusively on the metadata. A text-based query for gene name(s) does not work well on datasets wherein the vast majority of their content is numeric. To overcome this barrier, we have developed Omicseq, a novel web-based platform that facilitates the easy interrogation of omics datasets holistically to improve 'findability' of relevant data. The core component of Omicseq is trackRank, a novel algorithm for ranking omics datasets that fully uses the numerical content of the dataset to determine relevance to the query entity. The Omicseq system is supported by a scalable and elastic, NoSQL database that hosts a large collection of processed omics datasets. In the front end, a simple, web-based interface allows users to enter queries and instantly receive search results as a list of ranked datasets deemed to be the most relevant. Omicseq is freely available at http://www.omicseq.org. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Meta-search engine mengorganisasikan penyatuan hasil dari berbagai search engine dengan tujuan untuk meningkatkan presisi hasil pencarian dokumen web. Pada survei teknik perangkingan meta-search engine ini akan didiskusikan isu-isu pra-pemrosesan, rangking, dan berbagai teknik penggabungan hasil pencarian dari search engine yang berbeda-beda (multi-kombinasi). Isu-isu implementasi penggabungan 2 search engine dan 3 search engine juga menjadi sorotan. Pada makalah ini juga dibahas arahan penel...
Study of Search Engine Transaction Logs Shows Little Change in How Users use Search Engines. A review of: Jansen, Bernard J., and Amanda Spink. “How Are We Searching the World Wide Web? A Comparison of Nine Search Engine Transaction Logs.” Information Processing & Management 42.1 (2006: 248‐263.
Full Text Available Objective – To examine the interactions between users and search engines, and how they have changed over time. Design – Comparative analysis of search engine transaction logs. Setting – Nine major analyses of search engine transaction logs. Subjects – Nine web search engine studies (4 European, 5 American over a seven‐year period, covering the search engines Excite, Fireball, AltaVista, BWIE and AllTheWeb. Methods – The results from individual studies are compared by year of study for percentages of single query sessions, one term queries, operator (and, or, not, etc. usage and single result page viewing. As well, the authors group the search queries into eleven different topical categories and compare how the breakdown has changed over time. Main Results – Based on the percentage of single query sessions, it does not appear that the complexity of interactions has changed significantly for either the U.S.‐based or the European‐based search engines. As well, there was little change observed in the percentage of one‐term queries over the years of study for either the U.S.‐based or the European‐based search engines. Few users (generally less than 20% use Boolean or other operators in their queries, and these percentages have remained relatively stable. One area of noticeable change is in the percentage of users viewing only one results page, which has increased over the years of study. Based on the studies of the U.S.‐based search engines, the topical categories of ‘People, Place or Things’ and ‘Commerce, Travel, Employment or Economy’ are becoming more popular, while the categories of ‘Sex and Pornography’ and ‘Entertainment or Recreation’ are declining. Conclusions – The percentage of users viewing only one results page increased during the years of the study, while the percentages of single query sessions, oneterm sessions and operator usage remained stable. The increase in single result page viewing
The changes in users' Web search performance using search engines over ten years was investigated in this study. Matched data obtained from samples in 2000 and 2010 were used for the comparative analysis. The patterns of Web search engine use suggested a dominance in using a particular search engine. Statistical ...
During 2016 data collection, the Compact Muon Solenoid Data Acquisition (CMS DAQ) system has shown a very good reliability. Nevertheless, the high complexity of the hardware and the software involved is, by its nature, prone to some occasional problems. As CMS subdetector, electromagnetic calorimeter (ECAL) is affected in the same way. Some of the issues are not predictable and can appear during the year more than once such as components getting noisy, power shortcuts or failing communication between machines. The chain detection-diagnosis-intervention must be as fast as possible to minimise the downtime of the detector. The aim of this project was to create a diagnostic software for ECAL crew, which consists of database and its web interface that allows to search, add and edit the contents of the database.
Nogler, M; Wimmer, C; Mayr, E; Ofner, D
This study has attempted to demonstrate the feasibility of obtaining information specific to foot and ankle orthopaedics from the World Wide Web (WWW). Six search engines (Lycos, AltaVista, Infoseek, Excite, Webcrawler, and HotBot) were used in scanning the Web for the following key words: "cavus foot," "diabetic foot," "hallux valgus,"and "pes equinovarus." Matches were classified by language, provider, type, and relevance to medical professionals or to patients. Sixty percent (407 sites) of the visited websites contained information intended for use by physicians and other medical professionals; 30% (206 sites) were related to patient information; 10% of the sites were not easily classifiable. Forty-one percent (169 sites) of the websites were commercially oriented homepages that included advertisements.
Prasanna, M D; Vondrasek, Jiri; Wlodawer, Alexander; Rodriguez, H; Bhat, T N
A novel technique to annotate, query, and analyze chemical compounds has been developed and is illustrated by using the inhibitor data on HIV protease-inhibitor complexes. In this method, all chemical compounds are annotated in terms of standard chemical structural fragments. These standard fragments are defined by using criteria, such as chemical classification; structural, chemical, or functional groups; and commercial, scientific or common names or synonyms. These fragments are then organized into a data tree based on their chemical substructures. Search engines have been developed to use this data tree to enable query on inhibitors of HIV protease (http://xpdb.nist.gov/hivsdb/hivsdb.html). These search engines use a new novel technique, Chemical Block Layered Alignment of Substructure Technique (Chem-BLAST) to search on the fragments of an inhibitor to look for its chemical structural neighbors. This novel technique to annotate and query compounds lays the foundation for the use of the Semantic Web concept on chemical compounds to allow end users to group, sort, and search structural neighbors accurately and efficiently. During annotation, it enables the attachment of "meaning" (i.e., semantics) to data in a manner that far exceeds the current practice of associating "metadata" with data by creating a knowledge base (or ontology) associated with compounds. Intended users of the technique are the research community and pharmaceutical industry, for which it will provide a new tool to better identify novel chemical structural neighbors to aid drug discovery. 2006 Wiley-Liss, Inc.
Nguyen, Dong; Demeester, Thomas; Trieschnigg, Dolf; Hiemstra, Djoerd
A publicly available dataset for federated search reflecting a real web environment has long been absent, making it difficult for researchers to test the validity of their federated search algorithms for the web setting. We present several experiments and analyses on resource selection on the web using a recently released test collection containing the results from more than a hundred real search engines, ranging from large general web search engines such as Google, Bing and Yahoo to small do...
Jackson, Joe; Gilstrap, Donald L.
Addresses the implications of the new Web metalanguage XML for searching on the World Wide Web and considers the future of XML on the Web. Compared to HTML, XML is more concerned with structure of data than documents, and these data structures should prove conducive to precise, context rich searching. (Author/LRW)
Mei, Kun; Yuan, Ying
While the electronically available information in the World-Wide Web is explosively growing and thus increasing, the difficulty to find relevant information is also increasing for search engine user. In this paper we discuss how to constrain web queries geographically. A number of search queries are associated with geographical locations, either explicitly or implicitly. Accurately and effectively detecting the locations where search queries are truly about has huge potential impact on increasing search relevance, bringing better targeted search results, and improving search user satisfaction. Our approach focus on both in the way geographic information is extracted from the web and, as far as we can tell, in the way it is integrated into query processing. This paper gives an overview of a spatially aware search engine for semantic querying of web document. It also illustrates algorithms for extracting location from web documents and query requests using the location ontologies to encode and reason about formal semantics of geographic web search. Based on a real-world scenario of tourism guide search, the application of our approach shows that the geographic information retrieval can be efficiently supported.
Sherman, Chris; Price, Gary
This book takes a detailed look at the nature and extent of the Invisible Web, and offers pathfinders for accessing the valuable information it contains. It is designed to fit the needs of both novice and advanced Web searchers. Chapter One traces the development of the Internet and many of the early tools used to locate and share information via…
Full Text Available This paper presents an English grammar and style checker for non-native English speakers. The main characteristic of this checker is the use of an Internet search engine. As the number of web pages written in English is immense, the system hypothesises that a piece of text not found on the Web is probably badly written. The system also hypothesises that the Web will provide examples of how the content of the text segment can be expressed in a grammatically correct and idiomatic way. Thus, when the checker warns the user about the odd nature of a text segment, the Internet engine searches for contexts that can help the user decide whether he/she should correct the segment or not. By means of a search engine, the checker also suggests use of other expressions that appear on the Web more often than the expression he/she actually wrote.
Takai, T; Tokunaga, M; Maeda, K; Kaminuma, T
As cyber space exploding in a pace that nobody has ever imagined, it becomes very important to search cyber space efficiently and effectively. One solution to this problem is search engines. Already a lot of commercial search engines have been put on the market. However these search engines respond with such cumbersome results that domain specific experts can not tolerate. Using a dedicate hardware and a commercial software called OpenText, we have tried to develop several domain specific search engines. These engines are for our institute's Web contents, drugs, chemical safety, endocrine disruptors, and emergent response for chemical hazard. These engines have been on our Web site for testing.
Sound search is provided by the major search engines, however, indexing is text based, not sound based. We will establish a dedicated sound search services with based on sound feature indexing. The current demo shows the concept of the sound search engine. The first engine will be realased June...
Examines the differences between Web image and textual queries, and attempts to develop an analytic model to investigate their implications for Web image retrieval systems. Provides results that give insight into Web image searching behavior and suggests implications for improvement of current Web image search engines. (AEF)
Casteleyn, Sven; Daniel, Florian; Dolog, Peter
Nowadays, Web applications are almost omnipresent. The Web has become a platform not only for information delivery, but also for eCommerce systems, social networks, mobile services, and distributed learning environments. Engineering Web applications involves many intrinsic challenges due...... to their distributed nature, content orientation, and the requirement to make them available to a wide spectrum of users who are unknown in advance. The authors discuss these challenges in the context of well-established engineering processes, covering the whole product lifecycle from requirements engineering through...
Demeester, Thomas; Trieschnigg, Rudolf Berend; Nguyen, Dong-Phuong; Hiemstra, Djoerd
The TREC Federated Web Search track is intended to promote research related to federated search in a realistic web setting, and hereto provides a large data collection gathered from a series of online search engines. This overview paper discusses the results of the first edition of the track, FedWeb
Mukhopadhyay, Debajyoti; Sharma, Manoj; Joshi, Gajanan; Pagare, Trupti; Palwe, Adarsha
Thinking of todays web search scenario which is mainly keyword based, leads to the need of effective and meaningful search provided by Semantic Web. Existing search engines are vulnerable to provide relevant answers to users query due to their dependency on simple data available in web pages. On other hand, semantic search engines provide efficient and relevant results as the semantic web manages information with well defined meaning using ontology. A Meta-Search engine is a search tool that ...
of its input RS results. 1. INTRODUCTION Federated Web Search is the task of searching multiple search engines simultaneously and combining their...or distributed properly. The goal of RS is then, for a given query, to select only the most promising search engines from all those available. Most...result pages of 149 search engines . 4000 queries are used in building the sample set. As a part of the Vertical Selection task, search engines are
Hannak, Aniko; Sapiezynski, Piotr; Kakhki, Arash Molavi
Web search is an integral part of our daily lives. Recently, there has been a trend of personalization in Web search, where different users receive different results for the same search query. The increasing personalization is leading to concerns about Filter Bubble effects, where certain users...... are simply unable to access information that the search engines’ algorithm decidesis irrelevant. Despitetheseconcerns, there has been little quantification of the extent of personalization in Web search today, or the user attributes that cause it. In light of this situation, we make three contributions....... First, we develop a methodology for measuring personalization in Web search results. While conceptually simple, there are numerous details that our methodology must handle in order to accurately attribute differences in search results to personalization. Second, we apply our methodology to 200 users...
Full Text Available Abstract Background A number of tools for the examination of linkage disequilibrium (LD patterns between nearby alleles exist, but none are available for quickly and easily investigating LD at longer ranges (>500 kb. We have developed a web-based query tool (GLIDERS: Genome-wide LInkage DisEquilibrium Repository and Search engine that enables the retrieval of pairwise associations with r2 ≥ 0.3 across the human genome for any SNP genotyped within HapMap phase 2 and 3, regardless of distance between the markers. Description GLIDERS is an easy to use web tool that only requires the user to enter rs numbers of SNPs they want to retrieve genome-wide LD for (both nearby and long-range. The intuitive web interface handles both manual entry of SNP IDs as well as allowing users to upload files of SNP IDs. The user can limit the resulting inter SNP associations with easy to use menu options. These include MAF limit (5-45%, distance limits between SNPs (minimum and maximum, r2 (0.3 to 1, HapMap population sample (CEU, YRI and JPT+CHB combined and HapMap build/release. All resulting genome-wide inter-SNP associations are displayed on a single output page, which has a link to a downloadable tab delimited text file. Conclusion GLIDERS is a quick and easy way to retrieve genome-wide inter-SNP associations and to explore LD patterns for any number of SNPs of interest. GLIDERS can be useful in identifying SNPs with long-range LD. This can highlight mis-mapping or other potential association signal localisation problems.
This tutorial provides an overview of several document engineering techniques which are applicable to the authoring of World Wide Web documents. It illustrates how pre-WWW hypertext research is applicable to the development of WWW information resources.
Zhuang, Yi; Jiang, Nan; Hu, Haiyang
In this paper, we propose a novel framework for the web-based retrieval of Chinese calligraphic manuscript images which includes two main components: 1). A Shape- Similarity (SS)-based method which is to effectively support a retrieval over large Chinese calligraphic manuscript databases . In this retrieval method, shapes of calligraphic characters are represented by their approximate contour points extracted from the character images; 2). To speed up the retrieval efficiency, we then propose a Composite - Distance- Tree(CD-Tree)-based high-dimensional indexing scheme for it. Comprehensive experiments are conducted to testify the effectiveness and efficiency of our proposed retrieval and indexing methods respectively.
DEMEESTER, Thomas; Nguyen, Dong-Phuong; Trieschnigg, Rudolf Berend; Develder, Chris; Hiemstra, Djoerd; Hou, Yuexian; Nie, Jian-Yun; Sun, Le; Wang, Bo; Zhang, Peng
What is the likelihood that a Web page is considered relevant to a query, given the relevance assessment of the corresponding snippet? Using a new federated IR test collection that contains search results from over a hundred search engines on the internet, we are able to investigate such research questions from a global perspective. Our test collection covers the main Web search engines like Google, Yahoo!, and Bing, as well as a number of smaller search engines dedicated to multimedia, shopp...
In this study, we evaluate the performance of five semantic search engines that are available on the web, using 45 criteria, in the form of a researcher-made checklist. Criteria provided in the checklist included both common and semantic features. Common criteria or features are those applicable to all search engines and semantic ones are those only applicable to semantic search engines. Findings show that the selected search engines do not have suitable performance and expected efficiency. D...
Information and services on the web are accessible for everyone. Users of the web differ in their background, culture, political and social environment, interests and so on. Ambient intelligence was envisioned as a concept for systems which are able to adapt to user actions and needs. With the gr......Information and services on the web are accessible for everyone. Users of the web differ in their background, culture, political and social environment, interests and so on. Ambient intelligence was envisioned as a concept for systems which are able to adapt to user actions and needs...... suit the user profile the most. This paper summarizes the domain engineering framework for such adaptive web applications. The framework provides guidelines to develop adaptive web applications as members of a family. It suggests how to utilize the design artifacts as knowledge which can be used...
Jody Condit Fagan
Full Text Available Academic web search engines have become central to scholarly research. While the fitness of Google Scholar for research purposes has been examined repeatedly, Microsoft Academic and Google Books have not received much attention. Recent studies have much to tell us about the coverage and utility of Google Scholar, its coverage of the sciences, and its utility for evaluating researcher impact. But other aspects have been woefully understudied, such as coverage of the arts and humanities, books, and non-Western, non-English publications. User research has also tapered off. A small number of articles hint at the opportunity for librarians to become expert advisors concerning opportunities of scholarly communication made possible or enhanced by these platforms. This article seeks to summarize research concerning Google Scholar, Google Books, and Microsoft Academic from the past three years with a mind to informing practice and setting a research agenda. Selected literature from earlier time periods is included to illuminate key findings and to help shape the proposed research agenda, especially in understudied areas.
Faezeh sadat Tabatabai Amiri
Full Text Available The purpose of this research was to make exam the indexing and ranking of XML content objects containing Dublin Core and MARC 21 metadata elements in dynamic online information environments by general search engines and comparing them together in a comparative-analytical approach. 100 XML content objects in two groups were analyzed: those with DCXML elements and those with MARCXML elements were published in website http://www.marcdcmi.ir. from late Mordad 1388 till Khordad 1389. Then the website was introduced to Google and Yahoo search engines. Google search engine was able to retrieve fully all the content objects during the study period through their Dublin Core and MARC 21 metadata elements; Yahoo search engine, however, did not respond at all. The indexing of metadata elements embedded in content objects in dynamic online information environments and different between indexing and ranking of them were examined. Findings showed all Dublin Core and MARC 21 metadata elements by Google search engine were indexed. And there was not observed difference between indexing and ranking DCXML and MARCXML metadata elements in dynamic online information environments by Google search engine.
Demeester, Thomas; Trieschnigg, Rudolf Berend; Nguyen, Dong-Phuong; Zhou, Ke; Hiemstra, Djoerd
The TREC Federated Web Search track facilitates research in topics related to federated web search, by providing a large realistic data collection sampled from a multitude of online search engines. The FedWeb 2013 challenges of Resource Selection and Results Merging challenges are again included in FedWeb 2014, and we additionally introduced the task of vertical selection. Other new aspects are the required link between the Resource Selection and Results Merging, and the importance of diversi...
Smyth, Barry; Coyle, Maurice; Briggs, Peter
Web search engines are the primary means by which millions of users access information everyday and the sheer scale and success of the leading search engines is a testimony to the scientific and engineering progress that has been made over the last ten years. However, mainstream search engines continue to deliver largely one-size-fits-all services to their user-base, ultimately limiting the relevance of their result-lists. In this chapter we will explore recent research that is seeking to make Web search a more personal and collaborative experience as we look towards a new breed of more social search engines.
W. Mettrop (Wouter); P. Nieuwenhuysen
htmlabstractAn empirical investigation of the consistency of retrieval through Internet search engines is reported. Thirteen engines are evaluated: AltaVista, EuroFerret, Excite, HotBot, InfoSeek, Lycos, MSN, NorthernLight, Snap, WebCrawler and three national Dutch engines: Ilse, Search.nl and
This dissertation addresses semantic search of Web services using natural language processing. We first survey various existing approaches, focusing on the fact that the expensive costs of current semantic annotation frameworks result in limited use of semantic search for large scale applications. We then propose a vector space model based service…
Chernov, S.; Serdyukov, Pavel; Bender, M.; Michel, S.; Weikum, G.; Zimmer, C.
Intelligent Web search engines are extremely popular now. Currently, only the commercial centralized search engines like Google can process terabytes of Web data. Alternative search engines fulfilling collaborative Web search on a voluntary basis are usually based on a blooming Peer-to-Peer (P2P)
Full Text Available Search engines are playing a vital role in retrieving relevant information for the web user. In this research work a user profile based web search is proposed. So the web user from different domain may receive different set of results. The main challenging work is to provide relevant results at the right level of reading difficulty. Estimating user expertise and re-ranking the results are the main aspects of this paper. The retrieved results are arranged in Bookshelf Data Structure for easy access. Better presentation of search results hence increases the usability of web search engines significantly in visual mode.
English abstract]Quick increment of need on internet information resources leads to a rush of search engines. This article introduces some new type of search engines which is appearing and will appear. These search engines includes as follows: grey document search engine, invisible web search engine, knowledge discovery search engine, clustering meta search engine, academic clustering search engine, conception comparison and conception analogy search engine, consultation search engine, teachi...
Full Text Available The Internet becomes for most of us a daily used instrument, for professional or personal reasons. We even do not remember the times when a computer and a broadband connection were luxury items. More and more people are relying on the complicated web network to find the needed information.This paper presents an overview of Internet search related issues, upon search engines and describes the parties and the basic mechanism that is embedded in a search for web based information resources. Also presents ways to increase the efficiency of web searches, through a better understanding of what search engines ignore at websites content.
Kobayashi, Norio; Toyoda, Tetsuro
Statistical analysis of links on the Semantic Web is important for various evaluation purposes such as quantifying an individual's scientific research output based on citation links. SPARQL has been proposed as a standardized query language for the Semantic Web and is intuitively understandable; however, it does not adequately support statistical evaluation of semantic links. We have extended SPARQL to a novel Resource Description Framework (RDF) query language termed General and Rapid Association Study Query Language (GRASQL) to generate inferences connecting semantic Boolean-based deduction and statistical evaluation of RDF resources. We have verified the descriptive capability of GRASQL by writing GRASQL queries for practical biomedical search patterns including in silico positional cloning studies and for ranking researchers in a specific domain of expertise by introducing k index, the number of papers containing specific keywords that are published in a fixed period by a researcher. We have also developed a search engine termed General and Rapid Association Study Engine (GRASE), which executes a restricted variety of GRASQL queries by requesting a dynamic and comprehensive evaluation of statistical significance of intersections between each group of documents assigned to URIs and those documents matching user-specified keywords and omics conditions. By performing practical in silico positional cloning searches with GRASE, we show the relevance of our approach on the Semantic Web for biomedical knowledge discovery problem solving. GRASE is used as the search engine for the Positional Medline (PosMed) service and Researcher Finder service at http://omicspace.riken.jp/.
Discusses subject gateway sites and commercial search engines for the Web and presents an explanation of Google's PageRank algorithm. The principle question addressed is the conditions under which a gateway site will increase the likelihood that a target page is found in search engines. (LRW)
Chuklin, A.; Markov, I.; de Rijke, M.
With the rapid growth of web search in recent years the problem of modeling its users has started to attract more and more attention of the information retrieval community. This has several motivations. By building a model of user behavior we are essentially developing a better understanding of a
Kosta, Eleni; Kalloniatis, Christos; Mitrou, Lilian; Kavakli, Evangelia
Nowadays, Internet users are depending on various search engines in order to be able to find requested information on the Web. Although most users feel that they are and remain anonymous when they place their search queries, reality proves otherwise. The increasing importance of search engines for the location of the desired information on the Internet usually leads to considerable inroads into the privacy of users. The scope of this paper is to study the main privacy issues with regard to search engines, such as the anonymisation of search logs and their retention period, and to examine the applicability of the European data protection legislation to non-EU search engine providers. Ixquick, a privacy-friendly meta search engine will be presented as an alternative to privacy intrusive existing practices of search engines.
Discusses specialized search engines for information on specific topics on the Internet. Explains reasons for using specialized search engines; highlights search engines in the fields of health care, legal information, and multimedia formats; and provides Web sites that are helpful in finding other specialized search engines. (LRW)
Nanopoulos, Alexandros; Rafilidis, Dimitrios; Manolopoulos, Yannis
Nowadays we have a proliferation of music data available over the Web. One of the imperative challenges is how to search these vast, global-scale musical resources to find preferred music. Recent research has envisaged the notion of music search engines (MSEs) that allow for searching preferred...
Lee, Tae-Kyong; Chung, Hea-Jung; Park, Hye-Kyung; Lee, Eun-Ju; Nam, Hye-Seon; Jung, Soon-Im; Cho, Jee-Ye; Lee, Jin-Hee; Kim, Gon; Kim, Min-Chan
A diet habit, which is developed in childhood, lasts for a life time. In this sense, nutrition education and early exposure to healthy menus in childhood is important. Children these days have easy access to the internet. Thus, a web-based nutrition education program for children is an effective tool for nutrition education of children. This site provides the material of the nutrition education for children with characters which are personified nutrients. The 151 menus are stored in the site together with video script of the cooking process. The menus are classified by the criteria based on age, menu type and the ethnic origin of the menu. The site provides a search function. There are three kinds of search conditions which are key words, menu type and "between" expression of nutrients such as calorie and other nutrients. The site is developed with the operating system Windows 2003 Server, the web server ZEUS 5, development language JSP, and database management system Oracle 10 g. PMID:20126375
Van den Bosch, Antal; Bogers, Toine; De Kunder, Maurice
One of the determining factors of the quality of Web search engines is the size of their index. In addition to its influence on search result quality, the size of the indexed Web can also tell us something about which parts of the WWW are directly accessible to the everyday user. We propose a novel...... method of estimating the size of a Web search engine’s index by extrapolating from document frequencies of words observed in a large static corpus of Web pages. In addition, we provide a unique longitudinal perspective on the size of Google and Bing’s indices over a nine-year period, from March 2006...... until January 2015. We find that index size estimates of these two search engines tend to vary dramatically over time, with Google generally possessing a larger index than Bing. This result raises doubts about the reliability of previous one-off estimates of the size of the indexed Web. We find...
Notess, Greg R.
Few have the resources to build a Google or Yahoo! from scratch. Yet anyone can build a search engine based on a subset of the large search engines' databases. Use Google Custom Search Engine or Yahoo! Search Builder or any of the other similar programs to create a vertical search engine targeting sites of interest to users. The basic steps to…
Abudaqqa, Yousra; Patel, Ahmed
Indisputably, search engines (SEs) abound. The monumental growth of users performing online searches on the Web is a contending issue in the contemporary world nowadays. For example, there are tens of billions of searches performed everyday, which typically offer the users many irrelevant results which are time consuming and costly to the user. Based on the afore-going problem it has become a herculean task for existing Web SEs to provide complete, relevant and up-to-date information response to users' search queries. To overcome this problem, we developed the Distributed Search Engine Architecture (DSEA), which is a new means of smart information query and retrieval of the World Wide Web (WWW). In DSEAs, multiple autonomous search engines, owned by different organizations or individuals, cooperate and act as a single search engine. This paper includes the work reported in this research focusing on development of DSEA, based on topic-specific specialised search engines. In DSEA, the results to specific queries could be provided by any of the participating search engines, for which the user is unaware of. The important design goal of using topic-specific search engines in the research is to build systems that can effectively be used by larger number of users simultaneously. Efficient and effective usage with good response is important, because it involves leveraging the vast amount of searched data from the World Wide Web, by categorising it into condensed focused topic -specific results that meet the user's queries. This design model and the development of the DSEA adopt a Service Directory (SD) to route queries towards topic-specific document hosting SEs. It displays the most acceptable performance which is consistent with the requirements of the users. The evaluation results of the model return a very high priority score which is associated with each frequency of a keyword.
Guha, Neel; Ozdalga, Errol; Wytock, Matthew
International audience; Users struggle with keyword based search engines like Google or Bing because queries can have multiple interpretations and search engines fail to understand the context in which the user is looking for information. This failure leads to search results that are either inappropriate or contextually irrelevant. In this paper we describe algorithms which, utilizing information about the user's context, scrape the web and process/filter candidate sites that could be used to...
Due to search engines' automated operations, people often assume that search engines display search results neutrally and without bias. However, this perception is mistaken. Like any other media company, search engines affirmatively control their users' experiences, which has the consequence of skewing search results (a phenomenon called "search engine bias"). Some commentators believe that search engine bias is a defect requiring legislative correction. Instead, this chapter argues that search engine bias is the beneficial consequence of search engines optimizing content for their users. The chapter further argues that the most problematic aspect of search engine bias, the "winner-take-all" effect caused by top placement in search results, will be mooted by emerging personalized search technology.
Ott, Niels; Meurers, Detmar
Search engines have been a major factor in making the web the successful and widely used information source it is today. Generally speaking, they make it possible to retrieve web pages on a topic specified by the keywords entered by the user. Yet web searching currently does not take into account which of the search results are comprehensible for…
Since the creation of the first pre-Web Internet search engines in the early 1990s, search engines have become almost as important as email as a primary online activity. Arguably, search engines are among the most important gatekeepers in today's digitally networked environment. Thus, it does not come as a surprise that the evolution of search technology and the diffusion of search engines have been accompanied by a series of conflicts among stakeholders such as search operators, content crea...
Taiwo O. Edosomwan
Full Text Available We compared the information retrieval performances of some popular search engines (namely, Google, Yahoo, AlltheWeb, Gigablast, Zworks and AltaVista and Bing/MSN in response to a list of ten queries, varying in complexity. These queries were run on each search engine and the precision and response time of the retrieved results were recorded. The first ten documents on each retrieval output were evaluated as being ‘relevant’ or ‘non-relevant’ for evaluation of the search engine’s precision. To evaluate response time, normalised recall ratios were calculated at various cut-off points for each query and search engine. This study shows that Google appears to be the best search engine in terms of both average precision (70% and average response time (2 s. Gigablast and AlltheWeb performed the worst overall in this study.
Noll, Michael G.; Meinel, Christoph
In this paper, we present a new approach to web search personalization based on user collaboration and sharing of information about web documents. The proposed personalization technique separates data collection and user profiling from the information system whose contents and indexed documents are being searched for, i.e. the search engines, and uses social bookmarking and tagging to re-rank web search results. It is independent of the search engine being used, so users are free to choose the one they prefer, even if their favorite search engine does not natively support personalization. We show how to design and implement such a system in practice and investigate its feasibility and usefulness with large sets of real-word data and a user study.
Full Text Available This paper describes an investigation of the external search engine optimization (SEO action planning tool, dedicated to automatically extract a small set of most important keywords for each month during whole year period. The keywords in the set are extracted accordingly to external measured parameters, such as average number of searches during the year and for every month individually. Additionally the position of the optimized web site for each keyword is taken into account. The generated optimization plan is similar to the optimization plans prepared manually by the SEO professionals and can be successfully used as a support tool for web site search engine optimization.
Fernández Sánchez, Miriam; López, Vanessa; Sabou, Marta; Uren, Victoria S; Vallet Weadon, David Jordi; Motta, Enrico; Castells, Pablo
Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. M. Fernández, V. López, M. Sabou, V. S. Uren, D. Vallet, E. Motta, and P. Castells, "Semantic search meets the Web", 2008 IE...
Full Text Available Background: In spite of the enormous amount of information available on the Web and the fact that search engines are continuously evolving to enhance the search experience, students are nevertheless faced with the difficulty of effectively retrieving information. It is, therefore, imperative for the interaction between students and search tools to be understood and search strategies to be identified, in order to promote successful information retrieval. Objectives: This study identifies the Web search strategies used by postgraduate students and forms part of a wider study into information retrieval strategies used by postgraduate students at the University of KwaZulu-Natal (UKZN, Pietermaritzburg campus, South Africa. Method: Largely underpinned by Thatcher’s cognitive search strategies, the mixed-methods approach was utilised for this study, in which questionnaires were employed in Phase 1 and structured interviews in Phase 2. This article reports and reflects on the findings of Phase 2, which focus on identifying the Web search strategies employed by postgraduate students. The Phase 1 results were reported in Civilcharran, Hughes and Maharaj (2015. Results: Findings reveal the Web search strategies used for academic information retrieval. In spite of easy access to the invisible Web and the advent of meta-search engines, the use of Web search engines still remains the preferred search tool. The UKZN online library databases and especially the UKZN online library, Online Public Access Catalogue system, are being underutilised. Conclusion: Being ranked in the top three percent of the world’s universities, UKZN is investing in search tools that are not being used to their full potential. This evidence suggests an urgent need for students to be trained in Web searching and to have a greater exposure to a variety of search tools. This article is intended to further contribute to the design of undergraduate training programmes in order to deal
Full Text Available Background: In spite of the enormous amount of information available on the Web and the fact that search engines are continuously evolving to enhance the search experience, students are nevertheless faced with the difficulty of effectively retrieving information. It is, therefore, imperative for the interaction between students and search tools to be understood and search strategies to be identified, in order to promote successful information retrieval.Objectives: This study identifies the Web search strategies used by postgraduate students and forms part of a wider study into information retrieval strategies used by postgraduate students at the University of KwaZulu-Natal (UKZN, Pietermaritzburg campus, South Africa. Method: Largely underpinned by Thatcher’s cognitive search strategies, the mixed-methods approach was utilised for this study, in which questionnaires were employed in Phase 1 and structured interviews in Phase 2. This article reports and reflects on the findings of Phase 2, which focus on identifying the Web search strategies employed by postgraduate students. The Phase 1 results were reported in Civilcharran, Hughes and Maharaj (2015.Results: Findings reveal the Web search strategies used for academic information retrieval. In spite of easy access to the invisible Web and the advent of meta-search engines, the use of Web search engines still remains the preferred search tool. The UKZN online library databases and especially the UKZN online library, Online Public Access Catalogue system, are being underutilised.Conclusion: Being ranked in the top three percent of the world’s universities, UKZN is investing in search tools that are not being used to their full potential. This evidence suggests an urgent need for students to be trained in Web searching and to have a greater exposure to a variety of search tools. This article is intended to further contribute to the design of undergraduate training programmes in order to deal
The book is composed of two main parts. The first part is a general study of Semantic Web Search. The second part specifically focuses on the use of semantics throughout the search process, compiling a big picture of Process-oriented Semantic Web Search from different pieces of work that target specific aspects of the process.In particular, this book provides a rigorous account of the concepts and technologies proposed for searching resources and semantic data on the Semantic Web. To collate the various approaches and to better understand what the notion of Semantic Web Search entails, this bo
W. Mettrop (Wouter); P. Nieuwenhuysen; H. Smulders
textabstractA significant portion of the computer files that carry documents, multimedia, programs etc. on the Web are identical or very similar to other files on the Web. How do search engines cope with this? Do they perform some kind of “deduplication”? How should users take into account that
Because the World Wide Web is a dynamic collection of information, the Web search tools (or "search engines") that index the Web are dynamic. Traditional information retrieval evaluation techniques may not provide reliable results when applied to the Web search tools. This study is the result of ten replications of the classic 1996 Ding and Marchionini Web search tool research. It explores the effects that replication can have on transforming unreliable results from one iteration into replica...
Müller, Henning; Boyer, Célia; Gaudinat, Arnaud; Hersh, William; Geissbuhler, Antoine
Medical institutions produce ever-increasing amount of diverse information. The digital form makes these data available for the use on more than a single patient. Images are no exception to this. However, less is known about how medical professionals search for visual medical information and how they want to use it outside of the context of a single patient. This article analyzes ten months of usage log files of the Health on the Net (HON) medical media search engine. Key words were extracted from all queries and the most frequent terms and subjects were identified. The dataset required much pre-treatment. Problems included national character sets, spelling errors and the use of terms in several languages. The results show that media search, particularly for images, was frequently used. The most common queries were for general concepts (e.g., heart, lung). To define realistic information needs for the ImageCLEFmed challenge evaluation (Cross Language Evaluation Forum medical image retrieval), we used frequent queries that were still specific enough to at least cover two of the three axes on modality, anatomic region, and pathology. Several research groups evaluated their image retrieval algorithms based on these defined topics.
Dragusin, Radu; Petcu, Paula; Lioma, Christina Amalia
Background: The web has become a primary information resource about illnesses and treatments for both medical and non-medical users. Standard web search is by far the most common interface for such information. It is therefore of interest to find out how well web search engines work for diagnostic...... approach for web search engines for rare disease diagnosis which includes 56 real life diagnostic cases, state-of-the-art evaluation measures, and curated information resources. In addition, we introduce FindZebra, a specialized (vertical) rare disease search engine. FindZebra is powered by open source...... medical concepts to demonstrate different ways of displaying the retrieved results to medical experts. Conclusions: Our results indicate that a specialized search engine can improve the diagnostic quality without compromising the ease of use of the currently widely popular web search engines. The proposed...
Húsek, Dušan; Keyhanipour, A.; Krömer, P.; Moshiri, B.; Owais, S.; Snášel, V.
Roč. 33, - (2008), s. 189-200 ISSN 1870-4069 Institutional research plan: CEZ:AV0Z10300504 Keywords : web search * meta- search engine * intelligent re-ranking * ordered weighted averaging * Boolean search queries optimizing Subject RIV: IN - Informatics, Computer Science
Demeester, Thomas; Nguyen, Dong-Phuong; Trieschnigg, Rudolf Berend; Develder, Chris; Hiemstra, Djoerd; Hou, Yuexian; Nie, Jian-Yun; Sun, Le; Wang, Bo; Zhang, Peng
What is the likelihood that a Web page is considered relevant to a query, given the relevance assessment of the corresponding snippet? Using a new federated IR test collection that contains search results from over a hundred search engines on the internet, we are able to investigate such research
Kaptein, A.M.; Broek, E.L. van den; Koot, G.; Huis in't Veld, M.A.A.
Web search engines are optimized for early precision, which makes it difficult to perform recall oriented tasks with them. In this article, we propose several ways to leverage semantic annotations and, thereby, increase the efficiency of recall oriented search tasks, with a focus on forensic
Kaptein, Rianne; van den Broek, Egon; Koot, Gijs; Huis in 't Veld, Mirjam A.A.; Bennett, P.N.; Gabrilovich, E.; Kamps, J.; Karlgren, J.
Web search engines are optimized for early precision, which makes it difficult to perform recall oriented tasks with them. In this article, we propose several ways to leverage semantic annotations and, thereby, increase the efficiency of recall oriented search tasks, with a focus on forensic
With advancement in technology, many individuals are getting familiar with the internet a lot of users seek for information on the World Wide Web (WWW) using variety of search engines. This research work evaluates the retrieval effectiveness of Google, Yahoo, Bing, AOL and Baidu. Precision, relative recall and response ...
Sayed Rabeh Sayed
A Research about multimedia search engines, it starts with definition of search engines at general and multimedia search engines, then explains how they work, and divided them into: Video search engines, Images search engines, and Audio search engines. Finally, it reviews a samples to multimedia search engines.
Search engine optimization (SEO) techniques involve „on-page“ and „off-page“ actions taken by web developers and SEO specialists with aim to increase the ranking of web pages in search engine results pages (SERP...
Augustine, Susan; Greene, Courtney
Discusses results of a usability study at the University of Illinois Chicago that investigated whether Internet search engines have influenced the way students search library Web sites. Results show students use the Web site's internal search engine rather than navigating through the pages; have difficulty interpreting library terminology; and…
Rajashree Shettar; Rahul Bhuptani
The World Wide Web is growing exponentially and the dynamic, unstructured nature of the web makes it difficult to locate useful resources. Web Search engines such as Google and Alta Vista provide huge amount of information many of which might not be relevant to the users query. In this paper, we build a vertical search engine which takes a seed URL and classifies the URLs crawled as Medical or Finance domains. The filter component of the vertical search engine classifies the web pages downloa...
Volkovich, Y.; Litvak, Nelli
PageRank with personalization is used in Web search as an importance measure for Web documents. The goal of this paper is to characterize the tail behavior of the PageRank distribution in the Web and other complex networks characterized by power laws. To this end, we model the PageRank as a solution
This proposal identifies two main problems related to deep web search, and proposes a step by step solution for each of them. The first problem is about searching deep web content by means of a simple free-text interface (with just one input field, instead of a complex interface with many input
Tjin-Kam-Jet, Kien; Trieschnigg, Rudolf Berend; Hiemstra, Djoerd
We review the state-of-the-art in deep web search and propose a novel classification scheme to better compare deep web search systems. The current binary classification (surfacing versus virtual integration) hides a number of implicit decisions that must be made by a developer. We make these
Jácome, Alberto G; Fdez-Riverola, Florentino; Lourenço, Anália
Xie, M.; Wang, H.; Goh, T. N.
Reviews commonly used search engines (AltaVista, Excite, infoseek, Lycos, HotBot, WebCrawler), focusing on existing comparative studies; considers quality dimensions from the customer's point of view based on a SERVQUAL framework; and groups these quality expectations in five dimensions: tangibles, reliability, responsiveness, assurance, and…
Lin, Xiaofan; Gokturk, Burak; Sumengen, Baris; Vu, Diem
Nowadays there are many product comparison web sites. But most of them only use text information. This paper introduces a novel visual search engine for product images, which provides a brand-new way of visually locating products through Content-based Image Retrieval (CBIR) technology. We discusses the unique technical challenges, solutions, and experimental results in the design and implementation of this system.
, Google and ask.com search engines. Experimental method was used with ten reference questions which were used to query each of the search engines . Yahoo obtained the highest results (521,801,043) among the three Web search ...
As we use the Web for social networking, shopping, and news, we leave a personal trail. These days, linger over a Web page selling lamps, and they will turn up at the advertising margins as you move around the Internet, reminding you, tempting you to make that purchase. Search engines such as Google can now look deep into the data on the Web to pull out instances of the words you are looking for. And there are pages that collect and assess information to give you a snapshot ofchanging political opinion. These are just basic examples of the growth of ""Web intelligence"", as increasingly sophis
Baitaluk, Michael; Kozhenkov, Sergey; Dubinina, Yulia; Ponomarenko, Julia
With the growth of biological data in volume and heterogeneity, web search engines become key tools for researchers. However, general-purpose search engines are not specialized for the search of biological data. Here, we present an approach at developing a biological web search engine based on the Semantic Web technologies and demonstrate its implementation for retrieving gene- and protein-centered knowledge. The engine is available at http://www.integromedb.org. The IntegromeDB search engine allows scanning data on gene regulation, gene expression, protein-protein interactions, pathways, metagenomics, mutations, diseases, and other gene- and protein-related data that are automatically retrieved from publicly available databases and web pages using biological ontologies. To perfect the resource design and usability, we welcome and encourage community feedback.
Purpose - To test major Web search engines on their performance on navigational queries, i.e. searches for homepages. Design/methodology/approach - 100 real user queries are posed to six search engines (Google, Yahoo, MSN, Ask, Seekport, and Exalead). Users described the desired pages, and the results position of these is recorded. Measured success N and mean reciprocal rank are calculated. Findings - Performance of the major search engines Google, Yahoo, and MSN is best, with around 90 perce...
How does a search engine such as Google know which search results to display? There are many competing algorithms that generate search results, but which one works best? We developed a new probabilistic method for quickly comparing large numbers of search algorithms by examining the results users
Williams, Sarah C.
The purpose of this study was to investigate how federated search engines are incorporated into the Web sites of libraries in the Association of Research Libraries. In 2009, information was gathered for each library in the Association of Research Libraries with a federated search engine. This included the name of the federated search service and…
Rowe, Will; Baker, Kate S; Verner-Jeffreys, David; Baker-Austin, Craig; Ryan, Jim J; Maskell, Duncan; Pearce, Gareth
Antimicrobial resistance remains a growing and significant concern in human and veterinary medicine. Current laboratory methods for the detection and surveillance of antimicrobial resistant bacteria are limited in their effectiveness and scope. With the rapidly developing field of whole genome sequencing beginning to be utilised in clinical practice, the ability to interrogate sequencing data quickly and easily for the presence of antimicrobial resistance genes will become increasingly important and useful for informing clinical decisions. Additionally, use of such tools will provide insight into the dynamics of antimicrobial resistance genes in metagenomic samples such as those used in environmental monitoring. Here we present the Search Engine for Antimicrobial Resistance (SEAR), a pipeline and web interface for detection of horizontally acquired antimicrobial resistance genes in raw sequencing data. The pipeline provides gene information, abundance estimation and the reconstructed sequence of antimicrobial resistance genes; it also provides web links to additional information on each gene. The pipeline utilises clustering and read mapping to annotate full-length genes relative to a user-defined database. It also uses local alignment of annotated genes to a range of online databases to provide additional information. We demonstrate SEAR's application in the detection and abundance estimation of antimicrobial resistance genes in two novel environmental metagenomes, 32 human faecal microbiome datasets and 126 clinical isolates of Shigella sonnei. We have developed a pipeline that contributes to the improved capacity for antimicrobial resistance detection afforded by next generation sequencing technologies, allowing for rapid detection of antimicrobial resistance genes directly from sequencing data. SEAR uses raw sequencing data via an intuitive interface so can be run rapidly without requiring advanced bioinformatic skills or resources. Finally, we show that SEAR
Full Text Available Antimicrobial resistance remains a growing and significant concern in human and veterinary medicine. Current laboratory methods for the detection and surveillance of antimicrobial resistant bacteria are limited in their effectiveness and scope. With the rapidly developing field of whole genome sequencing beginning to be utilised in clinical practice, the ability to interrogate sequencing data quickly and easily for the presence of antimicrobial resistance genes will become increasingly important and useful for informing clinical decisions. Additionally, use of such tools will provide insight into the dynamics of antimicrobial resistance genes in metagenomic samples such as those used in environmental monitoring.Here we present the Search Engine for Antimicrobial Resistance (SEAR, a pipeline and web interface for detection of horizontally acquired antimicrobial resistance genes in raw sequencing data. The pipeline provides gene information, abundance estimation and the reconstructed sequence of antimicrobial resistance genes; it also provides web links to additional information on each gene. The pipeline utilises clustering and read mapping to annotate full-length genes relative to a user-defined database. It also uses local alignment of annotated genes to a range of online databases to provide additional information. We demonstrate SEAR's application in the detection and abundance estimation of antimicrobial resistance genes in two novel environmental metagenomes, 32 human faecal microbiome datasets and 126 clinical isolates of Shigella sonnei.We have developed a pipeline that contributes to the improved capacity for antimicrobial resistance detection afforded by next generation sequencing technologies, allowing for rapid detection of antimicrobial resistance genes directly from sequencing data. SEAR uses raw sequencing data via an intuitive interface so can be run rapidly without requiring advanced bioinformatic skills or resources. Finally, we
Descy, Don E.
There are hundreds of unique search engines in the United States and thousands of unique search engines around the world. If people get into search engines designed just to search particular web sites, the number is in the hundreds of thousands. This article looks at: (1) clustering search engines, such as KartOO (www.kartoo.com) and Grokker…
The change occuring related to “search engines” is going towards e-commerce, transforming all the main search engines into information and commercial suggestion conveying means, basing their businnes on this activity. In a next future we will find two main series of search engines: from one side, the portals that will offer a general orientation guide being convoying means for services and to-buy products; from the other side, vertical portals able to offer information and products on specifi...
Full Text Available This paper presents findings from a study of the effects of query structure on retrieval by Web search services. Fifteen queries were selected from the transaction log of a major Web search service in simple query form with no advanced operators (e.g., Boolean operators, phrase operators, etc. and submitted to 5 major search engines - Alta Vista, Excite, FAST Search, Infoseek, and Northern Light. The results from these queries became the baseline data. The original 15 queries were then modified using the various search operators supported by each of the 5 search engines for a total of 210 queries. Each of these 210 queries was also submitted to the applicable search service. The results obtained were then compared to the baseline results. A total of 2,768 search results were returned by the set of all queries. In general, increasing the complexity of the queries had little effect on the results with a greater than 70% overlap in results, on average. Implications for the design of Web search services and directions for future research are discussed.
Kumar, Naresh, Dr.
The size of WWW is growing exponentially with ever change in technology. This results in huge amount of information with long list of URLs. Manually it is not possible to visit each page individually. So, if the page ranking algorithms are used properly then user search space can be restricted up to some pages of searched results. But available literatures show that no single search system can provide qualitative results from all the domains. This paper provides solution to this problem by introducing a new meta search engine that determine the relevancy of query corresponding to web page and cluster the results accordingly. The proposed approach reduces the user efforts, improves the quality of results and performance of the meta search engine.
Bosch, A.P.J. van den; Bogers, T.; Kunder, M. de
One of the determining factors of the quality of Web search engines is the size and quality of their index. In addition to its influence on search result quality, the size of the indexed Web can also tell us something about which parts of the WWW are directly accessible to the everyday user. We
Bøhlen, Michael Hanspeter; Bukauskas, Linas; Dyreson, Curtis
Information spread in in databases cannot be found by current search engines. A database search engine is capable to access and advertise database on the WWW. Jungle is a database search engine prototype developed at Aalborg University. Operating through JDBC connections to remote databases, Jungle...... extracts and indexes database data and meta-data, building a data store of database information. This information is used to evaluate and optimize queries in the AQUA query language. AQUA is a natural and intuitive database query language that helps users to search for information without knowing how...
Full Text Available Web Search Engines provide a huge number of answers in response to a user query, many of which are not relevant, whereas some of the most relevant ones may not be found. In the literature several approaches have been proposed in order to help a user to find the information relevant to his/her real needs on the Web. To achieve this goal the individual Information Behavior can been analyzed to ’keep’ track of the user’s interests. Keeping information is a type of Information Behavior, and in several works researchers have referred to it as the study on what people do during a search on the Web. Generally, the user’s actions (e.g., how the user moves from one Web page to another, or her/his download of a document, etc. are recorded in Web logs. This paper reports on research activities which aim to exploit the information extracted from Web logs (or query logs in personalized user ontologies, with the objective to support the user in the process of discovering Web information relevant to her/his information needs. Personalized ontologies are used to improve the quality of Web search by applying two main techniques: query reformulation and re-ranking of query evaluation results. In this paper we analyze various methodologies presented in the literature aimed at using personalized ontologies, defined on the basis of the observation of Information Behaviour to help the user in finding relevant information.
Rushton, Erin E.; Kelehan, Martha Daisy; Strong, Marcy A.
Search engine use is one of the most popular online activities. According to a recent OCLC report, nearly all students start their electronic research using a search engine instead of the library Web site. Instead of viewing search engines as competition, however, librarians at Binghamton University Libraries decided to employ search engine…
Search engines have now become essential tools in all aspects of our life. Although a variety of information needs can be served very successfully, there are still a lot of queries that search engines can not answer very effectively and these queries always make users feel frustrated. Since it is quite often that users encounter such "difficult…
Morris, Meredith Ringel
Today, Web search is treated as a solitary experience. Web browsers and search engines are typically designed to support a single user, working alone. However, collaboration on information-seeking tasks is actually commonplace. Students work together to complete homework assignments, friends seek information about joint entertainment opportunities, family members jointly plan vacation travel, and colleagues jointly conduct research for their projects. As improved networking technologies and the rise of social media simplify the process of remote collaboration, and large, novel display form-fac
In recent years search engines have become an essential tool in the work of physicians. This article will review advanced search techniques from the world of information specialists, as well as some advanced search engine operators that may help physicians improve their online search capabilities, and maximize the yield of their searches. This article also reviews popular dedicated scientific and biomedical literature search engines.
Reichow, Brian; Naples, Adam; Steinhoff, Timothy; Halpern, Jason; Volkmar, Fred R.
The World Wide Web is one of the most common methods used by parents to find information on autism spectrum disorders and most consumers find information through search engines such as Google or Bing. However, little is known about how the search engines operate or the consistency of the results that are returned over time. This study presents the…
Lau, Annie Y S; Coiera, Enrico; Zrimec, Tatjana; Compton, Paul
Searching the Web for documents using information retrieval systems plays an important part in clinicians' practice of evidence-based medicine. While much research focuses on the design of methods to retrieve documents, there has been little examination of the way different search engine capabilities influence clinician search behaviors. Previous studies have shown that use of task-based search engines allows for faster searches with no loss of decision accuracy compared with resource-based engines. We hypothesized that changes in search behaviors may explain these differences. In all, 75 clinicians (44 doctors and 31 clinical nurse consultants) were randomized to use either a resource-based or a task-based version of a clinical information retrieval system to answer questions about 8 clinical scenarios in a controlled setting in a university computer laboratory. Clinicians using the resource-based system could select 1 of 6 resources, such as PubMed; clinicians using the task-based system could select 1 of 6 clinical tasks, such as diagnosis. Clinicians in both systems could reformulate search queries. System logs unobtrusively capturing clinicians' interactions with the systems were coded and analyzed for clinicians' search actions and query reformulation strategies. The most frequent search action of clinicians using the resource-based system was to explore a new resource with the same query, that is, these clinicians exhibited a "breadth-first" search behaviour. Of 1398 search actions, clinicians using the resource-based system conducted 401 (28.7%, 95% confidence interval [CI] 26.37-31.11) in this way. In contrast, the majority of clinicians using the task-based system exhibited a "depth-first" search behavior in which they reformulated query keywords while keeping to the same task profiles. Of 585 search actions conducted by clinicians using the task-based system, 379 (64.8%, 95% CI 60.83-68.55) were conducted in this way. This study provides evidence that
Full Text Available In this study, we evaluate the performance of five semantic search engines that are available on the web, using 45 criteria, in the form of a researcher-made checklist. Criteria provided in the checklist included both common and semantic features. Common criteria or features are those applicable to all search engines and semantic ones are those only applicable to semantic search engines. Findings show that the selected search engines do not have suitable performance and expected efficiency. DuckDuckGo, has the most points, considering regular features. Cluuz is in the second place with 20 points and Hakia with 18 points was in the third place. Lexxe and Factbites, with scores of 15 and 10 were placed in the next categories in order of their points. In semantic features, DuckDuckGo, with 10/65 points was in the first place. Hakia with 9/99 points was in the second place, and then the search engines Cluuz with 8/66 Points, Lexxe with 8/65 points and Factbites with 7/32 points were allocated to the next levels. The research results also indicated that on the whole, considering ordinary and semantic features, DuckDuckGo with 31/65 points, Cluuz with 28/66, Hakia with 27/99 points, Lexxe with 23/65 points and Factbites with 17/32 points, got the highest scores out of it.
de Corniere, A; Taylor, G.
Competition authorities all over the world worry that integration between search engines (mainly Google) and publishers could lead to abuses of dominant position. In particular, one concern is that of own-content bias, meaning that Google would bias its rankings in favor of the publishers it owns or has an interest in, to the detriment of competitors and users. In order to investigate this issue, we develop a theoretical framework in which the search engine (i) allocates users across publishe...
Lianos, I.; Motchenkova, E.I.
We analyze a search engine market from a law and economics perspective and incorporate the choice of quality-improving innovations by a search engine platform in a two-sided model of Internet search engine. In the proposed framework, we first discuss the legal issues the search engine market raises
Hendry, D. G.; Efthimiadis, E. N.
Search engines have entered popular culture. They touch people in diverse private and public settings and thus heighten the importance of such important social matters as information privacy and control, censorship, and equitable access. To fully benefit from search engines and to participate in debate about their merits, people necessarily appeal to their understandings for how they function. In this chapter we examine the conceptual understandings that people have of search engines by performing a content analysis on the sketches that 200 undergraduate and graduate students drew when asked to draw a sketch of how a search engine works. Analysis of the sketches reveals a diverse range of conceptual approaches, metaphors, representations, and misconceptions. On the whole, the conceptual models articulated by these students are simplistic. However, students with higher levels of academic achievement sketched more complete models. This research calls attention to the importance of improving students' technical knowledge of how search engines work so they can be better equipped to develop and advocate policies for how search engines should be embedded in, and restricted from, various private and public information settings.
Van den Bosch, Antal; Bogers, Toine; De Kunder, Maurice
One of the determining factors of the quality of Web search engines is the size of their index. In addition to its influence on search result quality, the size of the indexed Web can also tell us something about which parts of the WWW are directly accessible to the everyday user. We propose a novel...... method of estimating the size of a Web search engine’s index by extrapolating from document frequencies of words observed in a large static corpus of Web pages. In addition, we provide a unique longitudinal perspective on the size of Google and Bing’s indexes over a nine-year period, from March 2006...... until January 2015. We find that index size estimates of these two search engines tend to vary dramatically over time, with Google generally possessing a larger index than Bing. This result raises doubts about the reliability of previous one-off estimates of the size of the indexed Web. We find...
Full Text Available Search engines must keep an up-to-date image to all Web pages and other web resources hosted in web servers in their index and data repositories, to provide better and accurate results to its clients. The crawlers of these search engines have to retrieve the pages continuously to keep the index up-to-date. It is reported in the literature that 40% of the current Internet traffic and bandwidth consumption is due to these crawlers. So we are interested in detecting the significant changes in web pages which reflect effectively in search engine’s index and minimize the network load. In this paper, we suggest a document index based change detection technique and distributed indexing using mobile agents. The experimental results have shown that the proposed system can considerably reduce the network traffic and the computational load on the search engine side and keep its index up-to-date with significant changes.
Büttcher, Stefan; Cormack, Gordon V
Information retrieval is the foundation for modern search engines. This textbook offers an introduction to the core topics underlying modern search technologies, including algorithms, data structures, indexing, retrieval, and evaluation. The emphasis is on implementation and experimentation; each chapter includes exercises and suggestions for student projects. Wumpus -- a multiuser open-source information retrieval system developed by one of the authors and available online -- provides model implementations and a basis for student work. The modular structure of the book allows instructors to use it in a variety of graduate-level courses, including courses taught from a database systems perspective, traditional information retrieval courses with a focus on IR theory, and courses covering the basics of Web retrieval. In addition to its classroom use, Information Retrieval will be a valuable reference for professionals in computer science, computer engineering, and software engineering.
Mashup based content search engine for mobile devices is proposed. Example of the proposed search engine is implemented with Yahoo!JAPAN Web SearchAPI, Yahoo!JAPAN Image searchAPI, YouTube Data API, and Amazon Product Advertising API. The retrieved results are also merged and linked each other. Therefore, the different types of contents can be referred once an e-learning content is retrieved. The implemented search engine is evaluated with 20 students. The results show usefulness and effectiv...
Ms. Leena More
The concept of Information Retrieval is very vast and too many models of search engines are available in the market. In this research various information retrieval techniques used in search engine were studies and modified model of search engine were developed. In web mining most of the web search engines retrieve the documents or information first without knowing the meaning of the keyword and then ask for the relevant meaning of the keyword entered by the users. That means without understan...
Bracke, Paul J; Howse, David K; Keim, Samuel M
This paper reports on the development of a tool by the Arizona Health Sciences Library (AHSL) for searching clinical evidence that can be customized for different user groups. The AHSL provides services to the University of Arizona's (UA's) health sciences programs and to the University Medical Center. Librarians at AHSL collaborated with UA College of Medicine faculty to create an innovative search engine, Evidence-based Medicine (EBM) Search, that provides users with a simple search interface to EBM resources and presents results organized according to an evidence pyramid. EBM Search was developed with a web-based configuration component that allows the tool to be customized for different specialties. Informal and anecdotal feedback from physicians indicates that EBM Search is a useful tool with potential in teaching evidence-based decision making. While formal evaluation is still being planned, a tool such as EBM Search, which can be configured for specific user populations, may help lower barriers to information resources in an academic health sciences center.
Ghobrial, George M; Mehdi, Angud; Maltenfort, Mitchell; Sharan, Ashwini D; Harrop, James S
Patients are increasingly reliant upon the Internet as a primary source of medical information. The educational experience varies by search engine, search term, and changes daily. There are no tools for critical evaluation of spinal surgery websites. To highlight the variability between common search engines for the same search terms. To detect bias, by prevalence of specific kinds of websites for certain spinal disorders. Demonstrate a simple scoring system of spinal disorder website for patient use, to maximize the quality of information exposed to the patient. Ten common search terms were used to query three of the most common search engines. The top fifty results of each query were tabulated. A negative binomial regression was performed to highlight the variation across each search engine. Google was more likely than Bing and Yahoo search engines to return hospital ads (P=0.002) and more likely to return scholarly sites of peer-reviewed lite (P=0.003). Educational web sites, surgical group sites, and online web communities had a significantly higher likelihood of returning on any search, regardless of search engine, or search string (P=0.007). Likewise, professional websites, including hospital run, industry sponsored, legal, and peer-reviewed web pages were less likely to be found on a search overall, regardless of engine and search string (P=0.078). The Internet is a rapidly growing body of medical information which can serve as a useful tool for patient education. High quality information is readily available, provided that the patient uses a consistent, focused metric for evaluating online spine surgery information, as there is a clear variability in the way search engines present information to the patient. Published by Elsevier B.V.
Sanders, D; Jones, C.G.; Thébault, E.; Bouma, T. J.; van der Heide, T.; van Belzen, J.; S. Barot
Ecosystem engineering, the physical modification of the environment by organisms, is a common and often influential process whose significance to food web structure and dynamics is largely unknown. In the light of recent calls to expand food web studies to include non-trophic interactions, we explore how we might best integrate ecosystem engineering and food webs. We provide rationales justifying their integration and present a provisional framework identifying how ecosystem engineering can a...
Badawi, Marwa; Mohamed, Ammar; Hussein, Ahmed; Gheith, Mervat
Search engines must keep an up-to-date image to all Web pages and other web resources hosted in web servers in their index and data repositories, to provide better and accurate results to its clients. The crawlers of these search engines have to retrieve the pages continuously to keep the index up-to-date. It is reported in the literature that 40% of the current Internet traffic and bandwidth consumption is due to these crawlers. So we are interested in detecting the significant changes in we...
Examines search tools available to elementary and secondary school students, both human-compiled and crawler-based, to help direct them to age-appropriate Web sites; analyzes the procedures of search engines labeled family-friendly or kid safe that use filters; and tests the effectiveness of these services to students in school libraries. (LRW)
Sessink, O.D.T.; Schaaf, van der H.; Beeftink, H.H.; Hartog, R.; Tramper, J.
The combination of web technology, knowledge of bioprocess engineering, and theories on learning and instruction might yield innovative learning material for bioprocess engineering. In this article, an overview of the characteristics of web-based learning material is given, as well as guidelines for
Goel, Sharad; Hofman, Jake M; Lahaie, Sébastien; Pennock, David M; Watts, Duncan J
Recent work has demonstrated that Web search volume can "predict the present," meaning that it can be used to accurately track outcomes such as unemployment levels, auto and home sales, and disease prevalence in near real time. Here we show that what consumers are searching for online can also predict their collective future behavior days or even weeks in advance. Specifically we use search query volume to forecast the opening weekend box-office revenue for feature films, first-month sales of video games, and the rank of songs on the Billboard Hot 100 chart, finding in all cases that search counts are highly predictive of future outcomes. We also find that search counts generally boost the performance of baseline models fit on other publicly available data, where the boost varies from modest to dramatic, depending on the application in question. Finally, we reexamine previous work on tracking flu trends and show that, perhaps surprisingly, the utility of search data relative to a simple autoregressive model is modest. We conclude that in the absence of other data sources, or where small improvements in predictive performance are material, search queries provide a useful guide to the near future.
Dragusin, Radu; Petcu, Paula; Lioma, Christina; Larsen, Birger; Jørgensen, Henrik L; Cox, Ingemar J; Hansen, Lars Kai; Ingwersen, Peter; Winther, Ole
The web has become a primary information resource about illnesses and treatments for both medical and non-medical users. Standard web search is by far the most common interface to this information. It is therefore of interest to find out how well web search engines work for diagnostic queries and what factors contribute to successes and failures. Among diseases, rare (or orphan) diseases represent an especially challenging and thus interesting class to diagnose as each is rare, diverse in symptoms and usually has scattered resources associated with it. We design an evaluation approach for web search engines for rare disease diagnosis which includes 56 real life diagnostic cases, performance measures, information resources and guidelines for customising Google Search to this task. In addition, we introduce FindZebra, a specialized (vertical) rare disease search engine. FindZebra is powered by open source search technology and uses curated freely available online medical information. FindZebra outperforms Google Search in both default set-up and customised to the resources used by FindZebra. We extend FindZebra with specialized functionalities exploiting medical ontological information and UMLS medical concepts to demonstrate different ways of displaying the retrieved results to medical experts. Our results indicate that a specialized search engine can improve the diagnostic quality without compromising the ease of use of the currently widely popular standard web search. The proposed evaluation approach can be valuable for future development and benchmarking. The FindZebra search engine is available at http://www.findzebra.com/. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
D. Vandic (Damir)
markdownabstractOver the last few years, we have experienced an increase in online shopping. Consequently, there is a need for efficient and effective product search engines. The rapid growth of e-commerce, however, has also introduced some challenges. Studies show that users can get overwhelmed by
Tian, Xinmei; Tao, Dacheng; Hua, Xian-Sheng; Wu, Xiuqing
Image search reranking methods usually fail to capture the user's intention when the query term is ambiguous. Therefore, reranking with user interactions, or active reranking, is highly demanded to effectively improve the search performance. The essential problem in active reranking is how to target the user's intention. To complete this goal, this paper presents a structural information based sample selection strategy to reduce the user's labeling efforts. Furthermore, to localize the user's intention in the visual feature space, a novel local-global discriminative dimension reduction algorithm is proposed. In this algorithm, a submanifold is learned by transferring the local geometry and the discriminative information from the labelled images to the whole (global) image database. Experiments on both synthetic datasets and a real Web image search dataset demonstrate the effectiveness of the proposed active reranking scheme, including both the structural information based active sample selection strategy and the local-global discriminative dimension reduction algorithm.
Early this decade, the number of Web-based documents stored on the servers of the University of Florida hovered near 300,000. By the end of 2006, that number had leapt to four million. Two years later, the university hosts close to eight million Web documents. Web sites for colleges and universities everywhere have become repositories for data…
In this thesis, we present a prototype for search engine to show how such a semantic search application based on ontology techniques contributes to save time for user, and improve the quality of relevant search results compared to a traditional search engine. This system is built as a query improvement module, which uses ontology and sorts the results search based on four predefined categories. The first and important part of the implementation of search engine prototype is to apply ontology ...
Balog, Krisztian; Azzopardi, Leif; de Rijke, Maarten
Disambiguating person names in a set of documents (such as a set of web pages returned in response to a person name) is a key task for the presentation of results and the automatic profiling of experts. With largely unstructured documents and an unknown number of people with the same name the problem presents many difficulties and challenges. This chapter treats the task of person name disambiguation as a document clustering problem, where it is assumed that the documents represent particular people. This leads to the person cluster hypothesis, which states that similar documents tend to represent the same person. Single Pass Clustering, k-Means Clustering, Agglomerative Clustering and Probabilistic Latent Semantic Analysis are employed and empirically evaluated in this context. On the SemEval 2007 Web People Search it is shown that the person cluster hypothesis holds reasonably well and that the Single Pass Clustering and Agglomerative Clustering methods provide the best performance.
Edwards, Christine C; Elliott, Sean P; Conway, Terry L; Woodruff, Susan I
The objective of this study was to assess Web sites related to teen smoking cessation on the Internet. Seven Internet search engines were searched using the keywords teen quit smoking. The top 20 hits from each search engine were reviewed and categorized. The keywords teen quit smoking produced between 35 and 400,000 hits depending on the search engine. Of 140 potential hits, 62% were active, unique sites; 85% were listed by only one search engine; and 40% focused on cessation. Findings suggest that legitimate on-line smoking cessation help for teens is constrained by search engine choice and the amount of time teens spend looking through potential sites. Resource listings should be updated regularly. Smoking cessation Web sites need to be picked up on multiple search engine searches. Further evaluation of smoking cessation Web sites need to be conducted to identify the most effective help for teens.
Demeester, Thomas; Trieschnigg, Rudolf Berend; Zhou, Ke; Nguyen, Dong-Phuong; Hiemstra, Djoerd
This paper presents 'FedWeb Greatest Hits', a large new test collection for research in web information retrieval. As a combination and extension of the datasets used in the TREC Federated Web Search Track, this collection opens up new research possibilities on federated web search challenges, as
Full Text Available The purpose of this paper is to establish the textual analytics involved in developing a search engine for an ebook portal. We have extracted our dataset from Project Gutenberg using a robot harvester. Textual Analytics is used for efficient search retrieval. The entire dataset is represented using Vector Space Model where each document is a vector in the vector space. Further for computational purposes we represent our dataset in the form of a Term Frequency- Inverse Document Frequency tf-idf matrix. The first step involves obtaining the most coherent sequence of words of the search query entered. The entered query is processed using Front End algorithms this includes-Spell Checker Text Segmentation and Language Modeling. Back End processing includes Similarity Modeling Clustering Indexing and Retrieval. The relationship between documents and words is established using cosine similarity measured between the documents and words in Vector Space. Clustering performed is used to suggest books that are similar to the search query entered by the user. Lastly the Lucene Based Elasticsearch engine is used for indexing on the documents. This allows faster retrieval of data. Elasticsearch returns a dictionary and creates a tf-idf matrix. The processed query is compared with the dictionary obtained and tf-idf matrix is used to calculate the score for each match to give most relevant result.
Describes a survey that was conducted involving participants in the library instruction program at two Canadian universities in order to describe the characteristics of students receiving instruction in Web searching. Examines criteria for evaluating Web sites, search strategies, use of search engines, and frequency of use. Questionnaire is…
Keeping Dublin Core Simple: Cross-Domain Discovery or Resource Description?; First Steps in an Information Commerce Economy: Digital Rights Management in the Emerging E-Book Environment; Interoperability: Digital Rights Management and the Emerging EBook Environment; Searching the Deep Web: Direct Query Engine Applications at the Department of Energy.
Lagoze, Carl; Neylon, Eamonn; Mooney, Stephen; Warnick, Walter L.; Scott, R. L.; Spence, Karen J.; Johnson, Lorrie A.; Allen, Valerie S.; Lederman, Abe
Includes four articles that discuss Dublin Core metadata, digital rights management and electronic books, including interoperability; and directed query engines, a type of search engine designed to access resources on the deep Web that is being used at the Department of Energy. (LRW)
Nguyen, Dong-Phuong; Demeester, Thomas; Trieschnigg, Rudolf Berend; Hiemstra, Djoerd
A publicly available dataset for federated search reflecting a real web environment has long been bsent, making it difficult for researchers to test the validity of their federated search algorithms for the web setting. We present several experiments and analyses on resource selection on the web
Samwald, Matthias; Hanbury, Allan
The World Wide Web has become an important source of information for medical practitioners. To complement the capabilities of currently available web search engines we developed FindMeEvidence, an open-source, mobile-friendly medical search engine. In a preliminary evaluation, the quality of results from FindMeEvidence proved to be competitive with those from TRIP Database, an established, closed-source search engine for evidence-based medicine.
Social networks and collaborative tagging systems are rapidly gaining popularity as primary means for sorting and sharing data: users tag their bookmarks in order to simplify information dissemination and later lookup. Social Bookmarking services are useful in two important respects: first, they can allow an individual to remember the visited URLs, and second, tags can be made by the community to guide users towards valuable content. In this paper we focus on the latter use: we present a novel approach for personalized web search using query expansion. We further extend the family of well-known co-occurence matrix technique models by using a new way of exploring social tagging services. Our approach shows its strength particularly in the case of disambiguation of word contexts. We show how to design and implement such a system in practice and conduct several experiments. To the best of our knowledge this is the first study centered on using social bookmarking and tagging techniques for personalization of web search and its evaluation in a real-world scenario.
document type, and locate in a standard easy to read format. Some search engines are capable of searching. Boolean ... Several types of search engines have been designed and implemented based on different retrieval .... Q4.4 Samsung Galaxy S3. 5. Q1.5 Iphone. Q2.5 Dell Inspiron. Q3.5 Operating systems ebooks.
The typical behaviour of the Web search engine user is widely known: a user only types in one or a few keywords and expects the search engine to produce relevant results in an instant. Search engines not only adapt to this behaviour. On the contrary, they are often faced with criticism that they themselves created this kind of behaviour. As search engines are trendsetters for the whole information world, it is important to know how they cope with their users’ behaviour. Recent develo...
Explosive growth of the World Wide Web as well as its heterogeneity call for powerful and easy to use search tools capable to provide the user with a moderate number of relevant answers. This paper presents analysis of key aspects of recently developed Web search methods and tools: visual representation of subject trees, interactive user interfaces, linguistic approaches, image search, ranking and grouping of search results, database search, and scientific information retrieval. Current trend...
Vidmar, Dale J.
Discusses various search strategies and tools that can be used for searching on the Internet, including search engines and search directories; Boolean searching; metasearching; relevancy ranking; automatic phrase detection; backlinks; natural-language searching; clustering and cataloging information; image searching; customization and portals;…
Mijić, Jure; Moens, Marie-Francine; Dalbelo Bašić, Bojana
Semi-structured document retrieval is becoming more popular with the increasing quantity of data available in XML format. In this paper, we describe a search engine model that exploits the structure of the document and uses language modelling and smoothing at the document and collection levels for calculating the relevance of each element from all the documents in the collection to a user query. Element priors, CAS query constraint filtering, and the +/- operators are also used in the ranking procedure. We also present the results of our participation in the INEX 2008 Ad Hoc Track.
Esler, Sandra L.; Nelson, Michael L.
The current proliferation of on-line information resources underscores the requirement for the ability to index collections of information and search and retrieve them in a convenient manner. This study develops criteria for analytically comparing the index and search engines and presents results for a number of freely available search engines. A product of this research is a toolkit capable of automatically indexing, searching, and extracting performance statistics from each of the focused search engines. This toolkit is highly configurable and has the ability to run these benchmark tests against other engines as well. Results demonstrate that the tested search engines can be grouped into two levels. Level one engines are efficient on small to medium sized data collections, but show weaknesses when used for collections 100MB or larger. Level two search engines are recommended for data collections up to and beyond 100MB.
Borisov, A.; Serdyukov, P.; de Rijke, M.
In web search, latent semantic models have been proposed to bridge the lexical gap between queries and documents that is due to the fact that searchers and content creators often use different vocabularies and language styles to express the same concept. Modern search engines simply use the outputs
Zhang, Jie; Zhang, Haijiang; Chen, Enhong; Zheng, Yi; Kuang, Wenhuan; Zhang, Xiong
When an earthquake occurs, seismologists want to use recorded seismograms to infer its location, magnitude and source-focal mechanism as quickly as possible. If such information could be determined immediately, timely evacuations and emergency actions could be undertaken to mitigate earthquake damage. Current advanced methods can report the initial location and magnitude of an earthquake within a few seconds, but estimating the source-focal mechanism may require minutes to hours. Here we present an earthquake search engine, similar to a web search engine, that we developed by applying a computer fast search method to a large seismogram database to find waveforms that best fit the input data. Our method is several thousand times faster than an exact search. For an Mw 5.9 earthquake on 8 March 2012 in Xinjiang, China, the search engine can infer the earthquake's parameters in <1 s after receiving the long-period surface wave data.
Mijes Cruz, Mario Humberto; Soto Aldaco, Andrea; Maldonado Cano, Luis Alejandro; López Rodríguez, Mario; Rodríguez Vázqueza, Manuel Antonio; Amaya Reyes, Laura Mariel; Cano Martínez, Elizabeth; Pérez Rosas, Osvaldo Gerardo; Rodríguez Espejo, Luis; Flores Secundino, Jesús Abimelek; Rivera Martínez, José Luis; García Vázquez, Mireya Saraí; Zamudio Fuentes, Luis Miguel; Sánchez Valenzuela, Juan Carlos; Montoya Obeso, Abraham; Ramírez Acosta, Alejandro Álvaro
Current search engines are based upon search methods that involve the combination of words (text-based search); which has been efficient until now. However, the Internet's growing demand indicates that there's more diversity on it with each passing day. Text-based searches are becoming limited, as most of the information on the Internet can be found in different types of content denominated multimedia content (images, audio files, video files). Indeed, what needs to be improved in current search engines is: search content, and precision; as well as an accurate display of expected search results by the user. Any search can be more precise if it uses more text parameters, but it doesn't help improve the content or speed of the search itself. One solution is to improve them through the characterization of the content for the search in multimedia files. In this article, an analysis of the new generation multimedia search engines is presented, focusing the needs according to new technologies. Multimedia content has become a central part of the flow of information in our daily life. This reflects the necessity of having multimedia search engines, as well as knowing the real tasks that it must comply. Through this analysis, it is shown that there are not many search engines that can perform content searches. The area of research of multimedia search engines of new generation is a multidisciplinary area that's in constant growth, generating tools that satisfy the different needs of new generation systems.
Full Text Available People use search engines to find information they desire with the aim that their information needs will be met. Information retrieval (IR is a field that is concerned primarily with the searching and retrieving of information in the documents and also searching the search engine, online databases, and Internet. Genetic algorithms (GAs are robust, efficient, and optimizated methods in a wide area of search problems motivated by Darwin’s principles of natural selection and survival of the fittest. This paper describes information retrieval systems (IRS components. This paper looks at how GAs can be applied in the field of IR and specifically the relevance of genetic algorithms to internet web search. Finally, from the proposals surveyed it turns out that GA is applied to diverse problem fields of internet web search.
Utami, Dina; Bickmore, Timothy W; Barry, Barbara; Paasche-Orlow, Michael K
Several web-based search engines have been developed to assist individuals to find clinical trials for which they may be interested in volunteering. However, these search engines may be difficult for individuals with low health and computer literacy to navigate. The authors present findings from a usability evaluation of clinical trial search tools with 41 participants across the health and computer literacy spectrum. The study consisted of 3 parts: (a) a usability study of an existing web-based clinical trial search tool; (b) a usability study of a keyword-based clinical trial search tool; and (c) an exploratory study investigating users' information needs when deciding among 2 or more candidate clinical trials. From the first 2 studies, the authors found that users with low health literacy have difficulty forming queries using keywords and have significantly more difficulty using a standard web-based clinical trial search tool compared with users with adequate health literacy. From the third study, the authors identified the search factors most important to individuals searching for clinical trials and how these varied by health literacy level.
Mattmann, C. A.
Search has progressed through several stages due to the increasing size of the Web. Search engines first focused on text and its rate of occurrence; then focused on the notion of link analysis and citation then on interactivity and guided search; and now on the use of social media - who we interact with, what we comment on, and who we follow (and who follows us). The next stage, referred to as "deep search," requires solutions that can bring together text, images, video, importance, interactivity, and social media to solve this challenging problem. The Apache Nutch project provides an open framework for large-scale, targeted, vertical search with capabilities to support all past and potential future search engine foci. Nutch is a flexible infrastructure allowing open access to ranking; URL selection and filtering approaches, to the link graph generated from search, and Nutch has spawned entire sub communities including Apache Hadoop and Apache Tika. It addresses many current needs with the capability to support new technologies such as image and video. On the DARPA Memex project, we are creating create specific extensions to Nutch that will directly improve its overall technological superiority for search and that will directly allow us to address complex search problems including human trafficking. We are integrating state-of-the-art algorithms developed by Kitware for IARPA Aladdin combined with work by Harvard to provide image and video understanding support allowing automatic detection of people and things and massive deployment via Nutch. We are expanding Apache Tika for scene understanding, object/person detection and classification in images/video. We are delivering an interactive and visual interface for initiating Nutch crawls. The interface uses Python technologies to expose Nutch data and to provide a domain specific language for crawls. With the Bokeh visualization library the interface we are delivering simple interactive crawl visualization and
Ramachandran, R.; Movva, S.; Li, X.; Cherukuri, P.; Graves, S.
The goal for search engines is to return results that are both accurate and complete. The search engines should find only what you really want and find everything you really want. Search engines (even meta search engines) lack semantics. The basis for search is simply based on string matching between the user's query term and the resource database and the semantics associated with the search string is not captured. For example, if an atmospheric scientist is searching for "pressure" related web resources, most search engines return inaccurate results such as web resources related to blood pressure. In this presentation Noesis, which is a meta-search engine and a resource aggregator that uses domain ontologies to provide scoped search capabilities will be described. Noesis uses domain ontologies to help the user scope the search query to ensure that the search results are both accurate and complete. The domain ontologies guide the user to refine their search query and thereby reduce the user's burden of experimenting with different search strings. Semantics are captured by refining the query terms to cover synonyms, specializations, generalizations and related concepts. Noesis also serves as a resource aggregator. It categorizes the search results from different online resources such as education materials, publications, datasets, web search engines that might be of interest to the user.
Bordino, Ilaria; Battiston, Stefano; Caldarelli, Guido; Cristelli, Matthieu; Ukkonen, Antti; Weber, Ingmar
We live in a computerized and networked society where many of our actions leave a digital trace and affect other people's actions. This has lead to the emergence of a new data-driven research field: mathematical methods of computer science, statistical physics and sociometry provide insights on a wide range of disciplines ranging from social science to human mobility. A recent important discovery is that search engine traffic (i.e., the number of requests submitted by users to search engines on the www) can be used to track and, in some cases, to anticipate the dynamics of social phenomena. Successful examples include unemployment levels, car and home sales, and epidemics spreading. Few recent works applied this approach to stock prices and market sentiment. However, it remains unclear if trends in financial markets can be anticipated by the collective wisdom of on-line users on the web. Here we show that daily trading volumes of stocks traded in NASDAQ-100 are correlated with daily volumes of queries related to the same stocks. In particular, query volumes anticipate in many cases peaks of trading by one day or more. Our analysis is carried out on a unique dataset of queries, submitted to an important web search engine, which enable us to investigate also the user behavior. We show that the query volume dynamics emerges from the collective but seemingly uncoordinated activity of many users. These findings contribute to the debate on the identification of early warnings of financial systemic risk, based on the activity of users of the www.
Full Text Available We live in a computerized and networked society where many of our actions leave a digital trace and affect other people's actions. This has lead to the emergence of a new data-driven research field: mathematical methods of computer science, statistical physics and sociometry provide insights on a wide range of disciplines ranging from social science to human mobility. A recent important discovery is that search engine traffic (i.e., the number of requests submitted by users to search engines on the www can be used to track and, in some cases, to anticipate the dynamics of social phenomena. Successful examples include unemployment levels, car and home sales, and epidemics spreading. Few recent works applied this approach to stock prices and market sentiment. However, it remains unclear if trends in financial markets can be anticipated by the collective wisdom of on-line users on the web. Here we show that daily trading volumes of stocks traded in NASDAQ-100 are correlated with daily volumes of queries related to the same stocks. In particular, query volumes anticipate in many cases peaks of trading by one day or more. Our analysis is carried out on a unique dataset of queries, submitted to an important web search engine, which enable us to investigate also the user behavior. We show that the query volume dynamics emerges from the collective but seemingly uncoordinated activity of many users. These findings contribute to the debate on the identification of early warnings of financial systemic risk, based on the activity of users of the www.
Verheggen, Kenneth; Martens, Lennart; Berven, Frode S; Barsnes, Harald; Vaudel, Marc
The first step in identifying proteins from mass spectrometry based shotgun proteomics data is to infer peptides from tandem mass spectra, a task generally achieved using database search engines. In this chapter, the basic principles of database search engines are introduced with a focus on open source software, and the use of database search engines is demonstrated using the freely available SearchGUI interface. This chapter also discusses how to tackle general issues related to sequence database searching and shows how to minimize their impact.
Viswa S S
Nowadays, web scale image search engines (e.g. Google Image Search, Microsoft Live Image Search) rely almost purely on surrounding text features. Users type keywords in hope of finding a certain type of images. The search engine returns thousands of images ranked by the text keywords extracted from the surrounding text. However, many of returned images are noisy, disorganized, or irrelevant. Even Google and Microsoft have no Visual Information for searching of images. Using visual information...
Full Text Available Nowadays the selective access to information on the Web is provided by search engines, but in the cases which the data includes spatial information the search task becomes more complex and search engines require special capabilities. The purpose of this study is to extract the information which lies in spatial documents. To that end, we implement and evaluate information extraction from GML documents and a retrieval method in an integrated approach. Our proposed system consists of three components: crawler, database and user interface. In crawler component, GML documents are discovered and their text is parsed for information extraction; storage. The database component is responsible for indexing of information which is collected by crawlers. Finally the user interface component provides the interaction between system and user. We have implemented this system as a pilot system on an Application Server as a simulation of Web. Our system as a spatial search engine provided searching capability throughout the GML documents and thus an important step to improve the efficiency of search engines has been taken.
Borhaninejad, S.; Hakimpour, F.; Hamzei, E.
Nowadays the selective access to information on the Web is provided by search engines, but in the cases which the data includes spatial information the search task becomes more complex and search engines require special capabilities. The purpose of this study is to extract the information which lies in spatial documents. To that end, we implement and evaluate information extraction from GML documents and a retrieval method in an integrated approach. Our proposed system consists of three components: crawler, database and user interface. In crawler component, GML documents are discovered and their text is parsed for information extraction; storage. The database component is responsible for indexing of information which is collected by crawlers. Finally the user interface component provides the interaction between system and user. We have implemented this system as a pilot system on an Application Server as a simulation of Web. Our system as a spatial search engine provided searching capability throughout the GML documents and thus an important step to improve the efficiency of search engines has been taken.
Huang, Chang-Qin; Duan, Ru-Lin; Tang, Yong; Zhu, Zhi-Ting; Yan, Yong-Jian; Guo, Yu-Qing
The semantic web brings a new opportunity for efficient information organization and search. To meet the special requirements of the educational field, this paper proposes an intelligent search engine enabled by educational semantic support service, where three kinds of searches are integrated into Educational Information Intelligent Search (EIIS)…
Full Text Available Eﬀective management and sharing of geodata is one of the priorities of the European Union (INSPIRE activity and companies all around the world. Many diﬀerent companies and organisations publish their geodata using web mapping services. This situation leads to a multiple publishing of similar or completely same geodata. On the other hand, there is frequently a problem how to determine an appropriate mapserver with the required data. This paper presents a geodata search engine which solves the problem how to access geodata more eﬀectively. Presented solution aggregates data from the diﬀerent mapservers and provides an interface according to the Open Geospatial Consortium Web Map Server speciﬁcation. This allows to use our solution in the standard GIS tools as common mapserver. Completely new feature is a request which allows to select map layers which fulﬁlls speciﬁed criteria. Selection could be given by keywords in a map layer description and by deﬁning a bounding box on Earth surface. Response is a list of appropriate layers sorted according to their relevance. Presented solution could be among other applications signiﬁcant source of information for many data mining techniques. It allows to interconnect processed data with their space-temporal context.
Zaki, Nazar; Tennakoon, Chandana
There are a large number of biological databases publicly available for scientists in the web. Also, there are many private databases generated in the course of research projects. These databases are in a wide variety of formats. Web standards have evolved in the recent times and semantic web technologies are now available to interconnect diverse and heterogeneous sources of data. Therefore, integration and querying of biological databases can be facilitated by techniques used in semantic web. Heterogeneous databases can be converted into Resource Description Format (RDF) and queried using SPARQL language. Searching for exact queries in these databases is trivial. However, exploratory searches need customized solutions, especially when multiple databases are involved. This process is cumbersome and time consuming for those without a sufficient background in computer science. In this context, a search engine facilitating exploratory searches of databases would be of great help to the scientific community. We present BioCarian, an efficient and user-friendly search engine for performing exploratory searches on biological databases. The search engine is an interface for SPARQL queries over RDF databases. We note that many of the databases can be converted to tabular form. We first convert the tabular databases to RDF. The search engine provides a graphical interface based on facets to explore the converted databases. The facet interface is more advanced than conventional facets. It allows complex queries to be constructed, and have additional features like ranking of facet values based on several criteria, visually indicating the relevance of a facet value and presenting the most important facet values when a large number of choices are available. For the advanced users, SPARQL queries can be run directly on the databases. Using this feature, users will be able to incorporate federated searches of SPARQL endpoints. We used the search engine to do an exploratory search
This article briefly outlines the current major search engines' approach to teachers' web searching. The aim of this article is to make Web searching easier for teachers when searching for relevant online teaching materials, in general, and UK teacher practitioners at primary, secondary and post-compulsory levels, in particular. Therefore, major…
McDonnell, Michael; Shiri, Ali
Purpose: The purpose of this paper is to introduce the notion of social search as a new concept, drawing upon the patterns of web search behaviour. It aims to: define social search; present a taxonomy of social search; and propose a user-centred social search method. Design/methodology/approach: A mixed method approach was adopted to investigate…
Goto, Shoji; Ozono, Tadachika; Shintani, Toramatsu
In this paper, we propose a new method for selecting search engines on WWW for selective meta-search engine. In selective meta-search engine, a method is needed that would enable selecting appropriate search engines for users' queries. Most existing methods use statistical data such as document frequency. These methods may select inappropriate search engines if a query contains polysemous words. In this paper, we describe an search engine selection method based on thesaurus. In our method, a thesaurus is constructed from documents in a search engine and is used as a source description of the search engine. The form of a particular thesaurus depends on the documents used for its construction. Our method enables search engine selection by considering relationship between terms and overcomes the problems caused by polysemous words. Further, our method does not have a centralized broker maintaining data, such as document frequency for all search engines. As a result, it is easy to add a new search engine, and meta-search engines become more scalable with our method compared to other existing methods.
Zhang, Yang; Gao, Hong
Human flesh search engine can be a double-edged sword, bringing convenience on the one hand and leading to infringement of personal privacy on the other hand. This paper discusses the ethical problems brought about by the human flesh search engine, as well as possible solutions.
Shu-Hsien L. Chen
Full Text Available The article discusses the searching behaviors of school children using the online catalog and the World Wide Web. The amount of information and search capability for the online catalog and the World Wide Web, though, differ to a great extent, students share several common problems in using them. They have problems in spelling and typing, phrasing of search terms, extracting key concept, formulating search strategy, and evaluating search results. Their specific problems of searching the World Wide Web include rapid navigation of the Internet, overuse of Back button and browsing strategy, and evaluating only the first screen. Teachers and media specialists need to address these problems in the instruction of information literacy skills so that students can fully utilize the power of online searching and become efficient information searchers.
Rodriguez-Vaamonde, Sergio; Torresani, Lorenzo; Fitzgibbon, Andrew W
Traditional Web search engines do not use the images in the HTML pages to find relevant documents for a given query. Instead, they typically operate by computing a measure of agreement between the keywords provided by the user and only the text portion of each page. In this paper we study whether the content of the pictures appearing in a Web page can be used to enrich the semantic description of an HTML document and consequently boost the performance of a keyword-based search engine. We present a Web-scalable system that exploits a pure text-based search engine to find an initial set of candidate documents for a given query. Then, the candidate set is reranked using visual information extracted from the images contained in the pages. The resulting system retains the computational efficiency of traditional text-based search engines with only a small additional storage cost needed to encode the visual information. We test our approach on one of the TREC Million Query Track benchmarks where we show that the exploitation of visual content yields improvement in accuracies for two distinct text-based search engines, including the system with the best reported performance on this benchmark. We further validate our approach by collecting document relevance judgements on our search results using Amazon Mechanical Turk. The results of this experiment confirm the improvement in accuracy produced by our image-based reranker over a pure text-based system.
The exponential growth of digital information available in companies and on the Web creates the need for search tools that can respond to the most sophisticated information needs. Many user tasks would be simplified if Search Engines would support typed search, and return entities instead of just Web documents. For example, an executive who tries to solve a problem needs to find people in the company who are knowledgeable about a certain topic.In the first part of the book, we propose a model for expert finding based on the well-consolidated vector space model for Information Retrieval and inv
Nasution, Mahyuddin K. M.
The best tool currently used to access information is a search engine. Meanwhile, the information space has its own behaviour. Systematically, an information space needs to be familiarized with mathematics so easily we identify the characteristics associated with it. This paper reveal some characteristics of search engine based on a model of document collection, which are then estimated the impact on the feasibility of information. We reveal some of characteristics of search engine on the lemma and theorem about singleton and doubleton, then computes statistically characteristic as simulating the possibility of using search engine. In this case, Google and Yahoo. There are differences in the behaviour of both search engines, although in theory based on the concept of documents collection.
Bracke, Paul J; Howse, David K; Keim, Samuel M
...) health sciences programs and to the University Medical Center. Librarians at AHSL collaborated with UA College of Medicine faculty to create an innovative search engine, Evidence-based Medicine (EBM...
Full Text Available Aim: Informal citations is bibliographic information (title or Internet address, citing sources of information resources for informal scholarly communication and always neglected in traditional citation databases. This study is done, in order to answer the question of whether informal citations in the web environment are traceable. The present research aims to determine what proportion of web citations of Google search engine is related to formal and informal citation. Research method: Webometrics is the method used. The study is done on 1344 research articles of 98 open access journal, and the method that is used to extract the web citation from Google search engine is “Web / URL citation extraction". Findings: The findings showed that ten percent of the web citations of Google search engine are formal and informal citations. The highest formal citation in the Google search engine with 19/27% is in the field of library and information science and the lowest official citation by 1/54% is devoted to the field of civil engineering. The highest percentage of informal citations with 3/57% is devoted to sociology and the lowest percentage of informal citations by 0/39% is devoted to the field of civil engineering. Journal Citation is highest with 94/12% in the surgical field and lowest with 5/26 percent in the philosophy filed. Result: Due to formal and informal citations in the Google search engine which is about 10 percent and the reduction of this amount compared to previous research, it seems that track citations by this engine should be treated with more caution. We see that the amount of formal citation is variable in different disciplines. Cited journals in the field of surgery, is highest and in the filed of philosophy is lowest, this indicates that in the filed of philosophy, that is a subset of the social sciences, journals in scientific communication do not play a significant role. On the other hand, book has a key role in this filed
Fagan, Jody Condit
Today's scholars face an outstanding array of choices when choosing search tools: Google Scholar, discipline-specific abstracts and index databases, library discovery tools, and more recently, Microsoft's re-launch of their academic search tool, now dubbed Microsoft Academic Search. What are these tools' strengths for the emerging needs of…
.... If so, is the activity a "fair use" protected by the Copyright Act? These issues frequently implicate search engines, which scan the web to allow users to find content for uses, both legitimate and illegitimate...
Nguyen, Tien N; Zhang, Jin
This paper presents an interactive visualization system, named WebSearchViz, for visualizing the Web search results and acilitating users' navigation and exploration. The metaphor in our model is the solar system with its planets and asteroids revolving around the sun. Location, color, movement, and spatial distance of objects in the visual space are used to represent the semantic relationships between a query and relevant Web pages. Especially, the movement of objects and their speeds add a new dimension to the visual space, illustrating the degree of relevance among a query and Web search results in the context of users' subjects of interest. By interacting with the visual space, users are able to observe the semantic relevance between a query and a resulting Web page with respect to their subjects of interest, context information, or concern. Users' subjects of interest can be dynamically changed, redefined, added, or deleted from the visual space.
Frydenberg, Mark; Miko, John S.
Search engine optimization (SEO), the promoting of a Web site so it achieves optimal position with a search engine's rankings, is an important strategy for organizations and individuals in order to promote their brands online. Techniques for achieving SEO are relevant to students of marketing, computing, media arts, and other disciplines, and many…
Li, Zuofeng; Liu, Xingnan; Wen, Jingran; Xu, Ye; Zhao, Xin; Li, Xuan; Liu, Lei; Zhang, Xiaoyan
With the completion of the human genome project and the development of new methods for gene variant detection, the integration of mutation data and its phenotypic consequences has become more important than ever. Among all available resources, locus-specific databases (LSDBs) curate one or more specific genes' mutation data along with high-quality phenotypes. Although some genotype-phenotype data from LSDB have been integrated into central databases little effort has been made to integrate all these data by a search engine approach. In this work, we have developed disease related unique gene mutation search engine (DRUMS), a search engine for human disease related unique gene mutation as a convenient tool for biologists or physicians to retrieve gene variant and related phenotype information. Gene variant and phenotype information were stored in a gene-centred relational database. Moreover, the relationships between mutations and diseases were indexed by the uniform resource identifier from LSDB, or another central database. By querying DRUMS, users can access the most popular mutation databases under one interface. DRUMS could be treated as a domain specific search engine. By using web crawling, indexing, and searching technologies, it provides a competitively efficient interface for searching and retrieving mutation data and their relationships to diseases. The present system is freely accessible at http://www.scbit.org/glif/new/drums/index.html. © 2011 Wiley-Liss, Inc.
Li, Jieyu; Huang, Chunlan; Wang, Xiuhong; Wu, Shengli
Introduction: In the big data age, we have to deal with a tremendous amount of information, which can be collected from various types of sources. For information search systems such as Web search engines or online digital libraries, the collection of documents becomes larger and larger. For some queries, an information search system needs to…
The technology behind the meta search engines supports countless number of Internet services ranging from the price and quality comparison websites to more sophisticated traffic connection finders and general search engines like Google. Meta search engines generally increase market transparency,
National Aeronautics and Space Administration — Use this search engine to generate custom tables of orbital and/or physical parameters for all asteroids and comets (or a specified sub-set) in our small-body...
A software program that searches and browses mission data emulates a Web browser, containing standard meta - phors for Web browsing. By taking advantage of back-end URLs, users may save and share search states. Also, since a Web interface is familiar to users, training time is reduced. Familiar back and forward buttons move through a local search history. A refresh/reload button regenerates a query, and loads in any new data. URLs can be constructed to save search results. Adding context to the current search is also handled through a familiar Web metaphor. The query is constructed by clicking on hyperlinks that represent new components to the search query. The selection of a link appears to the user as a page change; the choice of links changes to represent the updated search and the results are filtered by the new criteria. Selecting a navigation link changes the current query and also the URL that is associated with it. The back button can be used to return to the previous search state. This software is part of the MSLICE release, which was written in Java. It will run on any current Windows, Macintosh, or Linux system.
Bron, M.; Balog, K.; de Rijke, M.
The scale of today's Web of Data motivates the use of keyword search-based approaches to entity-oriented search tasks in addition to traditional structure-based approaches, which require users to have knowledge of the underlying schema. We propose an alternative structure-based approach that makes
Full Text Available Seeking information on the Internet is becoming a necessity 'at school, at work and in every social sphere. Unfortunately the difficulties' inherent in the use of search engines and the use of unconscious cognitive approaches inefficient limit their effectiveness. It is in this respect presented a method, called SEWCOM that lets you create conceptual maps through interaction with search engines.
Many proteomics laboratories have found spectral counting to be an ideal way to recognize biomarkers that differentiate cohorts of samples. This approach assumes that proteins that differ in quantity between samples will generate different numbers of identifiable tandem mass spectra. Increasingly, researchers are employing multiple search engines to maximize the identifications generated from data collections. This talk evaluates four strategies to combine information from multiple search engines in comparative proteomics. The “Count Sum” model pools the spectra across search engines. The “Vote Counting” model combines the judgments from each search engine by protein. Two other models employ parametric and non-parametric analyses of protein-specific p-values from different search engines. We evaluated the four strategies in two different data sets. The ABRF iPRG 2009 study generated five LC-MS/MS analyses of “red” E. coli and five analyses of “yellow” E. coli. NCI CPTAC Study 6 generated five concentrations of Sigma UPS1 spiked into a yeast background. All data were identified with X!Tandem, Sequest, MyriMatch, and TagRecon. For both sample types, “Vote Counting” appeared to manage the diverse identification sets most effectively, yielding heightened discrimination as more search engines were added.
Chen, Shu; McGuinness, Kevin; Aly, Robin; de Jong, Franciska M.G.; O' Connor, Noel E.
The aim of AXES is to develop tools that provide various types of users with new engaging ways to interact with audiovisual libraries, helping them discover, browse, navigate, search, and enrich archives. This paper describes the initial (lite) version of the AXES search engine, which is targeted at
Corrao, Salvatore; Leone, Francesco; Arnone, Sabrina
The internet is a communication medium and content distributor that provide information in the general sense but it could be of great utility regarding as the search and retrieval of biomedical information. Search engines represent a great deal to rapidly find information on the net. However, we do not know whether general search engines and meta-search ones are reliable in order to find useful and validated biomedical information. The aim of our study was to verify the reproducibility of a search by key-words (pediatric or evidence) using 9 international search engines and 1 meta-search engine at the baseline and after a one year period. We analysed the first 20 citations as output of each searching. We evaluated the formal quality of Web-sites and their domain extensions. Moreover, we compared the output of each search at the start of this study and after a one year period and we considered as a criterion of reliability the number of Web-sites cited again. We found some interesting results that are reported throughout the text. Our findings point out an extreme dynamicity of the information on the Web and, for this reason, we advice a great caution when someone want to use search and meta-search engines as a tool for searching and retrieve reliable biomedical information. On the other hand, some search and meta-search engines could be very useful as a first step searching for defining better a search and, moreover, for finding institutional Web-sites too. This paper allows to know a more conscious approach to the internet biomedical information universe.
Full Text Available In the age of writable web, new skills and new practices are appearing. In an environment that allows everyone to communicate information globally, internet referencing (or SEO is a strategic discipline that aims to generate visibility, internet traffic and a maximum exploitation of sites publications. Often misperceived as a fraud, SEO has evolved to be a facilitating tool for anyone who wishes to reference their website with search engines. In this article we show that it is possible to achieve the first rank in search results of keywords that are very competitive. We show methods that are quick, sustainable and legal; while applying the principles of active SEO 2.0. This article also clarifies some working functions of search engines, some advanced referencing techniques (that are completely ethical and legal and we lay the foundations for an in depth reflection on the qualities and advantages of these techniques.
Full Text Available In the age of writable web, new skills and new practices are appearing. In an environment that allows everyone to communicate information globally, internet referencing (or SEO is a strategic discipline that aims to generate visibility, internet traffic and a maximum exploitation of sites publications. Often misperceived as a fraud, SEO has evolved to be a facilitating tool for anyone who wishes to reference their website with search engines. In this article we show that it is possible to achieve the first rank in search results of keywords that are very competitive. We show methods that are quick, sustainable and legal; while applying the principles of active SEO 2.0. This article also clarifies some working functions of search engines, some advanced referencing techniques (that are completely ethical and legal and we lay the foundations for an in depth reflection on the qualities and advantages of these techniques.
Argenton, C.; Prüfer, J.
The market for Internet search is not only economically and socially important, it is also highly concentrated. Is this a problem? We study the question of whether “competition is only a free click away.” We argue that the market for Internet search is characterized by indirect network externalities and construct a simple model of search engine competition, which produces a market share development that fits well the empirically observed developments since 2003. We find that there is a strong...
Full Text Available Background: The potential of the World Wide Web (‘the Web’ as a tool for information retrieval in higher education is beyond question. Harnessing this potential, however, remains a challenge, particularly in the context of developing countries, where students are drawn from diverse socio-economic, educational and technological backgrounds. Objectives: The purpose of this study is to identify the Web search tactics used by postgraduate students in order to address the weaknesses of undergraduate students with regard to their Web searching tactics. This article forms part of a wider study into postgraduate students’ information retrieval strategies at the University of KwaZulu-Natal, Pietermaritzburg campus, South Africa. Method: The study utilised the mixed methods approach, employing both questionnaires (Phase 1 and structured interviews (Phase 2, and was largely underpinned by Bates’s model of information search tactics. This article reports and reflects on the findings of Phase 1, which focused on identifying the Web search tactics employed by postgraduate students. Results: Findings indicated a preference for lower-level Web search tactics, despite respondents largely self-reporting as intermediate or expert users. Moreover, the majority of respondents gained their knowledge on Web searching through experience and only a quarter of respondents have been given formal training on Web searching. Conclusion: In addition to contributing to theory, it is envisaged that this article will contribute to practice by informing the design of undergraduate training interventions to proactively address the information retrieval challenges faced by novice users. Subsequent papers will report on Phase 2 of the study.
Dragusin, Radu; Petcu, Paula; Lioma, Christina; Larsen, Birger; Jørgensen, Henrik L; Cox, Ingemar J; Hansen, Lars Kai; Ingwersen, Peter; Winther, Ole
In our recent paper, we study web search as an aid in the process of diagnosing rare diseases. To answer the question of how well Google Search and PubMed perform, we created an evaluation framework with 56 diagnostic cases and made our own specialized search engine, FindZebra (findzebra.com). FindZebra uses a set of publicly available curated sources on rare diseases and an open-source information retrieval system, Indri. Our evaluation and the feedback received after the publication of our paper both show that FindZebra outperforms Google Search and PubMed. In this paper, we summarize the original findings and the response to FindZebra, discuss why Google Search is not designed for specialized tasks and outline some of the current trends in using web resources and social media for medical diagnosis.
Do Pazo-Oubiña, F; Calvo Pita, C; Puigventós Latorre, F; Periañez-Párraga, L; Ventayol Bosch, P
To identify publishers of pharmacotherapeutic information not found in biomedical journals that focuses on evaluating and providing advice on medicines and to develop a search engine to access this information. Compiling web sites that publish information on the rational use of medicines and have no commercial interests. Free-access web sites in Spanish, Galician, Catalan or English. Designing a search engine using the Google "custom search" application. Overall 159 internet addresses were compiled and were classified into 9 labels. We were able to recover the information from the selected sources using a search engine, which is called "AlquimiA" and available from http://www.elcomprimido.com/FARHSD/AlquimiA.htm. The main sources of pharmacotherapeutic information not published in biomedical journals were identified. The search engine is a useful tool for searching and accessing "grey literature" on the internet. Copyright © 2010 SEFH. Published by Elsevier Espana. All rights reserved.
Larsen, Jakob Eg; Halling, Søren Christian; Sigurdsson, Magnus Kristinn
We describe MuZeeker, a search engine with domain knowledge based on Wikipedia. MuZeeker enables the user to refine a search in multiple steps by means of category selection. In the present version we focus on multimedia search related to music and we present two prototype search applications (web......-based and mobile) and discuss the issues involved in adapting the search engine for mobile phones. A category based filtering approach enables the user to refine a search through relevance feedback by category selection instead of typing additional text, which is hypothesized to be an advantage in the mobile Mu......Zeeker application. We report from two usability experiments using the think aloud protocol, in which N=20 participants performed tasks using MuZeeker and a customized Google search engine. In both experiments web-based and mobile user interfaces were used. The experiment shows that participants are capable...
Nguyen, Hung D.; Steele, Gynelle C.
This report demonstrates the full-text-based search engine that works on any Web-based mobile application. The engine has the capability to search databases across multiple categories based on a user's queries and identify the most relevant or similar. The search results presented here were found using an Android (Google Co.) mobile device; however, it is also compatible with other mobile phones.
University (ECNUCS)  East China Normal University introduces the Search En- gine Impact Factor (SEIF), a query-independent measure of a search...Search Engine Impact Factor . In addition, three of these runs (ecomsvz, ecomsv and ecomsvt) make use of external resources (Google Search, data from... Impact Factor (see ECNUCS’ vertical selection submissions) has the biggest contribution to performance improvements, besides the vertical selection
Romaniuk, Ryszard S.
XLth Wilga Summer 2017 Symposium on Photonics Applications and Web Engineering was held on 28 May-4 June 2017. The Symposium gathered over 350 participants, mainly young researchers active in optics, optoelectronics, photonics, modern optics, mechatronics, applied physics, electronics technologies and applications. There were presented around 300 oral and poster papers in a few main topical tracks, which are traditional for Wilga, including: bio-photonics, optical sensory networks, photonics-electronics-mechatronics co-design and integration, large functional system design and maintenance, Internet of Things, measurement systems for astronomy, high energy physics experiments, and other. The paper is a traditional introduction to the 2017 WILGA Summer Symposium Proceedings, and digests some of the Symposium chosen key presentations. This year Symposium was divided to the following topical sessions/conferences: Optics, Optoelectronics and Photonics, Computational and Artificial Intelligence, Biomedical Applications, Astronomical and High Energy Physics Experiments Applications, Material Research and Engineering, and Advanced Photonics and Electronics Applications in Research and Industry.
Lazonder, Adrianus W.
This study compared Pairs of students with Single students in web search tasks. The underlying hypothesis was that peer-to-peer collaboration encourages students to articulate their thoughts, which in turn has a facilitative effect on the regulation of the search process as well as search outcomes.
Glisson, W.; Welland, R.; Glisson, L.M.
The integration of security and forensics into Web Engineering curricula is imperative! Poor security in web-based applications is continuing to cost organizations millions and the losses are still increasing annually. Security is frequently taught as a stand-alone course, assuming that security can be 'bolted on' to a web application at some point. Security issues must be integrated into Web Engineering processes right from the beginning to create secure solutions and therefore security shou...
Ahmed Mudassar Ali; Ramakrishnan, M.
Aim of this study is to form a cluster of search results based on similarity and to assign meaningful label to it Database driven web pages play a vital role in multiple domains like online shopping, e-education systems, cloud computing and other. Such databases are accessible through HTML forms and user interfaces. They return the result pages come from the underlying databases as per the nature of the user query. Such types of databases are termed as Web Databases (WDB). Web databases have ...
Woodruff, Allison; Rosenholtz, Ruth; Morrison, Julie B.; Faulring, Andrew; Pirolli, Peter
Discussion of Web search strategies focuses on a comparative study of textual and graphical summarization mechanisms applied to search engine results. Suggests that thumbnail images (graphical summaries) can increase efficiency in processing results, and that enhanced thumbnails (augmented with readable textual elements) had more consistent…
Belden, J.; Williams, J.; Richardson, B.; Schuster, K.
Summary Background Federated medical search engines are health information systems that provide a single access point to different types of information. Their efficiency as clinical decision support tools has been demonstrated through numerous evaluations. Despite their rigor, very few of these studies report holistic evaluations of medical search engines and even fewer base their evaluations on existing evaluation frameworks. Objectives To evaluate a federated medical search engine, MedSocket, for its potential net benefits in an established clinical setting. Methods This study applied the Human, Organization, and Technology (HOT-fit) evaluation framework in order to evaluate MedSocket. The hierarchical structure of the HOT-factors allowed for identification of a combination of efficiency metrics. Human fit was evaluated through user satisfaction and patterns of system use; technology fit was evaluated through the measurements of time-on-task and the accuracy of the found answers; and organization fit was evaluated from the perspective of system fit to the existing organizational structure. Results Evaluations produced mixed results and suggested several opportunities for system improvement. On average, participants were satisfied with MedSocket searches and confident in the accuracy of retrieved answers. However, MedSocket did not meet participants’ expectations in terms of download speed, access to information, and relevance of the search results. These mixed results made it necessary to conclude that in the case of MedSocket, technology fit had a significant influence on the human and organization fit. Hence, improving technological capabilities of the system is critical before its net benefits can become noticeable. Conclusions The HOT-fit evaluation framework was instrumental in tailoring the methodology for conducting a comprehensive evaluation of the search engine. Such multidimensional evaluation of the search engine resulted in recommendations for
Gwyn, Stephen D. J.; Hill, Norman; Kavelaars, Jj
While regular astronomical image archive searches can find images at a fixed location, they cannot find images of moving targets such as asteroids or comets. The Solar System Object Image Search (SSOIS) at the Canadian Astronomy Data Centre allows users to search for images of moving objects, allowing precoveries. SSOIS accepts as input either an object designation, a list of observations, a set of orbital elements, or a user-generated ephemeris for an object. It then searches for observations of that object over a range of dates. The user is then presented with a list of images containing that object from a variety of archives. Initially created to search the CFHT MegaCam archive, SSOIS has been extended to other telescopes including Gemini, Subaru/SuprimeCam, WISE, HST, the SDSS, AAT, the ING telescopes, the ESO telescopes, and the NOAO telescopes (KPNO/CTIO/WIYN), for a total of 24.5 million images. As the Pan-STARRS and Hyper Suprime-Cam archives become available, they will be incorporated as well. The SSOIS tool is located on the web at http://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/en/ssois/.
This thesis covers the design, implementation, and evaluation of a search engine which can give each user a customized index based on the documents they are authorized to view. A common solution available today for this situation is to filter the results of a query based on the list of documents a user has access to. In this scenario, it is possible for information to leak from the search engine because the filtering takes place after the results are ranked. Ranking algorithms are usually ...
Yang, Lei; Mei, Qiaozhu; Zheng, Kai; Hanauer, David A.
We analyzed a longitudinal collection of query logs of a full-text search engine designed to facilitate information retrieval in electronic health records (EHR). The collection, 202,905 queries and 35,928 user sessions recorded over a course of 4 years, represents the information-seeking behavior of 533 medical professionals, including frontline practitioners, coding personnel, patient safety officers, and biomedical researchers for patient data stored in EHR systems. In this paper, we present descriptive statistics of the queries, a categorization of information needs manifested through the queries, as well as temporal patterns of the users’ information-seeking behavior. The results suggest that information needs in medical domain are substantially more sophisticated than those that general-purpose web search engines need to accommodate. Therefore, we envision there exists a significant challenge, along with significant opportunities, to provide intelligent query recommendations to facilitate information retrieval in EHR. PMID:22195150
Dr. K.L. Shunmuganathan; Kalaivani, P.
Nowadays, the huge amount of information distributed through the Web motivates studying techniques to be adopted in order to extract relevant data in an efficient and reliable way. Information extraction (IE) from semistructured Web documents plays an important role for a variety of information agents. In this paper, a framework of WebIE system with the help of the JADE platform is proposed to solve problems by non-visual automatic wrapper to extract data records from search engine results pa...
Rony Baskoro Lukito
Full Text Available The purpose of this research is how to optimize a web design that can increase the number of visitors. The number of Internet users in the world continues to grow in line with advances in information technology. Products and services marketing media do not just use the printed and electronic media. Moreover, the cost of using the Internet as a medium of marketing is relatively inexpensive when compared to the use of television as a marketing medium. The penetration of the internet as a marketing medium lasted for 24 hours in different parts of the world. But to make an internet site into a site that is visited by many internet users, the site is not only good from the outside view only. Web sites that serve as a medium for marketing must be built with the correct rules, so that the Web site be optimal marketing media. One of the good rules in building the internet site as a marketing medium is how the content of such web sites indexed well in search engines like google. Search engine optimization in the index will be focused on the search engine Google for 83% of internet users across the world using Google as a search engine. Search engine optimization commonly known as SEO (Search Engine Optimization is an important rule that the internet site is easier to find a user with the desired keywords.
Full Text Available In this era of Google search, it is easy for beginners to fancy literature review limited to this popular search engine. Unfortunately, this will miss a vast index of articles which exist within our reach. Some specialized search portals leading to their corresponding databases deal with the tremendous medical literature that has been generated over decades. This article deals with the “what?” and “how to?” of these available databases of the medical articles. This will make your literature review efficient and the confidence in collected evidence - accurate.
van den Bosch, Antal; Bogers, Toine; de Kunder, Maurice
One of the determining factors of the quality of Web search engines is the size of their index. In addition to its influence on search result quality, the size of the indexed Web can also tell us something about which parts of the WWW are directly accessible to the everyday user. We propose a novel method of estimating the size of a Web search engine's index by extrapolating from document frequencies of words observed in a large static corpus of Web pages. In addition, we provide a unique longitudinal perspective on the size of Google and Bing's indices over a nine-year period, from March 2006 until January 2015. We find that index size estimates of these two search engines tend to vary dramatically over time, with Google generally possessing a larger index than Bing. This result raises doubts about the reliability of previous one-off estimates of the size of the indexed Web. We find that much, if not all of this variability can be explained by changes in the indexing and ranking infrastructure of Google and Bing. This casts further doubt on whether Web search engines can be used reliably for cross-sectional webometric studies.
Nguyen, Dong-Phuong; Demeester, Thomas; Trieschnigg, Rudolf Berend; Hiemstra, Djoerd
Federated search has the potential of improving web search: the user becomes less dependent on a single search provider and parts of the deep web become available through a unified interface, leading to a wider variety in the retrieved search results. However, a publicly available dataset for
With the Internet-evoked paradigm shift in the academy, there has been a growing interest in students' Web-based information-seeking and source-use practices. Nevertheless, little is known as to how individual students go about searching for sources online and selecting source material for writing particular assignments. This exploratory study…
Skov, Mette; Ingwersen, Peter
There is a current trend to make museum collections widely accessible by digitising cultural heritage collections for the Internet. The present study takes a user perspective and explores the characteristics of online museum visitors' web search behaviour. A combination of quantitative and qualit...
Whitley, Katherine M.
With "Chemical Abstracts" and "Science Citation Index" both now available for citation searching, this study compares the duplication and uniqueness of citing references for works of chemistry researchers for the years 1999-2001. The two indexes cover very similar source material. This analysis of SciFinder Scholar and Web of…
Zhang, J.; Zhang, X.
For processing land seismic data, the near-surface problem is often very complex and may severely affect our capability to image the subsurface. The current state-of-the-art technology for near surface imaging is the early arrival waveform inversion that solves an acoustic wave-equation problem. However, fitting land seismic data with acoustic wavefield is sometimes invalid. On the other hand, performing elastic waveform inversion is very time-consuming. Similar to a web search engine, we develop a full elastic waveform search engine that includes a large database with synthetic elastic waveforms accounting for a wide range of interval velocity models in the CMP domain. With each CMP gather of real data as an entry, the search engine applies Multiple-Randomized K-Dimensional (MRKD) tree method to find approximate best matches to the entry in about a second. Interpolation of the velocity models at CMP positions creates 2D or 3D Vp, Vs, and density models for the near surface area. The method does not just return one solution; it gives a series of best matches in a solution space. Therefore, the results can help us to examine the resolution and nonuniqueness of the final solution. Further, this full waveform search method can avoid the issues of initial model and cycle skipping that the method of full waveform inversion is difficult to deal with.
Yin, Chengjiu; Hirokawa, Sachio; Yau, Jane Yin-Kim; Hashimoto, Kiyota; Tabata, Yoshiyuki; Nakatoh, Tetsuya
To help researchers in building a knowledge foundation of their research fields which could be a time-consuming process, the authors have developed a Cross Tabulation Search Engine (CTSE). Its purpose is to assist researchers in 1) conducting research surveys, 2) efficiently and effectively retrieving information (such as important researchers,…
Zhou, Yingyao; Zhou, Bin; Jiang, Shumei; King, Frederick J
As the amount of chemical literature increases, it is critical that researchers be enabled to accurately locate documents related to a particular aspect of a given compound. Existing solutions, based on text and chemical search engines alone, suffer from the inclusion of "false negative" and "false positive" results, and cannot accommodate diverse repertoire of formats currently available for chemical documents. To address these concerns, we developed an approach called Entity-Canonical Keyword Indexing (ECKI), which converts a chemical entity embedded in a data source into its canonical keyword representation prior to being indexed by text search engines. We implemented ECKI using Microsoft Office SharePoint Server Search, and the resultant hybrid search engine not only supported complex mixed chemical and keyword queries but also was applied to both intranet and Internet environments. We envision that the adoption of ECKI will empower researchers to pose more complex search questions that were not readily attainable previously and to obtain answers at much improved speed and accuracy.
Yoones A. Sekhavat
Full Text Available In recent years, the Internet has become embedded into the purchasing decision of consumers. The purpose of this paper is to study whether the Internet behavior of users correlates with their actual behavior in computer games market. Rather than proposing the most accurate model for computer game sales, we aim to investigate to what extent web search query data can be exploited to nowcast (contraction of “now” and “forecasting” referring to techniques used to make short-term forecasts (predict the present status of the ranking of mobile games in the world. Google search query data is used for this purpose, since this data can provide a real-time view on the topics of interest. Various statistical techniques are used to show the effectiveness of using web search query data to nowcast mobile games ranking.
Hellsten, I; Leydesdorff, L.; Wouters, P.
Internet search engines function in a present which changes continuously. The search engines update their indices regularly, overwriting webpages with newer ones, adding new pages to the index and losing older ones. Some search engines can be used to search for information on the internet for
Alshare, Khaled; Miller, Don; Wenger, James
This study explored college students' perceptions regarding their use of search engines. The main objective was to determine how frequently students used various search engines, whether advanced search features were used, and how many search engines were used. Various factors that might influence student responses were examined. Results showed…
Yea, Sang-Jun; Jang, Yunji; Seong, BoSeok; Kim, Chul
The information and knowledge about ethno-medicinal herbs are getting stronger interest in Global and Korea after the agreement of the Nagoya Protocol. However, it is known that there is a serious asymmetry of ethno-medicinal information between experts and public, thus this study aimed to analyze the similarities and differences in interest between experts and public for medicinal herbs in Korea through big data analysis. The medicinal herbs selected in this study were the top 10 herbs in terms of the amounts purchased by TKM centers. And two representative web search engines were selected to collect the web search logs, i.e. big data, of experts and public for medicinal herbs in Korea. Comparative analysis was accomplished through descriptive statistical analysis, Pearson's correlation coefficient, and time-series graph analysis. The web search traffic logs were collected for the past three years (2012-2014) from OASIS and NAVER, which are the representative web search engines of experts and public respectively in Korea. First, regarding OASIS, the most searched medicinal herb was Angelicae Gigantis Radix while the least searched was Alismatis Rhizoma; for NAVER, the most searched medicinal herb was Paeoniae Radix, unlike OASIS, and the least searched was Alismatis Rhizoma, as with OASIS. The coefficient between rank of herbs and OASIS was -0.401, and that between rank of herbs and NAVER was -0.387, and the correlational coefficient for web search trends of OASIS and NAVER during the past three years was 0.438. Also the correlation of interest between experts and public for each herb on monthly web trends basis was similar with regard to Glycyrrhizae Radix et Rhizoma and Angelicae Gigantis Radix, but different with regard to the other 8 medicinal herbs. Finally, significant outcomes or suggestions were figured out through time-series graph analysis. This study presents meaningful results concerning the similarities and differences in interest between experts and
Dolog, Peter; Thomsen, Lone Leth; Thomsen, Bent
as on general competence based web engineering proles oered also for those who specialize in other areas of software engineering. We describe the current state of the art and our experience with a web engineering curriculum within the software engineering masters degree programme. We also discuss an evolution...
Bendig, Regina B.
The author sought to determine to what extent the two search engines, Scirus and BASE (Bielefeld Academic Search Engines), would be useful to first-year university students as the first point of searching for chemical information. Five topics were searched and the first ten records of each search result were evaluated with regard to the type of…
Raj, S.; Sharma, V. L.; Singh, A. J.; Goel, S.
Background. The available health information on websites should be reliable and accurate in order to make informed decisions by community. This study was done to assess the quality and readability of health information websites on World Wide Web in India. Methods. This cross-sectional study was carried out in June 2014. The key words ?Health? and ?Information? were used on search engines ?Google? and ?Yahoo.? Out of 50 websites (25 from each search engines), after exclusion, 32 websites were ...
Thylstrup, Nanna; Teilmann, Stina
and strategic terms; and a cultural question of how human-computer interaction design works with navigational uncertainty, both as an experience to be managed and a resource to be exploited. This paper considers two copyright infringement cases that involved search engines as defendants, Kelly v. Arriba Soft......This article argues that thumbnail images are infrastructural images that raise issues of uncertainty in two distinct, but interrelated, areas: a legal question of how to define, understand and govern visual information infrastructures, in particular image search systems in epistemological...
Ciocca, Gianluigi; Schettini, Raimondo
We present here a web-based protytpe for the interactive search of items in quality electronic catalogues. The system based on a multimedia information retrieval architecture, allows the user to query a multimedia database according to several retrieval strategies, and progressively refine the system's response by indicating the relevance, or non-relevance of the items retrieved. Once a subset of images meeting the user's information needs have been identified, these images can be displayed in a virtual exhibition that can be visited interactively by the user exploiting VRML technology.
Chen, Nengcheng; E, Dongcheng; Di, Liping; Gong, Jianya; Chen, Zeqiang
In order to improve the access precision of polar geospatial information service on web, a new methodology for retrieving global spatial information services based on geospatial service search and ontology reasoning is proposed, the geospatial service search is implemented to find the coarse service from web, the ontology reasoning is designed to find the refined service from the coarse service. The proposed framework includes standardized distributed geospatial web services, a geospatial service search engine, an extended UDDI registry, and a multi-protocol geospatial information service client. Some key technologies addressed include service discovery based on search engine and service ontology modeling and reasoning in the Antarctic geospatial context. Finally, an Antarctica multi protocol OWS portal prototype based on the proposed methodology is introduced.
Lange, Matthias; Spies, Karl; Bargsten, Joachim; Haberhauer, Gregor; Klapperstück, Matthias; Leps, Michael; Weinel, Christian; Wünschiers, Röbbe; Weissbach, Mandy; Stein, Jens; Scholz, Uwe
Search engines and retrieval systems are popular tools at a life science desktop. The manual inspection of hundreds of database entries, that reflect a life science concept or fact, is a time intensive daily work. Hereby, not the number of query results matters, but the relevance does. In this paper, we present the LAILAPS search engine for life science databases. The concept is to combine a novel feature model for relevance ranking, a machine learning approach to model user relevance profiles, ranking improvement by user feedback tracking and an intuitive and slim web user interface, that estimates relevance rank by tracking user interactions. Queries are formulated as simple keyword lists and will be expanded by synonyms. Supporting a flexible text index and a simple data import format, LAILAPS can easily be used both as search engine for comprehensive integrated life science databases and for small in-house project databases. With a set of features, extracted from each database hit in combination with user relevance preferences, a neural network predicts user specific relevance scores. Using expert knowledge as training data for a predefined neural network or using users own relevance training sets, a reliable relevance ranking of database hits has been implemented. In this paper, we present the LAILAPS system, the concepts, benchmarks and use cases. LAILAPS is public available for SWISSPROT data at http://lailaps.ipk-gatersleben.de.
Full Text Available With the Internet 2.0 era, managing user emotions is a problem that more and more actors are interested in. Historically, the first notions of emotion sharing were expressed and defined with emoticons. They allowed users to show their emotional status to others in an impersonal and emotionless digital world. Now, in the Internet of social media, every day users share lots of content with each other on Facebook, Twitter, Google+ and so on. Several new popular web sites like FlickR, Picassa, Pinterest, Instagram or DeviantArt are now specifically based on sharing image content as well as personal emotional status. This kind of information is economically very valuable as it can for instance help commercial companies sell more efficiently. In fact, with this king of emotional information, business can made where companies will better target their customers needs, and/or even sell them more products. Research has been and is still interested in the mining of emotional information from user data since then. In this paper, we focus on the impact of emotions from images that have been collected from search image engines. More specifically our proposition is the creation of a filtering layer applied on the results of such image search engines. Our peculiarity relies in the fact that it is the first attempt from our knowledge to filter image search engines results with an emotional filtering approach.
Lütjohann, Dominic S; Shah, Asmi H; Christen, Michael P; Richter, Florian; Knese, Karsten; Liebel, Urban
Modern biological experiments create vast amounts of data which are geographically distributed. These datasets consist of petabytes of raw data and billions of documents. Yet to the best of our knowledge, a search engine technology that searches and cross-links all different data types in life sciences does not exist. We have developed a prototype distributed scientific search engine technology, 'Sciencenet', which facilitates rapid searching over this large data space. By 'bringing the search engine to the data', we do not require server farms. This platform also allows users to contribute to the search index and publish their large-scale data to support e-Science. Furthermore, a community-driven method guarantees that only scientific content is crawled and presented. Our peer-to-peer approach is sufficiently scalable for the science web without performance or capacity tradeoff. The free to use search portal web page and the downloadable client are accessible at: http://sciencenet.kit.edu. The web portal for index administration is implemented in ASP.NET, the 'AskMe' experiment publisher is written in Python 2.7, and the backend 'YaCy' search engine is based on Java 1.6.
Full Text Available CWM Global Search is a meta-search engine allowing chemists and biologists to search the major chemical and biological databases on the Internet, by structure, synonyms, CAS Registry Numbers and free text. A meta-search engine is a search tool that sends user requests to several other search engines and/or databases and aggregates the results into a single list or displays them according to their source . CWM Global Search is a web application that has many of the characteristics of desktop applications (also known as Rich Internet Application, RIA, and it runs on both Windows and Macintosh platforms. The application is one of the first RIA for scientists. The application can be started using the URL http://cwmglobalsearch.com/gsweb.
Yumusak, Semih; Dogdu, Erdogan; KODAZ, Halife; Kamilaris, Andreas
In this study, a novel metacrawling method is proposed for discovering and monitoring linked data sources on the Web. We implemented the method in a prototype system, named SPARQL Endpoints Discovery (SpEnD). SpEnD starts with a "search keyword" discovery process for finding relevant keywords for the linked data domain and specifically SPARQL endpoints. Then, these search keywords are utilized to find linked data sources via popular search engines (Google, Bing, Yahoo, Yandex). By using this ...
Development of learning material that is distributed through and accessible via the World Wide Web. Various options from web technology are exploited to improve the quality and efficiency of learning material.
Brand-Gruwel, Saskia; Kammerer, Yvonne; Van Meeuwen, Ludo; Van Gog, Tamara
Nowadays, almost everyone uses the World Wide Web (WWW) to search for information of any kind. In education, students frequently use the WWW for selecting information to accomplish assignments such as writing an essay or preparing a presentation. The evaluation of sources and information is an important sub-skill in this process. But many students have not yet optimally developed this skill. On the basis of verbal reports, eye-tracking data, and navigation logs this study investigated how nov...
Zhang, Hui; Wang, Deqing; Wang, Li; Bi, Zhuming; Chen, Yong
Information explosion is a critical challenge to the development of modern information systems. In particular, when the application of an information system is over the Internet, the amount of information over the web has been increasing exponentially and rapidly. Search engines, such as Google and Baidu, are essential tools for people to find the information from the Internet. Valuable information, however, is still likely submerged in the ocean of search results from those tools. By clustering the results into different groups based on subjects automatically, a search engine with the clustering feature allows users to select most relevant results quickly. In this paper, we propose an online semantics-based method to cluster Chinese web search results. First, we employ the generalised suffix tree to extract the longest common substrings (LCSs) from search snippets. Second, we use the HowNet to calculate the similarities of the words derived from the LCSs, and extract the most representative features by constructing the vocabulary chain. Third, we construct a vector of text features and calculate snippets' semantic similarities. Finally, we improve the Chameleon algorithm to cluster snippets. Extensive experimental results have shown that the proposed algorithm has outperformed over the suffix tree clustering method and other traditional clustering methods.
Svenstrup, Dan Tito; Jørgensen, Henrik L; Winther, Ole
% and 64%, respectively. Thus, FindZebra has a significantly (p search engines. When tested under the same conditions, Watson and FindZebra showed similar recall@10 accuracy. However, the tests were performed on different subsets of Doctors dilemma questions. Advances...... on the use of web search, social media and data mining in data repositories for medical diagnosis. We compare the retrieval accuracy on 56 rare disease cases with known diagnosis for the web search tools google.com, pubmed.gov, omim.org and our own search tool findzebra.com. We give a detailed description...... in technology and access to high quality data have opened new possibilities for aiding the diagnostic process. Specialized search engines, data mining tools and social media are some of the areas that hold promise....
Zhu, Jianhan; Song, Dawei; Eisenstadt, Marc; Barladeanu, Cristi; Rüger, Stefan
The effect of metadata in collection fusion has not been sufficiently studied. In response to this, we present a novel meta-search engine called Dyniqx for metadata based cross search. Dyniqx exploits the availability of metadata in academic search services such as PubMed and Google Scholar etc for fusing search results from heterogeneous search engines. In addition, metadata from these search engines are used for generating dynamic query controls such as sliders and tick boxes etc which are ...
Xuan, Weijian; Wang, Pinglang; Watson, Stanley J; Meng, Fan
Genome-wide high density SNP association studies are expected to identify various SNP alleles associated with different complex disorders. Understanding the biological significance of these SNP alleles in the context of existing literature is a major challenge since existing search engines are not designed to search literature for SNPs or other genetic markers. The literature mining of gene and protein functions has received significant attention and effort while similar work on genetic markers and their related diseases is still in its infancy. Our goal is to develop a web-based tool that facilitates the mining of Medline literature related to genetic studies and gene/protein function studies. Our solution consists of four main function modules for (1) identification of different types of genetic markers or genetic variations in Medline records (2) distinguishing positive versus negative linkage or association between genetic markers and diseases (3) integrating marker genomic location data from different databases to enable the retrieval of Medline records related to markers in the same linkage disequilibrium region (4) and a web interface called MarkerInfoFinder to search, display, sort and download Medline citation results. Tests using published data suggest MarkerInfoFinder can significantly increase the efficiency of finding genetic disorders and their underlying molecular mechanisms. The functions we developed will also be used to build a knowledge base for genetic markers and diseases. The MarkerInfoFinder is publicly available at: http://brainarray.mbni.med.umich.edu/brainarray/datamining/MarkerInfoFinder.
Qiu, Honglei; Huang, Jiacheng
Abstract Biomedical data are growing at an incredible pace and require substantial expertise to organize data in a manner that makes them easily findable, accessible, interoperable and reusable. Massive effort has been devoted to using Semantic Web standards and technologies to create a network of Linked Data for the life sciences, among others. However, while these data are accessible through programmatic means, effective user interfaces for non-experts to SPARQL endpoints are few and far between. Contributing to user frustrations is that data are not necessarily described using common vocabularies, thereby making it difficult to aggregate results, especially when distributed across multiple SPARQL endpoints. We propose BioSearch — a semantic search engine that uses ontologies to enhance federated query construction and organize search results. BioSearch also features a simplified query interface that allows users to optionally filter their keywords according to classes, properties and datasets. User evaluation demonstrated that BioSearch is more effective and usable than two state of the art search and browsing solutions. Database URL: http://ws.nju.edu.cn/biosearch/ PMID:29220451
The paper is a digest of work presented during a cyclic Ph.D. student symposium on Photonics and Web Engineering WILGA 2009. The subject of WILGA are Photonics Applications in Astronomy, Communications, Industry and High-Energy Physics Experiments. WILGA is sponsored by EuCARD Project. Symposium is organized by ISE PW in cooperation with professional organizations IEEE, SPIE, PSP and KEiT PAN. There are presented mainly Ph.D. and M.Sc. theses as well as achievements of young researchers. These papers, presented in such a big number, more than 250 in some years, are in certain sense a good digest of the condition of academic research capabilities in this branch of science and technology. The undertaken research subjects for Ph.D. theses in electronics is determined by the interest and research capacity (financial, laboratory and intellectual) of the young researchers and their tutors. Basically, the condition of academic electronics research depends on financing coming from applications areas. During Wilga 200...
Full Text Available A Review of: Eysenbach, G., Tuische, J. & Diepgen, T.L. (2001. Evaluation of the usefulness of Internet searches to identify unpublished clinical trials for systematic reviews. Medical Informatics and the Internet in Medicine, 26(3, 203-218. http://dx.doi.org/10.1080/14639230110075459 Objective – To consider whether web searching is a useful method for identifying unpublished studies for inclusion in systematic reviews. Design – Retrospective web searches using the AltaVista search engine were conducted to identify unpublished studies – specifically, clinical trials – for systematic reviews which did not use a web search engine. Setting – The Department of Clinical Social Medicine, University of Heidelberg, Germany. Subjects – n/a Methods – Pilot testing of 11 web search engines was carried out to determine which could handle complex search queries. Pre-specified search requirements included the ability to handle Boolean and proximity operators, and truncation searching. A total of seven Cochrane systematic reviews were randomly selected from the Cochrane Library Issue 2, 1998, and their bibliographic database search strategies were adapted for the web search engine, AltaVista. Each adaptation combined search terms for the intervention, problem, and study type in the systematic review. Hints to planned, ongoing, or unpublished studies retrieved by the search engine, which were not cited in the systematic reviews, were followed up by visiting websites and contacting authors for further details when required. The authors of the systematic reviews were then contacted and asked to comment on the potential relevance of the identified studies. Main Results – Hints to 14 unpublished and potentially relevant studies, corresponding to 4 of the 7 randomly selected Cochrane systematic reviews, were identified. Out of the 14 studies, 2 were considered irrelevant to the corresponding systematic review by the systematic review authors. The
Full Text Available The huge amount of data on the Internet and the diverse list of strategies used to try to link this information with relevant searches through Linked Data have generated a revolution in data treatment and its representation. Nevertheless, the conventional search engines like Google are kept as strategies with good reception to do search processes. The following article presents a study of the development and evolution of search engines, more specifically, to analyze the relevance of findings based on the number of results displayed in paging systems with Google as a case study. Finally, it is intended to contribute to indexing criteria in search results, based on an approach to Semantic Web as a stage in the evolution of the Web.
Barry, Chris; Charleton, Debbie
Researchers have identified the Web as a searchers first port of call for locating information. Search Engine Marketing (SEM) strategies have been noted as a key consideration when developing, maintaining and managing Websites. A study presented here of SEM practices of Irish small to medium enterprises (SMEs) reveals they plan to spend more resources on SEM in the future. Most firms utilize an informal SEM strategy, where Website optimization is perceived most effective in attracting traffic. Respondents cite the use of ‘keywords in title and description tags’ as the most used SEM technique, followed by the use of ‘keywords throughout the whole Website’; while ‘Pay for Placement’ was most widely used Paid Search technique. In concurrence with the literature, measuring SEM performance remains a significant challenge with many firms unsure if they measure it effectively. An encouraging finding is that Irish SMEs adopt a positive ethical posture when undertaking SEM.
Pletneva, Natalia; Ruiz de Castaneda, Rafael; Baroz, Frederic; Boyer, Celia
This paper presents the results of a blind comparison of top ten search results retrieved by Google.ch (French) and Khresmoi for everyone, a health specialized search engine. Participants--students of the Faculty of Medicine of the University of Geneva had to complete three tasks and select their preferred results. The majority of the participants have largely preferred Google results while Khresmoi results showed potential to compete in specific topics. The coverage of the results seems to be one of the reasons. The second being that participants do not know how to select quality and transparent health web pages. More awareness, tools and education about the matter is required for the students of Medicine to be able to efficiently distinguish trustworthy online health information.
Xie, Lingxi; Tian, Qi; Zhou, Wengang; Zhang, Bo
State-of-the-art web image search frameworks are often based on the bag-of-visual-words (BoVWs) model and the inverted index structure. Despite the simplicity, efficiency, and scalability, they often suffer from low precision and/or recall, due to the limited stability of local features and the considerable information loss on the quantization stage. To refine the quality of retrieved images, various postprocessing methods have been adopted after the initial search process. In this paper, we investigate the online querying process from a graph-based perspective. We introduce a heterogeneous graph model containing both image and feature nodes explicitly, and propose an efficient reranking approach consisting of two successive modules, i.e., incremental query expansion and image-feature voting, to improve the recall and precision, respectively. Compared with the conventional reranking algorithms, our method does not require using geometric information of visual words, therefore enjoys low consumptions of both time and memory. Moreover, our method is independent of the initial search process, and could cooperate with many BoVW-based image search pipelines, or adopted after other postprocessing algorithms. We evaluate our approach on large-scale image search tasks and verify its competitive search performance.
Finkelstein, Joseph; Bedra, McKenzie
Internet provides access to vast amounts of comprehensive information regarding any health-related subject. Patients increasingly use this information for health education using a search engine to identify education materials. An alternative approach of health education via Internet is based on utilizing a verified web site which provides structured interactive education guided by adult learning theories. Comparison of these two approaches in older patients was not performed systematically. The aim of this study was to compare the efficacy of a web-based computer-assisted education (CO-ED) system versus searching the Internet for learning about hypertension. Sixty hypertensive older adults (age 45+) were randomized into control or intervention groups. The control patients spent 30 to 40 minutes searching the Internet using a search engine for information about hypertension. The intervention patients spent 30 to 40 minutes using the CO-ED system, which provided computer-assisted instruction about major hypertension topics. Analysis of pre- and post- knowledge scores indicated a significant improvement among CO-ED users (14.6%) as opposed to Internet users (2%). Additionally, patients using the CO-ED program rated their learning experience more positively than those using the Internet.
Yang, Y Tony; Horneffer, Michael; DiLisio, Nicole
Web-based social media is increasingly being used across different settings in the health care industry. The increased frequency in the use of the Internet via computer or mobile devices provides an opportunity for social media to be the medium through which people can be provided with valuable health information quickly and directly. While traditional methods of detection relied predominately on hierarchical or bureaucratic lines of communication, these often failed to yield timely and accurate epidemiological intelligence. New web-based platforms promise increased opportunities for a more timely and accurate spreading of information and analysis. This article aims to provide an overview and discussion of the availability of timely and accurate information. It is especially useful for the rapid identification of an outbreak of an infectious disease that is necessary to promptly and effectively develop public health responses. These web-based platforms include search queries, data mining of web and social media, process and analysis of blogs containing epidemic key words, text mining, and geographical information system data analyses. These new sources of analysis and information are intended to complement traditional sources of epidemic intelligence. Despite the attractiveness of these new approaches, further study is needed to determine the accuracy of blogger statements, as increases in public participation may not necessarily mean the information provided is more accurate.
Carvalho, Paulo C.; Fischer, Juliana S. G.; Xu, Tao; Cociorva, Daniel; Balbuena, Tiago S.; Valente, Richard H.; Perales, Jonas; Yates, John R.; Barbosa, Valmir C.
The Search Engine Processor (SEPro) is a tool for filtering, organizing, sharing, and displaying peptide spectrum matches. It employs a novel three-tier Bayesian approach that uses layers of spectrum, peptide, and protein logic to lead the data to converge to a single list of reliable protein identifications. SEPro is integrated into the PatternLab for proteomics environment, where an arsenal of tools for analyzing shotgun proteomic data is provided. By using the semi-labeled decoy approach for benchmarking, we show that SEPro significantly outperforms a commercially available competitor. PMID:22311825
Bruce R. Barkstrom
Full Text Available This paper discusses a search paradigm for numerical data in Earth science that relies on the intrinsic structure of an archive's collection. Such non-textual data lies outside the normal textual basis for the Semantic Web. The paradigm tries to bypass some of the difficulties associated with keyword searches, such as semantic heterogeneity. The suggested collection structure uses a hierarchical taxonomy based on multidimensional axes of continuous variables. This structure fits the underlying 'geometry' of Earth science data better than sets of keywords in an ontology. The alternative paradigm views the search as a two-agent cooperative game that uses a dialog between the search engine and the data user. In this view, the search engine knows about the objects in the archive. It cannot read the user's mind to identify what the user needs. We assume the user has a clear idea of the search target. However he or she may not have a clear idea of the archive's contents. The paper suggests how the user interface may provide information to deal with the user's difficulties in understanding items in the dialog.
Gunasekara, Chathura; Subramanian, Avinash; Avvari, Janaki Venkata Ram Kumar; Li, Bin; Chen, Su; Wei, Hairong
Plant biologists frequently need to examine if a sequence motif bound by a specific transcription or translation factor is present in the proximal promoters or 3' untranslated regions (3' UTR) of a set of plant genes of interest. To achieve such a task, plant biologists have to not only identify an appropriate algorithm for motif searching, but also manipulate the large volume of sequence data, making it burdensome to carry out or fulfill. In this study, we developed a web portal that enables plant molecular biologists to search for DNA motifs especially degenerate ones in custom sequences or the flanking regions of all genes in the 50 plant species whose genomes have been sequenced. A web tool like this is demanded to meet a variety of needs of plant biologists for identifying the potential gene regulatory relationships. We implemented a suffix tree algorithm to accelerate the searching process of a group of motifs in a multitude of target genes. The motifs to be searched can be in the degenerate bases in addition to adenine (A), cytosine (C), guanine (G), and thymine (T). The target sequences to be searched can be custom sequences or the selected proximal gene sequences from any one of the 50 sequenced plant species. The web portal also contains the functionality to facilitate the search of motifs that are represented by position probability matrix in above-mentioned species. Currently, the algorithm can accomplish an exhaust search of 100 motifs in 35,000 target sequences of 2 kb long in 4.2 min. However, the runtime may change in the future depending on the space availability, number of running jobs, network traffic, data loading, and output packing and delivery through electronic mailing. A web portal was developed to facilitate searching of motifs presents in custom sequences or the proximal promoters or 3' UTR of 50 plant species with the sequenced genomes. This web tool is accessible by using this URL: http://sys.bio.mtu.edu/motif/index.php.
Full Text Available Search engine optimization (SEO techniques involve „on-page“ and „off-page“ actions taken by web developers and SEO specialists with aim to increase the ranking of web pages in search engine results pages (SERP by following recommendations from major search engine companies. In this paper we explore the possibility of creating a metric for evaluating on-page SEO of a website. A novel „k-rank“ metric is proposed which takes into account not only the presence of certain tags in HTML of a page, but how those tags are used with selected keywords in selected domain. The „k-rank“ is tested in domain of education by inspecting 20 university websites and comparing them with expert scores. The overview of results showed that „k-rank“ can be used as a metric for on-page SEO.
Cavus, Nadire; Alpan, Kezban
The importance of information is increasing in the information age that we are living in with internet becoming the major information resource for people with rapidly increasing number of documents. This situation makes finding information on the internet without web search engines impossible. The aim of the study is revealing most widely used…
Zou, Youyong; Finin, Tim; Chen, Harry
Understanding and using the data and knowledge encoded in semantic web documents requires an inference engine. F-OWL is an inference engine for the semantic web language OWL language based on F-logic, an approach to defining frame-based systems in logic. F-OWL is implemented using XSB and Flora-2 and takes full advantage of their features. We describe how F-OWL computes ontology entailment and compare it with other description logic based approaches. We also describe TAGA, a trading agent environment that we have used as a test bed for F-OWL and to explore how multiagent systems can use semantic web concepts and technology.
Gkinos, Nikolaos-Panagiotis; Triakosaris, Evangelos; Galiotou, Eleni
In this paper, we discuss issues on the representation and search of maritime data on the semantic web. In particular, we are concerned with data related to the requirements put forward by ship registries and should be met in view of a ship's flag acquisition. The choice of a ship's flag is a matter of crucial importance since it is the basis of tax payment and rule and regulation enforcement for the flying of flags. In our approach, flags and corresponding requirements are represented in the form of a ontology. Data "opening" and linking with the use of the ontology in question, will facilitate a ship's flag search based on its particular characteristics and will provide useful information on ship registration requirements.
Zhai, Jie; Shao, Zhiqing; Guo, Yi; Zhang, Haiteng
In our earlier work, we present a novel formal method for the semiautomatic verification of specifications and for describing web service composition components by using abstract concepts. After verification, the instantiations of components were selected to satisfy the complex service performance constraints. However, selecting an optimal instantiation, which comprises different candidate services for each generic service, from a large number of instantiations is difficult. Therefore, we present a new evolutionary approach on the basis of the discrete group search service (D-GSS) model. With regard to obtaining the optimal multiconstraint instantiation of the complex component, the D-GSS model has competitive performance compared with other service selection models in terms of accuracy, efficiency, and ability to solve high-dimensional service composition component problems. We propose the cost function and the discrete group search optimizer (D-GSO) algorithm and study the convergence of the D-GSS model through verification and test cases.
This article explains how nurses can get the most out of researching information on the internet using the search engine Google. It also explores some of the other types of search engines that are available. Internet users are shown how to find text, images and reports and search within sites. Copyright issues are also discussed.
In terms of other criteria such as phrase searching, simple and natural language interface, high quality of display results, these search engines were the best. MetaSearch engines especially MetaCrawler performed the worst in indexing and retrieving scientific literature particularly at UDSM library. There was a significant ...
As search is being used by billions of people, modern search engines are becoming more and more complex. And complexity does not just come from the algorithms. Richer and richer content is being added to search engine result pages: news and sports results, definitions and translations, images and
Zenetti, German; Bijmolt, Tammo H. A.; Leeflang, Peter S. H.; Klapper, Daniel
Search engine advertising has become a multibillion-dollar business and one of the dominant forms of advertising on the Internet. This study examines the effectiveness of search engine advertising within a multimedia campaign, with explicit consideration of the interaction effects between search
Mehrdokht Wazirpour Keshmiri
Full Text Available The main purpose of this research was to compare the scale of web subject directories precision in information retrieval of technical-engineering science. Information gathering was documentary and webometric. Keywords of technical-engineering science were chosen at twenty different subjects from IEEE (Institute of Electrical and Electronics Engineers and engineering magazines that situated in sciencedirect site. These keywords are used at five subject directories Yahoo, Google, Infomine, Intute, Dmoz, that were web directories high-utilization. Usually first results in searching tools are connected to searching keywords. Because, first ten results was evaluated in every search. These assessments to consist of scale of precision, scale of error, scale retrieval items in technical-engineering categories to retrieval items entirely. The used criteria for determining the scale of precision that was according to high-utilization standards in different documents, to consist of presence of the keywords in title, appearance of keywords at the part of web retrieved pages, keywords adjacency, URL of page, page description and subject categories. Information analysis was according to Kruskal-Wallis Test and L.S.D fisher. Results revealed that there was meaningful difference about precision of web subject directories in information retrieval of technical-engineering science, Therefore this theory was confirmed.web subject directories ranked from point of precision as follows. Google, Yahoo, Intute, Dmoz, and Infomine. The scale of observed error at the first results was another criterion that was used for comparing web subject directories. In this research, Yahoo had minimum scale of error and Infomine had most of error. This research also compared the scale of retrieval items in all of categories web subject directories entirely to retrieval items in technical-engineering categories, results revealed that there was meaningful difference between them. And
Full Text Available Using mixed methods research design, the current study has analyzed Iranian researchers’ information searching behaviour on the Web.Then based on extracted concepts, the model of their information searching behavior was revealed. . Forty-four participants, including academic staff from universities and research centers were recruited for this study selected by purposive sampling. Data were gathered from questionnairs including ten questions and semi-structured interview. Each participant’s memos were analyzed using grounded theory methods adapted from Strauss & Corbin (1998. Results showed that the main objectives of subjects were doing a research, writing a paper, studying, doing assignments, downloading files and acquiring public information in using Web. The most important of learning about how to search and retrieve information were trial and error and get help from friends among the subjects. Information resources are identified by searching in information resources (e.g. search engines, references in papers, and search in Online database… communications facilities & tools (e.g. contact with colleagues, seminars & workshops, social networking..., and information services (e.g. RSS, Alerting, and SDI. Also, Findings indicated that searching by search engines, reviewing references, searching in online databases, and contact with colleagues and studying last issue of the electronic journals were the most important for searching. The most important strategies were using search engines and scientific tools such as Google Scholar. In addition, utilizing from simple (Quick search method was the most common among subjects. Using of topic, keywords, title of paper were most important of elements for retrieval information. Analysis of interview showed that there were nine stages in researchers’ information searching behaviour: topic selection, initiating search, formulating search query, information retrieval, access to information
Parreiras, Fernando S
The next enterprise computing era will rely on the synergy between both technologies: semantic web and model-driven software development (MDSD). The semantic web organizes system knowledge in conceptual domains according to its meaning. It addresses various enterprise computing needs by identifying, abstracting and rationalizing commonalities, and checking for inconsistencies across system specifications. On the other side, model-driven software development is closing the gap among business requirements, designs and executables by using domain-specific languages with custom-built syntax and se
De-Arteaga, Maria; Eggel, Ivan; Do, Bao; Rubin, Daniel; Kahn, Charles E; Müller, Henning
Information search has changed the way we manage knowledge and the ubiquity of information access has made search a frequent activity, whether via Internet search engines or increasingly via mobile devices. Medical information search is in this respect no different and much research has been devoted to analyzing the way in which physicians aim to access information. Medical image search is a much smaller domain but has gained much attention as it has different characteristics than search for text documents. While web search log files have been analysed many times to better understand user behaviour, the log files of hospital internal systems for search in a PACS/RIS (Picture Archival and Communication System, Radiology Information System) have rarely been analysed. Such a comparison between a hospital PACS/RIS search and a web system for searching images of the biomedical literature is the goal of this paper. Objectives are to identify similarities and differences in search behaviour of the two systems, which could then be used to optimize existing systems and build new search engines. Log files of the ARRS GoldMiner medical image search engine (freely accessible on the Internet) containing 222,005 queries, and log files of Stanford's internal PACS/RIS search called radTF containing 18,068 queries were analysed. Each query was preprocessed and all query terms were mapped to the RadLex (Radiology Lexicon) terminology, a comprehensive lexicon of radiology terms created and maintained by the Radiological Society of North America, so the semantic content in the queries and the links between terms could be analysed, and synonyms for the same concept could be detected. RadLex was mainly created for the use in radiology reports, to aid structured reporting and the preparation of educational material (Lanlotz, 2006) . In standard medical vocabularies such as MeSH (Medical Subject Headings) and UMLS (Unified Medical Language System) specific terms of radiology are often
Sharpe, Richard E; Sharpe, Megan; Siegel, Eliot; Siddiqui, Khan
Internet-based search engines have become a significant component of medical practice. Physicians increasingly rely on information available from search engines as a means to improve patient care, provide better education, and enhance research. Specialized search engines have emerged to more efficiently meet the needs of physicians. Details about the ways in which radiologists utilize search engines have not been documented. The authors categorized every 25th search query in a radiology-centric vertical search engine by radiologic subspecialty, imaging modality, geographic location of access, time of day, use of abbreviations, misspellings, and search language. Musculoskeletal and neurologic imagings were the most frequently searched subspecialties. The least frequently searched were breast imaging, pediatric imaging, and nuclear medicine. Magnetic resonance imaging and computed tomography were the most frequently searched modalities. A majority of searches were initiated in North America, but all continents were represented. Searches occurred 24 h/day in converted local times, with a majority occurring during the normal business day. Misspellings and abbreviations were common. Almost all searches were performed in English. Search engine utilization trends are likely to mirror trends in diagnostic imaging in the region from which searches originate. Internet searching appears to function as a real-time clinical decision-making tool, a research tool, and an educational resource. A more thorough understanding of search utilization patterns can be obtained by analyzing phrases as actually entered as well as the geographic location and time of origination. This knowledge may contribute to the development of more efficient and personalized search engines.
Full Text Available The paper presents a web search result clustering algorithm that was integrated in to a desktop application. The application aims to increase the web search engines performances by reducing the user effort in finding a web page in the list of results returned by the search engines.
Dendooven, Tom; Luisi, Ben F
RNA acts not only as an information bearer in the biogenesis of proteins from genes, but also as a regulator that participates in the control of gene expression. In bacteria, small RNA molecules (sRNAs) play controlling roles in numerous processes and help to orchestrate complex regulatory networks. Such processes include cell growth and development, response to stress and metabolic change, transcription termination, cell-to-cell communication, and the launching of programmes for host invasion. All these processes require recognition of target messenger RNAs by the sRNAs. This review summarizes recent results that have provided insights into how bacterial sRNAs are recruited into effector ribonucleoprotein complexes that can seek out and act upon target transcripts. The results hint at how sRNAs and their protein partners act as pattern-matching search engines that efficaciously regulate gene expression, by performing with specificity and speed while avoiding off-target effects. The requirements for efficient searches of RNA patterns appear to be common to all domains of life. © 2017 The Author(s).
Ni Made Ari Lestari
Full Text Available E-commerce is a sale and purchase transactions that occur through electronic systems such as the Internet, WWW, or other computer networks. E-commerce involves electronic data interchange and automated data collection systems. In all e-commerce search engine provided a column for the search items desired by the user. In e-commerce such as Tokopedia, Lazada, MatahariMall, Amazon, and other search engines that provided just use a regular search engine technology. In the usual search engines getting longer sentences from the input or output of goods search results will be more extensive and more. However, by utilizing the semantic indexing technology, the longer and clear input desired goods, the number of searches will be few and accurately in accordance with the input that helps the user in decision making. In this study discussed how to build a search engine on the web e-commerce by using Latent Semantic Indexing. The first starts from the use of Text Mining methods for word processing, and the method Levenshtein Distance to repair automatic word and the last Latent Semantic Indexing for information processing and input expenditure.
Ernesto Rengifo García
Full Text Available Search engines are an important technological tool that facilitates the dissemination and access to information on the Internet. However, when it comes to works protected by authors rights, in the case of continental law, or Copyright, for the Anglo-Saxon tradition, it is difficult to define if search engines infringe the rights of the owners of these works. In the face of this situation, the US and Europe have employed the exceptions to autorights and Fair Use to decide whether search engines infringes owners rights. This article carries out a comparative analysis of the different judicial decisions in the US and Europe on search engines and protected works.
María Isabel Escalona Fernández
Full Text Available The objetive of this study is to offer a clear and practical overview to the Chemical Scientist, applying quantitative methods on the two most extended platforms of scientific information nowadays: Web of Science (Thomson Reuters and Scopus (Elsevier. It was carried out a specialized search in the area of the chemical engineering in both databases between 1999 and 2006. Through the articles recovered in the searches during those ten years and the journals recovered in 2006, it is analyzed the possible correlation between both systems; the pattern of growth that they show and certain parameters of that growth; the overlapping among the two databases, the dispersion of the articles in the journals and concentration measures, between others. The results show the existence of a high likeness between Web of Science and Scopus, turning out complementary but not exclusive, regarding their possible use for the chemical engineers.
Sahin, Abdurrahman; Cermik, Hulya; Dogan, Birsen
Information searching skills have become increasingly important for prospective teachers with the exponential growth of learning materials on the web. This study is an attempt to understand the experiences of prospective teachers with search engines through metaphoric images and to further investigate whether their experiences are related to the…
Doulai, P. [Wollongong Univ., NSW (Australia)
In recent years, the Internet and its new glamour, the World Wide Web (WWW or Web for short), has introduced radical changes in the direction of knowledge and information disseminations world-wide. The Department of Electrical and Computer Engineering at University of Wollongong has recently set up a WWW public domain clearinghouse (Home Page, URL http://www.uow.edu.au/public/pwrsysed/homepage.html) supporting and promoting the use of computers in engineering education in general and electric power education in particular. This paper provides a brief introduction to the Home Page and its five major nodes (sub-pages). It also provides some background information about the World Wide Web and its potential usage for engineering education. (author). 3 figs., 3 refs.
Lai, Kaisheng; Lee, Yan Xin; Chen, Hao; Yu, Rongjun
The widespread use of web searches in daily life has allowed researchers to study people's online social and psychological behavior. Using web search data has advantages in terms of data objectivity, ecological validity, temporal resolution, and unique application value. This review integrates existing studies on web search data that have explored topics including sexual behavior, suicidal behavior, mental health, social prejudice, social inequality, public responses to policies, and other psychosocial issues. These studies are categorized as descriptive, correlational, inferential, predictive, and policy evaluation research. The integration of theory-based hypothesis testing in future web search research will result in even stronger contributions to social psychology.
Using a ranked list as return result of a specific user request is time consuming and the browsing style seems to not be user-friendly. In this paper, we propose to study how to integrate and adapt the Formal Concept Analysis (FCA as a new system for Arabic Web Search Results Clustering based on their hierarchical structure. The effectiveness of our proposed system is illustrated by an experimental study using Arabic comprehensive set of documents from the Open Directory Project hierarchy as benchmark, where we compare our system with two others: Suffix Tree Clustering (STC and Lingo. The comparison focuses on the quality of the clustering results and produced label by different systems. It shows that our system outperforms the two others.
Wang, Meng; Li, Hao; Tao, Dacheng; Lu, Ke; Wu, Xindong
This paper introduces a web image search reranking approach that explores multiple modalities in a graph-based learning scheme. Different from the conventional methods that usually adopt a single modality or integrate multiple modalities into a long feature vector, our approach can effectively integrate the learning of relevance scores, weights of modalities, and the distance metric and its scaling for each modality into a unified scheme. In this way, the effects of different modalities can be adaptively modulated and better reranking performance can be achieved. We conduct experiments on a large dataset that contains more than 1000 queries and 1 million images to evaluate our approach. Experimental results demonstrate that the proposed reranking approach is more robust than using each individual modality, and it also performs better than many existing methods.
Shteynberg, David; Nesvizhskii, Alexey I; Moritz, Robert L; Deutsch, Eric W
A crucial component of the analysis of shotgun proteomics datasets is the search engine, an algorithm that attempts to identify the peptide sequence from the parent molecular ion that produced each fragment ion spectrum in the dataset. There are many different search engines, both commercial and open source, each employing a somewhat different technique for spectrum identification. The set of high-scoring peptide-spectrum matches for a defined set of input spectra differs markedly among the various search engine results; individual engines each provide unique correct identifications among a core set of correlative identifications. This has led to the approach of combining the results from multiple search engines to achieve improved analysis of each dataset. Here we review the techniques and available software for combining the results of multiple search engines and briefly compare the relative performance of these techniques.
Shteynberg, David; Nesvizhskii, Alexey I.; Moritz, Robert L.; Deutsch, Eric W.
A crucial component of the analysis of shotgun proteomics datasets is the search engine, an algorithm that attempts to identify the peptide sequence from the parent molecular ion that produced each fragment ion spectrum in the dataset. There are many different search engines, both commercial and open source, each employing a somewhat different technique for spectrum identification. The set of high-scoring peptide-spectrum matches for a defined set of input spectra differs markedly among the various search engine results; individual engines each provide unique correct identifications among a core set of correlative identifications. This has led to the approach of combining the results from multiple search engines to achieve improved analysis of each dataset. Here we review the techniques and available software for combining the results of multiple search engines and briefly compare the relative performance of these techniques. PMID:23720762
Modi, Minal; Laskar, Nabila; Modi, Bhavik N
To objectively assess the quality of information available on the World Wide Web on cardiac resynchronization therapy (CRT). Patients frequently search the internet regarding their healthcare issues. It has been shown that patients seeking information can help or hinder their healthcare outcomes depending on the quality of information consulted. On the internet, this information can be produced and published by anyone, resulting in the risk of patients accessing inaccurate and misleading information. The search term "Cardiac Resynchronisation Therapy" was entered into the three most popular search engines and the first 50 pages on each were pooled and analyzed, after excluding websites inappropriate for objective review. The "LIDA" instrument (a validated tool for assessing quality of healthcare information websites) was to generate scores on Accessibility, Reliability, and Usability. Readability was assessed using the Flesch Reading Ease Score (FRES). Of the 150 web-links, 41 sites met the eligibility criteria. The sites were assessed using the LIDA instrument and the FRES. A mean total LIDA score for all the websites assessed was 123.5 of a possible 165 (74.8%). The average Accessibility of the sites assessed was 50.1 of 60 (84.3%), on Usability 41.4 of 54 (76.6%), on Reliability 31.5 of 51 (61.7%), and 41.8 on FRES. There was a significant variability among sites and interestingly, there was no correlation between the sites' search engine ranking and their scores. This study has illustrated the variable quality of online material on the topic of CRT. Furthermore, there was also no apparent correlation between highly ranked, popular websites and their quality. Healthcare professionals should be encouraged to guide their patients toward the online material that contains reliable information. © 2016 Wiley Periodicals, Inc.
Schäfer, Micahel; Dolog, Peter; Nejdl, Wolfgang
in such an environment occurs, a compensation can be initiated to recover from the failure. However, current environments have only limited capabilities for compensations, and are usually based on backward recovery. In this paper, we introduce an engineering approach and an environment to deal with advanced...
Svenstrup, Dan; Jørgensen, Henrik L; Winther, Ole
Physicians and the general public are increasingly using web-based tools to find answers to medical questions. The field of rare diseases is especially challenging and important as shown by the long delay and many mistakes associated with diagnoses. In this paper we review recent initiatives on the use of web search, social media and data mining in data repositories for medical diagnosis. We compare the retrieval accuracy on 56 rare disease cases with known diagnosis for the web search tools google.com, pubmed.gov, omim.org and our own search tool findzebra.com. We give a detailed description of IBM's Watson system and make a rough comparison between findzebra.com and Watson on subsets of the Doctor's dilemma dataset. The recall@10 and recall@20 (fraction of cases where the correct result appears in top 10 and top 20) for the 56 cases are found to be be 29%, 16%, 27% and 59% and 32%, 18%, 34% and 64%, respectively. Thus, FindZebra has a significantly (p < 0.01) higher recall than the other 3 search engines. When tested under the same conditions, Watson and FindZebra showed similar recall@10 accuracy. However, the tests were performed on different subsets of Doctors dilemma questions. Advances in technology and access to high quality data have opened new possibilities for aiding the diagnostic process. Specialized search engines, data mining tools and social media are some of the areas that hold promise.
Esch, Maria; Chen, Jinbo; Colmsee, Christian; Klapperstück, Matthias; Grafahrend-Belau, Eva; Scholz, Uwe; Lange, Matthias
With the number of sequenced plant genomes growing, the number of predicted genes and functional annotations is also increasing. The association between genes and phenotypic traits is currently of great interest. Unfortunately, the information available today is widely scattered over a number of different databases. Information retrieval (IR) has become an all-encompassing bioinformatics methodology for extracting knowledge from complex, heterogeneous and distributed databases, and therefore can be a useful tool for obtaining a comprehensive view of plant genomics, from genes to traits. Here we describe LAILAPS (http://lailaps.ipk-gatersleben.de), an IR system designed to link plant genomic data in the context of phenotypic attributes for a detailed forward genetic research. LAILAPS comprises around 65 million indexed documents, encompassing >13 major life science databases with around 80 million links to plant genomic resources. The LAILAPS search engine allows fuzzy querying for candidate genes linked to specific traits over a loosely integrated system of indexed and interlinked genome databases. Query assistance and an evidence-based annotation system enable time-efficient and comprehensive information retrieval. An artificial neural network incorporating user feedback and behavior tracking allows relevance sorting of results. We fully describe LAILAPS's functionality and capabilities by comparing this system's performance with other widely used systems and by reporting both a validation in maize and a knowledge discovery use-case focusing on candidate genes in barley. © The Author 2014. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists.
Petcu, Paula; Dragusin, Radu
Based on previous experience from working on a task-based search engine, we present a list of suggestions and ideas for an Information Retrieval (IR) framework that could inform the development of next generation professional search systems. The specific task that we start from is the clinicians......' information need in finding rare disease diagnostic hypotheses at the time and place where medical decisions are made. Our experience from the development of a search engine focused on supporting clinicians in completing this task has provided us valuable insights in what aspects should be considered...... by the developers of vertical search engines....
Rastegar-Mojarad, Majid; Kadolph, Christopher; Ye, Zhan; Wall, Daniel; Murali, Narayana; Lin, Simon
A search engine to find physicians' information is a basic but crucial function of a health care provider's website. Inefficient search engines, which return no results or incorrect results, can lead to patient frustration and potential customer loss. A search engine that can handle misspellings and spelling variations of names is needed, as the United States (US) has culturally, racially, and ethnically diverse names. The Marshfield Clinic website provides a search engine for users to search for physicians' names. The current search engine provides an auto-completion function, but it requires an exact match. We observed that 26% of all searches yielded no results. The goal was to design a fuzzy-match algorithm to aid users in finding physicians easier and faster. Instead of an exact match search, we used a fuzzy algorithm to find similar matches for searched terms. In the algorithm, we solved three types of search engine failures: "Typographic", "Phonetic spelling variation", and "Nickname". To solve these mismatches, we used a customized Levenshtein distance calculation that incorporated Soundex coding and a lookup table of nicknames derived from US census data. Using the "Challenge Data Set of Marshfield Physician Names," we evaluated the accuracy of fuzzy-match engine-top ten (90%) and compared it with exact match (0%), Soundex (24%), Levenshtein distance (59%), and fuzzy-match engine-top one (71%). We designed, created a reference implementation, and evaluated a fuzzy-match search engine for physician directories. The open-source code is available at the codeplex website and a reference implementation is available for demonstration at the datamarsh website.
Dokas, I.M.; Alapetite, Alexandre
raised their complexity. Unfortunately, there is so far no clear answer to the question: How may the methods and experience of Web engineering and expert systems be combined and applied in order todevelop effective and successful Web based expert systems? In an attempt to answer this question......, a development process meta-model for Web based expert systems will be presented. Based on this meta-model, a publicly available Web based expert systemcalled Landfill Operation Management Advisor (LOMA) was developed. In addition, the results of an accessibility evaluation on LOMA – the first ever reported...... on Web based expert systems – will be presented. The idea behind the presentation of theaccessibility evaluation and its conclusions is to show to Web based expert system developers, who typically have little Web engineering background, that Web engineering issues must be considered when developing Web...
Walcott, Brian P; Nahed, Brian V; Kahle, Kristopher T; Redjal, Navid; Coumans, Jean-Valery
Previous methods to determine stroke prevalence, such as nationwide surveys, are labor-intensive endeavors. Recent advances in search engine query analytics have led to a new metric for disease surveillance to evaluate symptomatic phenomenon, such as influenza. The authors hypothesized that the use of search engine query data can determine the prevalence of stroke. The Google Insights for Search database was accessed to analyze anonymized search engine query data. The authors' search strategy utilized common search queries used when attempting either to identify the signs and symptoms of a stroke or to perform stroke education. The search logic was as follows: (stroke signs + stroke symptoms + mini stroke--heat) from January 1, 2005, to December 31, 2010. The relative number of searches performed (the interest level) for this search logic was established for all 50 states and the District of Columbia. A Pearson product-moment correlation coefficient was calculated from the statespecific stroke prevalence data previously reported. Web search engine interest level was available for all 50 states and the District of Columbia over the time period for January 1, 2005-December 31, 2010. The interest level was highest in Alabama and Tennessee (100 and 96, respectively) and lowest in California and Virginia (58 and 53, respectively). The Pearson correlation coefficient (r) was calculated to be 0.47 (p = 0.0005, 2-tailed). Search engine query data analysis allows for the determination of relative stroke prevalence. Further investigation will reveal the reliability of this metric to determine temporal pattern analysis and prevalence in this and other symptomatic diseases.
Full Text Available Purpose: This study explores how search motivation and context influence mobile Web search behaviors. Design/methodology/approach: We studied 30 experienced mobile Web users via questionnaires, semi-structured interviews, and an online diary tool that participants used to record their daily search activities. SQLite Developer was used to extract data from the users' phone logs for correlation analysis in Statistical Product and Service Solutions (SPSS. Findings: One quarter of mobile search sessions were driven by two or more search motivations. It was especially difficult to distinguish curiosity from time killing in particular user reporting. Multi-dimensional contexts and motivations influenced mobile search behaviors, and among the context dimensions, gender, place, activities they engaged in while searching, task importance, portal, and interpersonal relations (whether accompanied or alone when searching correlated with each other. Research limitations: The sample was comprised entirely of college students, so our findings may not generalize to other populations. More participants and longer experimental duration will improve the accuracy and objectivity of the research. Practical implications: Motivation analysis and search context recognition can help mobile service providers design applications and services for particular mobile contexts and usages. Originality/value: Most current research focuses on specific contexts, such as studies on place, or other contextual influences on mobile search, and lacks a systematic analysis of mobile search context. Based on analysis of the impact of mobile search motivations and search context on search behaviors, we built a multi-dimensional model of mobile search behaviors.
Fagan, Jody Condit
This two-part article considers how well some of today's search tools support scholars' work. The first part of the article reviewed Google Scholar and Microsoft Academic Search using a modified version of Carole L. Palmer, Lauren C. Teffeau, and Carrier M. Pirmann's framework (2009). Microsoft Academic Search is a strong contender when…
Konrad Tollmar; Ted Möller; Björn Nilsved
Using images of objects as queries is a new approach to search for information on the Web. Image-based information retrieval goes beyond only matching images, as information in other modalities also can be extracted from data collections using an image search. We have developed a new system that uses images to search for web-based information. This paper has a particular focus on exploring users' experience of general mobile image-based web searches to find what issues and phenomena it contai...
Toluwase Victor Asubiaro
Full Text Available This paper aims to find out the possible effect of the use or nonuse of diacritics in Yoruba search queries on the performance of major search engines, AOL, Bing, Google and Yahoo!, in retrieving documents. 30 Yoruba queries created from the most searched keywords from Nigeria on Google search logs were submitted to the search engines. The search queries were posed to the search engines without diacritics and then with diacritics. All of the search engines retrieved more sites in response to the queries without diacritics. Also, they all retrieved more precise results for queries without diacritics. The search engines also answered more queries without diacritics. There was no significant difference in the precision values of any two of the four search engines for diacritized and undiacritized queries. There was a significant difference in the effectiveness of AOL and Yahoo when diacritics were applied and when they were not applied. The findings of the study indicate that the search engines do not find a relationship between the diacritized Yoruba words and the undiacritized versions. Therefore, there is a need for search engines to add normalization steps to pre-process Yoruba queries and indexes. This study concentrates on a problem with search engines that has not been previously investigated.
Full Text Available The 21st century is marked by the extensive and easy access to information through the virtual environment. Do we find in today's Romanian school the presence of a formative space - on the one hand, facilitator for a maximal exploitation of opportunities, and on the other hand, a "sensor" for new risks, characteristic to the information era? Is the "digital generation" (Mark Prensky of the beginning of century in Romania ready from these perspectives? The present paper outlines the results of a comparative exploratory study regarding the ordinary methods used by youngsters - from 5th and 6th grades, as well as 11th and 12th grades, from six different schools, high-schools and colleges from Dolj county – to find information about different topics/homework. The results offer the premises for hypothesis regarding this phenomenon at national level. The conclusions indicate as the main method of obtaining information the web-searching. They emphasize the absence of an initial specific educational training in this domain and allow the delineation of a suggestive image regarding possible future methods of action.
Cutrone, M; Grimalt, R
Atlases on CD-ROM first substituted the use of paediatric dermatology atlases printed on paper. This permitted a faster search and a practical comparison of differential diagnoses. The third step in the evolution of clinical atlases was the onset of the online atlas. Many doctors now use the Internet image search engines to obtain clinical images directly. The aim of this study was to test the reliability of the image search engines compared to the online atlases. We tested seven Internet image search engines with three paediatric dermatology diseases. In general, the service offered by the search engines is good, and continues to be free of charge. The coincidence between what we searched for and what we found was generally excellent, and contained no advertisements. Most Internet search engines provided similar results but some were more user friendly than others. It is not necessary to repeat the same research with Picsearch, Lycos and MSN, as the response would be the same; there is a possibility that they might share software. Image search engines are a useful, free and precise method to obtain paediatric dermatology images for teaching purposes. There is still the matter of copyright to be resolved. What are the legal uses of these 'free' images? How do we define 'teaching purposes'? New watermark methods and encrypted electronic signatures might solve these problems and answer these questions.
Orduna-Malea, Enrique; Ayllon, Juan Manuel; Martin-Martin, Alberto; Lopez-Cozar, Emilio Delgado
The goal of this working paper is to summarize the main empirical evidences provided by the scientific community as regards the comparison between the two main citation based academic search engines: Google Scholar and Microsoft Academic Search, paying special attention to the following issues: coverage, correlations between journal rankings, and usage of these academic search engines. Additionally, selfelaborated data is offered, which are intended to provide current evidence about the popul...
Pasche, Emilie; Gobeill, Julien; Kreim, Olivier; Oezdemir-Zaech, Fatma; Vachon, Therese; Lovis, Christian; Ruch, Patrick
The large increase in the size of patent collections has led to the need of efficient search strategies. But the development of advanced text-mining applications dedicated to patents of the biomedical field remains rare, in particular to address the needs of the pharmaceutical & biotech industry, which intensively uses patent libraries for competitive intelligence and drug development. We describe here the development of an advanced retrieval engine to search information in patent collections in the field of medicinal chemistry. We investigate and combine different strategies and evaluate their respective impact on the performance of the search engine applied to various search tasks, which covers the putatively most frequent search behaviours of intellectual property officers in medical chemistry: 1) a prior art search task; 2) a technical survey task; and 3) a variant of the technical survey task, sometimes called known-item search task, where a single patent is targeted. The optimal tuning of our engine resulted in a top-precision of 6.76% for the prior art search task, 23.28% for the technical survey task and 46.02% for the variant of the technical survey task. We observed that co-citation boosting was an appropriate strategy to improve prior art search tasks, while IPC classification of queries was improving retrieval effectiveness for technical survey tasks. Surprisingly, the use of the full body of the patent was always detrimental for search effectiveness. It was also observed that normalizing biomedical entities using curated dictionaries had simply no impact on the search tasks we evaluate. The search engine was finally implemented as a web-application within Novartis Pharma. The application is briefly described in the report. We have presented the development of a search engine dedicated to patent search, based on state of the art methods applied to patent corpora. We have shown that a proper tuning of the system to adapt to the various search tasks
Zhang, Lu; Du, Hongru; Zhao, Yannan; Wu, Rongwei; Zhang, Xiaolei
"The Belt and Road" initiative has been expected to facilitate interactions among numerous city centers. This initiative would generate a number of centers, both economic and political, which would facilitate greater interaction. To explore how information flows are merged and the specific opportunities that may be offered, Chinese cities along "the Belt and Road" are selected for a case study. Furthermore, urban networks in cyberspace have been characterized by their infrastructure orientation, which implies that there is a relative dearth of studies focusing on the investigation of urban hierarchies by capturing information flows between Chinese cities along "the Belt and Road". This paper employs Baidu, the main web search engine in China, to examine urban hierarchies. The results show that urban networks become more balanced, shifting from a polycentric to a homogenized pattern. Furthermore, cities in networks tend to have both a hierarchical system and a spatial concentration primarily in regions such as Beijing-Tianjin-Hebei, Yangtze River Delta and the Pearl River Delta region. Urban hierarchy based on web search activity does not follow the existing hierarchical system based on geospatial and economic development in all cases. Moreover, urban networks, under the framework of "the Belt and Road", show several significant corridors and more opportunities for more cities, particularly western cities. Furthermore, factors that may influence web search activity are explored. The results show that web search activity is significantly influenced by the economic gap, geographical proximity and administrative rank of the city.
Full Text Available "The Belt and Road" initiative has been expected to facilitate interactions among numerous city centers. This initiative would generate a number of centers, both economic and political, which would facilitate greater interaction. To explore how information flows are merged and the specific opportunities that may be offered, Chinese cities along "the Belt and Road" are selected for a case study. Furthermore, urban networks in cyberspace have been characterized by their infrastructure orientation, which implies that there is a relative dearth of studies focusing on the investigation of urban hierarchies by capturing information flows between Chinese cities along "the Belt and Road". This paper employs Baidu, the main web search engine in China, to examine urban hierarchies. The results show that urban networks become more balanced, shifting from a polycentric to a homogenized pattern. Furthermore, cities in networks tend to have both a hierarchical system and a spatial concentration primarily in regions such as Beijing-Tianjin-Hebei, Yangtze River Delta and the Pearl River Delta region. Urban hierarchy based on web search activity does not follow the existing hierarchical system based on geospatial and economic development in all cases. Moreover, urban networks, under the framework of "the Belt and Road", show several significant corridors and more opportunities for more cities, particularly western cities. Furthermore, factors that may influence web search activity are explored. The results show that web search activity is significantly influenced by the economic gap, geographical proximity and administrative rank of the city.
Searching is a fundamental task to support research. Current search engines are keyword-based. Semantic technologies promise a next generation of semantic search engines, which will be able to answer questions. Current approaches either apply natural language processing to unstructured text or they assume the existence of structured statements over which they can reason. This work provides a system for combining the classical keyword-based search engines with semantic annotation. Conventi...
Sallaberry, Arnaud; Zaidi, Faraz; Pich, C.; Melançon, Guy
International audience; With the information overload on the Internet, organization and visualization of web search results so as to facilitate faster access to information is a necessity. The classical methods present search results as an ordered list of web pages ranked in terms of relevance to the searched topic. Users thus have to scan text snippets or navigate through various pages before finding the required information. In this paper we present an interactive visualization system for c...
Divita, Guy; Workman, T Elizabeth; Carter, Marjorie E; Redd, Andrew; Samore, Matthew H; Gundlapalli, Adi V
Medical text contains boilerplated content, an artifact of pull-down forms from EMRs. Boilerplated content is the source of challenges for concept extraction on clinical text. This paper introduces PlateRunner, a search engine on boilerplates from the US Department of Veterans Affairs (VA) EMR. Boilerplates containing concepts should be identified and reviewed to recognize challenging formats, identify high yield document titles, and fine tune section zoning. This search engine has the capability to filter negated and asserted concepts, save and search query results. This tool can save queries, search results, and documents found for later analysis.
Search tips: Search terms are case-insensitive; Common words are ignored; By default only articles containing all terms in the query are returned (i.e., AND is implied); Combine multiple words with OR to find articles containing either term; e.g., education OR research; Use parentheses to create more complex queries; e.g., ...
Argenton, C.; Prüfer, J.
The market for Internet search is not only economically and socially important, it is also highly concentrated. Is this a problem? We study the question whether "competition is only a free click away". We argue that the market for Internet search is characterized by indirect network externalities
Search tips: Search terms are case-insensitive; Common words are ignored; By default only articles containing all terms in the query are returned (i.e., AND is implied); Combine multiple words with OR to find articles containing either term; e.g., education OR research; Use parentheses to create more complex queries; e.g., ...
Song, Tae-Min; Park, Hyeoun-Ae; Jin, Dal-Lae
The aim of the study was to develop a metadata and ontology-based health information search engine ensuring semantic interoperability to collect and provide health information using different application programs. Health information metadata ontology was developed using a distributed semantic Web content publishing model based on vocabularies used to index the contents generated by the information producers as well as those used to search the contents by the users. Vocabulary for health information ontology was mapped to the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), and a list of about 1,500 terms was proposed. The metadata schema used in this study was developed by adding an element describing the target audience to the Dublin Core Metadata Element Set. A metadata schema and an ontology ensuring interoperability of health information available on the internet were developed. The metadata and ontology-based health information search engine developed in this study produced a better search result compared to existing search engines. Health information search engine based on metadata and ontology will provide reliable health information to both information producer and information consumers.
Scott, David W.
The Mission Operations Laboratory (MOL) at Marshall Space Flight Center (MSFC) is responsible for Engineering Support capability for NASA s Ares rocket development and operations. In pursuit of this, MOL is building the Ares Engineering and Operations Network (AEON), a web-based portal to support and simplify two critical activities: Access and analyze Ares manufacturing, test, and flight performance data, with access to Shuttle data for comparison Establish and maintain collaborative communities within the Ares teams/subteams and with other projects, e.g., Space Shuttle, International Space Station (ISS). AEON seeks to provide a seamless interface to a) locally developed engineering applications and b) a Commercial-Off-The-Shelf (COTS) collaborative environment that includes Web 2.0 capabilities, e.g., blogging, wikis, and social networking. This paper discusses how Web 2.0 might be applied to the typically conservative engineering support arena, based on feedback from Integration, Verification, and Validation (IV&V) testing and on searching for their use in similar environments.
Full Text Available The vast availability of information, that added in a very fast pace, in the data repositories creates a challenge in extracting correct and accurate information. Which has increased the competition among developers in order to gain access to technology that seeks to understand the intent researcher and contextual meaning of terms. While the competition for developing an Arabic Semantic Search systems are still in their infancy, and the reason could be traced back to the complexity of Arabic Language. It has a complex morphological, grammatical and semantic aspects, as it is a highly inflectional and derivational language. In this paper, we try to highlight and present an Ontological Search Engine called IBRI-CASONTO for Colleges of Applied Sciences, Oman. Our proposed engine supports both Arabic and English language. It is also employed two types of search which are a keyword-based search and a semantics-based search. IBRI-CASONTO is based on different technologies such as Resource Description Framework (RDF data and Ontological graph. The experiments represent in two sections, first it shows a comparison among Entity-Search and the Classical-Search inside the IBRI-CASONTO itself, second it compares the Entity-Search of IBRI-CASONTO with currently used search engines, such as Kngine, Wolfram Alpha and the most popular engine nowadays Google, in order to measure their performance and efficiency.
Gonzalez, Gabriel; Hannigan, Brett; DeGrado, William F
Protein designers use a wide variety of software tools for de novo design, yet their repertoire still lacks a fast and interactive all-atom search engine. To solve this, we have built the Suns program: a real-time, atomic search engine integrated into the PyMOL molecular visualization system. Users build atomic-level structural search queries within PyMOL and receive a stream of search results aligned to their query within a few seconds. This instant feedback cycle enables a new "designability"-inspired approach to protein design where the designer searches for and interactively incorporates native-like fragments from proven protein structures. We demonstrate the use of Suns to interactively build protein motifs, tertiary interactions, and to identify scaffolds compatible with hot-spot residues. The official web site and installer are located at http://www.degradolab.org/suns/ and the source code is hosted at https://github.com/godotgildor/Suns (PyMOL plugin, BSD license), https://github.com/Gabriel439/suns-cmd (command line client, BSD license), and https://github.com/Gabriel439/suns-search (search engine server, GPLv2 license).
Shteynberg, David; Nesvizhskii, Alexey I; Moritz, Robert L; Deutsch, Eric W
A crucial component of the analysis of shotgun proteomics datasets is the search engine, an algorithm that attempts to identify the peptide sequence from the parent molecular ion that produced each...
Full Text Available Today, engineering has become a disciplined field. The demand in food products caused the agricultural engineers to consider the matter in a different way. This consideration led the engineer to resolve the biological issues together with electronic and information disciplines and also advanced con trol, advanced technological materials and developed sensor systems. The subject has persuaded them to design solutions for problems related with living things and their environment. Bio - system engineering which has been developed for this purpose has beco me the application of technical knowledge aiming to fulfill the human requirements. The pursuit of bio - system engineering discipline are automation, new developed technologies, information technologies and human interaction, sensitive agriculture techniqu es, power and work machines, product technologies after harvest, structures and relation with environment, animal production technology, soil and water sources, rural development and planning. Bio - system engineering which covers such a wide area should re ach the solution by using its system engineering feature first and then determine the process parameters of the subjects that it resolves. Therefore it has to attribute the reason - result relation in every stage to quality parameters. Therefore, in this a nnouncement, the quality issues necessary for explaining the subjects dealt in bio - system engineering basis are examined one by one and solution models are created depending on these issues.
Zhang, Shu; Chen, Xinrong; Luo, Changshou
Since network has been created and developing rapidly in recent years, the age of information exploding is coming. The Search Engine becomes more and more important for people, but the traditional search engine retrieves and provides information just according to the keywords that users input. How to recommend the right information to users has become the hot point. The technology of personalized information filtering brings people hope .The paper I present analyzed the achievements of those filtering technologies ,and adopted user-system complex-operating modeling to build User-activitycollecting module, User-interest-updating module and User-searching module, in order to meet theme-oriented searching's needs. Experiments showed that the user-interest model can provide personalized service and enhance search engine's precision.
Montesi, M.; Navarrete, T.
This case study analyzes the Internet-based resources that a software engineer uses in his daily work. Methodologically, we studied the web browser history of the participant, classifying all the web pages he had seen over a period of 12 days into web genres. We interviewed him before and after the
Wang, Yifan; Wu, Lingdan; Luo, Liang; Zhang, Yifen; Dong, Guangheng
The Internet search engines, which have powerful search/sort functions and ease of use features, have become an indispensable tool for many individuals. The current study is to test whether the short-term Internet search training can make people more dependent on it. Thirty-one subjects out of forty subjects completed the search training study which included a pre-test, a six-day's training of Internet search, and a post-test. During the pre- and post- tests, subjects were asked to search online the answers to 40 unusual questions, remember the answers and recall them in the scanner. Un-learned questions were randomly presented at the recalling stage in order to elicited search impulse. Comparing to the pre-test, subjects in the post-test reported higher impulse to use search engines to answer un-learned questions. Consistently, subjects showed higher brain activations in dorsolateral prefrontal cortex and anterior cingulate cortex in the post-test than in the pre-test. In addition, there were significant positive correlations self-reported search impulse and brain responses in the frontal areas. The results suggest that a simple six-day's Internet search training can make people dependent on the search tools when facing unknown issues. People are easily dependent on the Internet search engines.
Full Text Available The Internet search engines, which have powerful search/sort functions and ease of use features, have become an indispensable tool for many individuals. The current study is to test whether the short-term Internet search training can make people more dependent on it. Thirty-one subjects out of forty subjects completed the search training study which included a pre-test, a six-day's training of Internet search, and a post-test. During the pre- and post- tests, subjects were asked to search online the answers to 40 unusual questions, remember the answers and recall them in the scanner. Un-learned questions were randomly presented at the recalling stage in order to elicited search impulse. Comparing to the pre-test, subjects in the post-test reported higher impulse to use search engines to answer un-learned questions. Consistently, subjects showed higher brain activations in dorsolateral prefrontal cortex and anterior cingulate cortex in the post-test than in the pre-test. In addition, there were significant positive correlations self-reported search impulse and brain responses in the frontal areas. The results suggest that a simple six-day's Internet search training can make people dependent on the search tools when facing unknown issues. People are easily dependent on the Internet search engines.
Full Text Available Abstract Background Critically Appraised Topics (CATs are a useful tool that helps physicians to make clinical decisions as the healthcare moves towards the practice of Evidence-Based Medicine (EBM. The fast growing World Wide Web has provided a place for physicians to share their appraised topics online, but an increasing amount of time is needed to find a particular topic within such a rich repository. Methods A web-based application, namely the CAT Crawler, was developed by Singapore's Bioinformatics Institute to allow physicians to adequately access available appraised topics on the Internet. A meta-search engine, as the core component of the application, finds relevant topics following keyword input. The primary objective of the work presented here is to evaluate the quantity and quality of search results obtained from the meta-search engine of the CAT Crawler by comparing them with those obtained from two individual CAT search engines. From the CAT libraries at these two sites, all possible keywords were extracted using a keyword extractor. Of those common to both libraries, ten were randomly chosen for evaluation. All ten were submitted to the two search engines individually, and through the meta-search engine of the CAT Crawler. Search results were evaluated for relevance both by medical amateurs and professionals, and the respective recall and precision were calculated. Results While achieving an identical recall, the meta-search engine showed a precision of 77.26% (±14.45 compared to the individual search engines' 52.65% (±12.0 (p Conclusion The results demonstrate the validity of the CAT Crawler meta-search engine approach. The improved precision due to inherent filters underlines the practical usefulness of this tool for clinicians.
Dong, Peng; Wong, Ling Ling; Ng, Sarah; Loh, Marie; Mondry, Adrian
Background Critically Appraised Topics (CATs) are a useful tool that helps physicians to make clinical decisions as the healthcare moves towards the practice of Evidence-Based Medicine (EBM). The fast growing World Wide Web has provided a place for physicians to share their appraised topics online, but an increasing amount of time is needed to find a particular topic within such a rich repository. Methods A web-based application, namely the CAT Crawler, was developed by Singapore's Bioinformatics Institute to allow physicians to adequately access available appraised topics on the Internet. A meta-search engine, as the core component of the application, finds relevant topics following keyword input. The primary objective of the work presented here is to evaluate the quantity and quality of search results obtained from the meta-search engine of the CAT Crawler by comparing them with those obtained from two individual CAT search engines. From the CAT libraries at these two sites, all possible keywords were extracted using a keyword extractor. Of those common to both libraries, ten were randomly chosen for evaluation. All ten were submitted to the two search engines individually, and through the meta-search engine of the CAT Crawler. Search results were evaluated for relevance both by medical amateurs and professionals, and the respective recall and precision were calculated. Results While achieving an identical recall, the meta-search engine showed a precision of 77.26% (±14.45) compared to the individual search engines' 52.65% (±12.0) (p search engine approach. The improved precision due to inherent filters underlines the practical usefulness of this tool for clinicians. PMID:15588311
Paulo, Joao A.
Background Analysis of large datasets produced by mass spectrometry-based proteomics relies on database search algorithms to sequence peptides and identify proteins. Several such scoring methods are available, each based on different statistical foundations and thereby not producing identical results. Here, the aim is to compare peptide and protein identifications using multiple search engines and examine the additional proteins gained by increasing the number of technical replicate analyses. Methods A HeLa whole cell lysate was analyzed on an Orbitrap mass spectrometer for 10 technical replicates. The data were combined and searched using Mascot, SEQUEST, and Andromeda. Comparisons were made of peptide and protein identifications among the search engines. In addition, searches using each engine were performed with incrementing number of technical replicates. Results The number and identity of peptides and proteins differed across search engines. For all three search engines, the differences in proteins identifications were greater than the differences in peptide identifications indicating that the major source of the disparity may be at the protein inference grouping level. The data also revealed that analysis of 2 technical replicates can increase protein identifications by up to 10-15%, while a third replicate results in an additional 4-5%. Conclusions The data emphasize two practical methods of increasing the robustness of mass spectrometry data analysis. The data show that 1) using multiple search engines can expand the number of identified proteins (union) and validate protein identifications (intersection), and 2) analysis of 2 or 3 technical replicates can substantially expand protein identifications. Moreover, information can be extracted from a dataset by performing database searching with different engines and performing technical repeats, which requires no additional sample preparation and effectively utilizes research time and effort. PMID:25346847
Kulp, W. D.; Wood, J. L.
The Georgia Tech Nuclear Data Search Engine (GTNDSE) is a perl script developed to assist in extracting nuclear data from a database of ENSDF-formatted A-chain data files. Operation of the search engine will be demonstrated with results from studies of horizontal systematics across the nuclear mass surface, including B(E2) values for all doubly-even nuclei from the database ( ˜3100 measured values).
Hansen, Niels Dalum; Lioma, Christina; Mølbak, Kåre
We present a method that uses ensemble learning to combine clinical and web-mined time-series data in order to predict future vaccination uptake. The clinical data is official vaccination registries, and the web data is query frequencies collected from Google Trends. Experiments with official...... vaccine records show that our method predicts vaccination uptake eff?ectively (4.7 Root Mean Squared Error). Whereas performance is best when combining clinical and web data, using solely web data yields comparative performance. To our knowledge, this is the ?first study to predict vaccination uptake...
Blair, Irene V; Urland, Geoffrey R; Ma, Jennifer E
The present research investigated Internet search engines as a rapid, cost-effective alternative for estimating word frequencies. Frequency estimates for 382 words were obtained and compared across four methods: (1) Internet search engines, (2) the Kucera and Francis (1967) analysis of a traditional linguistic corpus, (3) the CELEX English linguistic database (Baayen, Piepenbrock, & Gulikers, 1995), and (4) participant ratings of familiarity. The results showed that Internet search engines produced frequency estimates that were highly consistent with those reported by Kucera and Francis and those calculated from CELEX, highly consistent across search engines, and very reliable over a 6-month period of time. Additional results suggested that Internet search engines are an excellent option when traditional word frequency analyses do not contain the necessary data (e.g., estimates for forenames and slang). In contrast, participants' familiarity judgments did not correspond well with the more objective estimates of word frequency. Researchers are advised to use search engines with large databases (e.g., AltaVista) to ensure the greatest representativeness of the frequency estimates.
Kapitány-Fövény, Máté; Demetrovics, Zsolt
With growing access to the Internet, people who use drugs and traffickers started to obtain information about novel psychoactive substances (NPS) via online platforms. This paper aims to analyze whether a decreasing Web interest in formerly banned substances-cocaine, heroin, and MDMA-and the legislative status of mephedrone predict Web interest about this NPS. Google Trends was used to measure changes of Web interest on cocaine, heroin, MDMA, and mephedrone. Google search results for mephedrone within the same time frame were analyzed and categorized. Web interest about classic drugs found to be more persistent. Regarding geographical distribution, location of Web searches for heroin and cocaine was less centralized. Illicit status of mephedrone was a negative predictor of its Web search query rates. The connection between mephedrone-related Web search rates and legislative status of this substance was significantly mediated by ecstasy-related Web search queries, the number of documentaries, and forum/blog entries about mephedrone. The results might provide support for the hypothesis that mephedrone's popularity was highly correlated with its legal status as well as it functioned as a potential substitute for MDMA. Google Trends was found to be a useful tool for testing theoretical assumptions about NPS. Copyright © 2017 John Wiley & Sons, Ltd.
van Dijck, J.
This article argues that search engines in general, and Google Scholar in particular, have become significant co-producers of academic knowledge. Knowledge is not simply conveyed to users, but is co-produced by search engines’ ranking systems and profiling systems, none of which are open to the
van Eijk, N.; Preissl, B.; Haucap, J.; Curwen, P.
The core function of a search engine is to make content and sources of information easily accessible (although the search results themselves may actually include parts of the underlying information). In an environment with unlimited amounts of information available on open platforms such as the
Full Text Available We present the motivation and our concept of introducing "Web Engineering" as a specialization of our "Software Engineering" curriculum. Our main focus lies on the differences in project management education for both areas as well as the necessary process models and tools. First we discuss the principal differences of software project management and web project management, focusing on the main difficulties of teaching such management skills to primarily technophile students. Then we analyze the composition of modern software development teams and changes within such teams implied by the development of web applications. We illustrate this transition showing how a merely document-driven process - as can be found in many traditional software development projects - is turned into a highly tool-supported, agile development process, which is characteristic for web development projects.
Naito, Yuki; Bono, Hidemasa
GGRNA (http://GGRNA.dbcls.jp/) is a Google-like, ultrafast search engine for genes and transcripts. The web server accepts arbitrary words and phrases, such as gene names, IDs, gene descriptions, annotations of gene and even nucleotide/amino acid sequences through one simple search box, and quickly returns relevant RefSeq transcripts. A typical search takes just a few seconds, which dramatically enhances the usability of routine searching. In particular, GGRNA can search sequences as short as 10 nt or 4 amino acids, which cannot be handled easily by popular sequence analysis tools. Nucleotide sequences can be searched allowing up to three mismatches, or the query sequences may contain degenerate nucleotide codes (e.g. N, R, Y, S). Furthermore, Gene Ontology annotations, Enzyme Commission numbers and probe sequences of catalog microarrays are also incorporated into GGRNA, which may help users to conduct searches by various types of keywords. GGRNA web server will provide a simple and powerful interface for finding genes and transcripts for a wide range of users. All services at GGRNA are provided free of charge to all users.
Sampson, Margaret; Barrowman, Nicholas J; Moher, David; Clifford, Tammy J; Platt, Robert W; Morrison, Andra; Klassen, Terry P; Zhang, Li
Most electronic search efforts directed at identifying primary studies for inclusion in systematic reviews rely on the optimal Boolean search features of search interfaces such as DIALOG and Ovid. Our objective is to test the ability of an Ultraseek search engine to rank MEDLINE records of the included studies of Cochrane reviews within the top half of all the records retrieved by the Boolean MEDLINE search used by the reviewers. Collections were created using the MEDLINE bibliographic records of included and excluded studies listed in the review and all records retrieved by the MEDLINE search. Records were converted to individual HTML files. Collections of records were indexed and searched through a statistical search engine, Ultraseek, using review-specific search terms. Our data sources, systematic reviews published in the Cochrane library, were included if they reported using at least one phase of the Cochrane Highly Sensitive Search Strategy (HSSS), provided citations for both included and excluded studies and conducted a meta-analysis using a binary outcome measure. Reviews were selected if they yielded between 1000-6000 records when the MEDLINE search strategy was replicated. Nine Cochrane reviews were included. Included studies within the Cochrane reviews were found within the first 500 retrieved studies more often than would be expected by chance. Across all reviews, recall of included studies into the top 500 was 0.70. There was no statistically significant difference in ranking when comparing included studies with just the subset of excluded studies listed as excluded in the published review. The relevance ranking provided by the search engine was better than expected by chance and shows promise for the preliminary evaluation of large results from Boolean searches. A statistical search engine does not appear to be able to make fine discriminations concerning the relevance of bibliographic records that have been pre-screened by systematic reviewers.
Full Text Available The need for web literacy has become increasingly important with the exponential growth of learning materials on the web that are freely accessible to educators. Teachers need the skills to locate these tools and also the ability to teach their students web search strategies and evaluation of websites so they can effectively explore the web by themselves. This study examined the web searching strategies of 253 teachers-in-training using both a survey (247 participants and live screen capture with think aloud audio recording (6 participants. The results present a picture of the strategic, syntactic, and evaluative search abilities of these students that librarians and faculty can use to plan how instruction can target information skill deficits in university student populations.
Edelstein, Michael; Wallensten, Anders; Zetterqvist, Inga; Hulth, Anette
Norovirus outbreaks severely disrupt healthcare systems. We evaluated whether Websök, an internet-based surveillance system using search engine data, improved norovirus surveillance and response in Sweden. We compared Websök users' characteristics with the general population, cross-correlated weekly Websök searches with laboratory notifications between 2006 and 2013, compared the time Websök and laboratory data crossed the epidemic threshold and surveyed infection control teams about their perception and use of Websök. Users of Websök were not representative of the general population. Websök correlated with laboratory data (b = 0.88-0.89) and gave an earlier signal to the onset of the norovirus season compared with laboratory-based surveillance. 17/21 (81%) infection control teams answered the survey, of which 11 (65%) believed Websök could help with infection control plans. Websök is a low-resource, easily replicable system that detects the norovirus season as reliably as laboratory data, but earlier. Using Websök in routine surveillance can help infection control teams prepare for the yearly norovirus season. PMID:24955857
Edelstein, Michael; Wallensten, Anders; Zetterqvist, Inga; Hulth, Anette
Norovirus outbreaks severely disrupt healthcare systems. We evaluated whether Websök, an internet-based surveillance system using search engine data, improved norovirus surveillance and response in Sweden. We compared Websök users' characteristics with the general population, cross-correlated weekly Websök searches with laboratory notifications between 2006 and 2013, compared the time Websök and laboratory data crossed the epidemic threshold and surveyed infection control teams about their perception and use of Websök. Users of Websök were not representative of the general population. Websök correlated with laboratory data (b = 0.88-0.89) and gave an earlier signal to the onset of the norovirus season compared with laboratory-based surveillance. 17/21 (81%) infection control teams answered the survey, of which 11 (65%) believed Websök could help with infection control plans. Websök is a low-resource, easily replicable system that detects the norovirus season as reliably as laboratory data, but earlier. Using Websök in routine surveillance can help infection control teams prepare for the yearly norovirus season.
Till, Benedikt; Niederkrotenthaler, Thomas
The Internet provides a variety of resources for individuals searching for suicide-related information. Structured content-analytic approaches to assess intercultural differences in web contents retrieved with method-related and help-related searches are scarce. We used the 2 most popular search engines (Google and Yahoo/Bing) to retrieve US-American and Austrian search results for the term suicide, method-related search terms (e.g., suicide methods, how to kill yourself, painless suicide, how to hang yourself), and help-related terms (e.g., suicidal thoughts, suicide help) on February 11, 2013. In total, 396 websites retrieved with US search engines and 335 websites from Austrian searches were analyzed with content analysis on the basis of current media guidelines for suicide reporting. We assessed the quality of websites and compared findings across search terms and between the United States and Austria. In both countries, protective outweighed harmful website characteristics by approximately 2:1. Websites retrieved with method-related search terms (e.g., how to hang yourself) contained more harmful (United States: P Austria: P Austria: P Austria: P Austria: P < .05). The quality of suicide-related websites obtained depends on the search terms used. Preventive efforts to improve the ranking of preventive web content, particularly regarding method-related search terms, seem necessary. © Copyright 2014 Physicians Postgraduate Press, Inc.
Pasche, Emilie; Gobeill, Julien; Teodoro, Douglas; Gaudinat, Arnaud; Vishnykova, Dina; Lovis, Christian; Ruch, Patrick
Patent collections contain an important amount of medical-related knowledge, but existing tools were reported to lack of useful functionalities. We present here the development of TWINC, an advanced search engine dedicated to patent retrieval in the domain of health and life sciences. Our tool embeds two search modes: an ad hoc search to retrieve relevant patents given a short query and a related patent search to retrieve similar patents given a patent. Both search modes rely on tuning experiments performed during several patent retrieval competitions. Moreover, TWINC is enhanced with interactive modules, such as chemical query expansion, which is of prior importance to cope with various ways of naming biomedical entities. While the related patent search showed promising performances, the ad-hoc search resulted in fairly contrasted results. Nonetheless, TWINC performed well during the Chemathlon task of the PatOlympics competition and experts appreciated its usability.
In this chapter, the analysis suggests that Northern Irish terrorists are not visible on Web search engines when net users employ conventional Internet search techniques. Editors of mass media organisations traditionally have had the ability to decide whether a terrorist atrocity is `newsworthy,' controlling the `oxygen' supply that sustains all forms of terrorism. This process, also known as `gatekeeping,' is often influenced by the norms of social responsibility, or alternatively, with regard to the interests of the advertisers and corporate sponsors that sustain mass media organisations. The analysis presented in this chapter suggests that Internet search engines can also be characterised as `gatekeepers,' albeit without the ability to shape the content of Websites before it reaches net users. Instead, Internet search engines give priority retrieval to certain Websites within their directory, pointing net users towards these Websites rather than others on the Internet. Net users are more likely to click on links to the more `visible' Websites on Internet search engine directories, these sites invariably being the highest `ranked' in response to a particular search query. A number of factors including the design of the Website and the number of links to external sites determine the `visibility' of a Website on Internet search engines. The study suggests that Northern Irish terrorists and their sympathisers are unlikely to achieve a greater degree of `visibility' online than they enjoy in the conventional mass media through the perpetration of atrocities. Although these groups may have a greater degree of freedom on the Internet to publicise their ideologies, they are still likely to be speaking to the converted or members of the press. Although it is easier to locate Northern Irish terrorist organisations on Internet search engines by linking in via ideology, ideological description searches, such as `Irish Republican' and `Ulster Loyalist,' are more likely to
Occupation is key in socioeconomic research. As in other survey modes, most web surveys use an open-ended question for occupation, though the absence of interviewers elicits unidentifiable or aggregated responses. Unlike other modes, web surveys can use a search tree with an occupation database.
Garnerone, Silvano; Zanardi, Paolo; Lidar, Daniel A
We propose an adiabatic quantum algorithm for generating a quantum pure state encoding of the PageRank vector, the most widely used tool in ranking the relative importance of internet pages. We present extensive numerical simulations which provide evidence that this algorithm can prepare the quantum PageRank state in a time which, on average, scales polylogarithmically in the number of web pages. We argue that the main topological feature of the underlying web graph allowing for such a scaling is the out-degree distribution. The top-ranked log(n) entries of the quantum PageRank state can then be estimated with a polynomial quantum speed-up. Moreover, the quantum PageRank state can be used in "q-sampling" protocols for testing properties of distributions, which require exponentially fewer measurements than all classical schemes designed for the same task. This can be used to decide whether to run a classical update of the PageRank.
Garnerone, Silvano; Zanardi, Paolo; Lidar, Daniel A.
We propose an adiabatic quantum algorithm for generating a quantum pure state encoding of the PageRank vector, the most widely used tool in ranking the relative importance of internet pages. We present extensive numerical simulations which provide evidence that this algorithm can prepare the quantum PageRank state in a time which, on average, scales polylogarithmically in the number of web pages. We argue that the main topological feature of the underlying web graph allowing for such a scaling is the out-degree distribution. The top-ranked log(n) entries of the quantum PageRank state can then be estimated with a polynomial quantum speed-up. Moreover, the quantum PageRank state can be used in “q-sampling” protocols for testing properties of distributions, which require exponentially fewer measurements than all classical schemes designed for the same task. This can be used to decide whether to run a classical update of the PageRank.
The use of hacking tools by law enforcement to pursue criminal suspects who have anonymized their communications on the dark web presents a looming flashpoint between criminal procedure and international law...
Sari, T.; Kefali, A.
International audience; This paper is an attempt for indexing and searching degraded document images without recognizing the textual patterns and so to circumvent the cost and the laborious effort of OCR technology. The proposed approach deal with textual-dominant documents either handwritten or printed. From preprocessing and segmentation stages, all the connected components (CC) of the text are extracted applying a bottom-up approach. Each CC is then represented with global indices such as ...
Vaudel, Marc; Breiter, Daniela; Beck, Florian; Rahnenführer, Jörg; Martens, Lennart; Zahedi, René P
While peptides carrying PTMs are routinely identified in gel-free MS, the localization of the PTMs onto the peptide sequences remains challenging. Search engine scores of secondary peptide matches have been used in different approaches in order to infer the quality of site inference, by penalizing the localization whenever the search engine similarly scored two candidate peptides with different site assignments. In the present work, we show how the estimation of posterior error probabilities for peptide candidates allows the estimation of a PTM score called the D-score, for multiple search engine studies. We demonstrate the applicability of this score to three popular search engines: Mascot, OMSSA, and X!Tandem, and evaluate its performance using an already published high resolution data set of synthetic phosphopeptides. For those peptides with phosphorylation site inference uncertainty, the number of spectrum matches with correctly localized phosphorylation increased by up to 25.7% when compared to using Mascot alone, although the actual increase depended on the fragmentation method used. Since this method relies only on search engine scores, it can be readily applied to the scoring of the localization of virtually any modification at no additional experimental or in silico cost. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Suryani, Mira; Hadi, Setiawan; Paulus, Erick; Nurma Yulita, Intan; Supriatna, Asep K.
Today, Information and Communication Technology (ICT) has become a regular thing for every aspect of live include cultural and heritage aspect. Sundanese ancient manuscripts as Sundanese heritage are in damage condition and also the information that containing on it. So in order to preserve the information in Sundanese ancient manuscripts and make them easier to search, a search engine has been developed. The search engine must has good computing ability. In order to get the best computation in developed search engine, three types of probabilistic approaches: Bayesian Networks Model, Divergence from Randomness with PL2 distribution, and DFR-PL2F as derivative form DFR-PL2 have been compared in this study. The three probabilistic approaches supported by index of documents and three different weighting methods: term occurrence, term frequency, and TF-IDF. The experiment involved 12 Sundanese ancient manuscripts. From 12 manuscripts there are 474 distinct terms. The developed search engine tested by 50 random queries for three types of query. The experiment results showed that for the single query and multiple query, the best searching performance given by the combination of PL2F approach and TF-IDF weighting method. The performance has been evaluated using average time responds with value about 0.08 second and Mean Average Precision (MAP) about 0.33.
Langville, Amy N
Why doesn't your home page appear on the first page of search results, even when you query your own name? How do other Web pages always appear at the top? What creates these powerful rankings? And how? The first book ever about the science of Web page rankings, Google's PageRank and Beyond supplies the answers to these and other questions and more. The book serves two very different audiences: the curious science reader and the technical computational reader. The chapters build in mathematical sophistication, so that the first five are accessible to the general academic reader. While other cha
Fry, J.; Virkar, S.; Schroeder, R.
This chapter investigates the `winner-takes-all' hypothesis in relation to how academic researchers access online sources and resources. Some have argued that the Web provides access to a wider range of sources of information than offline resources. Others, such as Hindman et al. (2003), have shown that access to online resources is highly concentrated, particularly because of how Internet search engines are designed. With researchers increasingly using the Web and Internet search engines to disseminate and locate information and expertise, the question of whether the use of online resources enhances or diminishes the range of available sources of expertise is bound to become more pressing. To address this question four globally relevant knowledge domains were investigated using large-scale link analysis and a series of semi-structured interviews with UK-based academic researchers. We found there to be no uniform `winner-takes-all' effect in the use of online resources. Instead, there were different types of information gatekeepers for the four domains we examined and for the types of resources and sources that are sought. Particular characteristics of a knowledge domain's information environment appear to determine whether Google and other Internet search engines function as a facilitator in accessing expertise or as an influential gatekeeper.
Qu, Qiang; Liu, Siyuan; Yang, Bin
In step with the web being used widely by mobile users, user location is becoming an essential signal in services, including local intent search. Given a large set of spatial web objects consisting of a geographical location and a textual description (e.g., online business directory entries...... sets with more textually relevant objects render these studies inapplicable. We propose localitySearch, a query that returns top-k sets of spatial web objects and integrates spatial distance and textual relevance in one ranking function. We show that computing the query is NP-hard, and we present two...
Yu, Hong; Kaufman, David
The Internet is having a profound impact on physicians' medical decision making. One recent survey of 277 physicians showed that 72% of physicians regularly used the Internet to research medical information and 51% admitted that information from web sites influenced their clinical decisions. This paper describes the first cognitive evaluation of four state-of-the-art Internet search engines: Google (i.e., Google and Scholar.Google), MedQA, Onelook, and PubMed for answering definitional questions (i.e., questions with the format of "What is X?") posed by physicians. Onelook is a portal for online definitions, and MedQA is a question answering system that automatically generates short texts to answer specific biomedical questions. Our evaluation criteria include quality of answer, ease of use, time spent, and number of actions taken. Our results show that MedQA outperforms Onelook and PubMed in most of the criteria, and that MedQA surpasses Google in time spent and number of actions, two important efficiency criteria. Our results show that Google is the best system for quality of answer and ease of use. We conclude that Google is an effective search engine for medical definitions, and that MedQA exceeds the other search engines in that it provides users direct answers to their questions; while the users of the other search engines have to visit several sites before finding all of the pertinent information.
Full Text Available The goal of this study was to investigate the retrieval effectiveness of three popular German Web search services. For this purpose the engines Altavista.de, Google.de and Lycos.de were compared with each other in terms of the precision of their top 20 results. The test panelists were based on a collection of 50 randomly selected queries, and relevance assessments were made by independent jurors. Relevance assessments were acquired separately a for the search results themselves and b for the result descriptions on the search engine results pages. The basic findings were: 1. Google reached the best result values. Statistical validation showed that Google performed significantly better than Altavista, but there was no significant difference between Google and Lycos. Lycos also attained better values than Altavista, but again the differences reached no significant value. In terms of top 20 precision, the experiment showed similar outcomes to the preceding retrieval test in 2002. Google, followed by Lycos and then Altavista, still performs best, but the gaps between the engines are closer now. 2. There are big deviations between the relevance assignments based on the judgement of the results themselves and those based on the judgements of the result descriptions on the search engine results pages.
Cox, Jürgen; Neuhauser, Nadin; Michalski, Annette; Scheltema, Richard A; Olsen, Jesper V; Mann, Matthias
A key step in mass spectrometry (MS)-based proteomics is the identification of peptides in sequence databases by their fragmentation spectra. Here we describe Andromeda, a novel peptide search engine using a probabilistic scoring model. On proteome data, Andromeda performs as well as Mascot, a widely used commercial search engine, as judged by sensitivity and specificity analysis based on target decoy searches. Furthermore, it can handle data with arbitrarily high fragment mass accuracy, is able to assign and score complex patterns of post-translational modifications, such as highly phosphorylated peptides, and accommodates extremely large databases. The algorithms of Andromeda are provided. Andromeda can function independently or as an integrated search engine of the widely used MaxQuant computational proteomics platform and both are freely available at www.maxquant.org. The combination enables analysis of large data sets in a simple analysis workflow on a desktop computer. For searching individual spectra Andromeda is also accessible via a web server. We demonstrate the flexibility of the system by implementing the capability to identify cofragmented peptides, significantly improving the total number of identified peptides.
Mukhopadhyay, Debajyoti; Sharma, Manoj; Joshi, Gajanan; Pagare, Trupti; Palwe, Adarsha
Considering today's web scenario, there is a need of effective and meaningful search over the web which is provided by Semantic Web. Existing search engines are keyword based. They are vulnerable in answering intelligent queries from the user due to the dependence of their results on information available in web pages. While semantic search engines provides efficient and relevant results as the semantic web is an extension of the current web in which information is given well defined meaning....
Romaniuk, Ryszard S.
Wilga Summer 2015 Symposium on Photonics Applications and Web Engineering was held on 23-31 May. The Symposium gathered over 350 participants, mainly young researchers active in optics, optoelectronics, photonics, electronics technologies and applications. There were presented around 300 presentations in a few main topical tracks including: bio-photonics, optical sensory networks, photonics-electronics-mechatronics co-design and integration, large functional system design and maintenance, Internet of Thins, and other. The paper is an introduction the 2015 WILGA Summer Symposium Proceedings, and digests some of the Symposium chosen key presentations.
Search engines have been developed for helping learners to seek online information. Based on theory of planned behaviour approach, this research intends to investigate the behaviour of using search engines as a learning tool. After factor analysis, the results suggest that perceived satisfaction of search engine, search engines as an information…
Liu, Renyu; García, Paul S; Fleisher, Lee A
Since current general interest in anesthesia is unknown, we analyzed internet keyword searches to gauge general interest in anesthesia in comparison with surgery and pain. The trend of keyword searches from 2004 to 2010 related to anesthesia and anaesthesia was investigated using Google Insights for Search. The trend of number of peer reviewed articles on anesthesia cited on PubMed and Medline from 2004 to 2010 was investigated. The average cost on advertising on anesthesia, surgery and pain was estimated using Google AdWords. Searching results in other common search engines were also analyzed. Correlation between year and relative number of searches was determined with psearch engines may provide different total number of searching results (available posts), the ratios of searching results between some common keywords related to perioperative care are comparable, indicating similar trend. The peer reviewed manuscripts on "anesthesia" and the proportion of papers on "anesthesia and outcome" are trending up. Estimates for spending of advertising dollars are less for anesthesia-related terms when compared to that for pain or surgery due to relative smaller number of searching traffic. General interest in anesthesia (anaesthesia) as measured by internet searches appears to be decreasing. Pain, preanesthesia evaluation, anesthesia and outcome and side effects of anesthesia are the critical areas that anesthesiologists should focus on to address the increasing concerns.
Shi, Conglei; Wu, Yingcai; Liu, Shixia; Zhou, Hong; Qu, Huamin
The huge amount of user log data collected by search engine providers creates new opportunities to understand user loyalty and defection behavior at an unprecedented scale. However, this also poses a great challenge to analyze the behavior and glean insights into the complex, large data. In this paper, we introduce LoyalTracker, a visual analytics system to track user loyalty and switching behavior towards multiple search engines from the vast amount of user log data. We propose a new interactive visualization technique (flow view) based on a flow metaphor, which conveys a proper visual summary of the dynamics of user loyalty of thousands of users over time. Two other visualization techniques, a density map and a word cloud, are integrated to enable analysts to gain further insights into the patterns identified by the flow view. Case studies and the interview with domain experts are conducted to demonstrate the usefulness of our technique in understanding user loyalty and switching behavior in search engines.
Maryam Akbari; Mozafar Cheshme Sohrabi; Ebrahim Afshar Zanjani
The present study investigated the analysis of search engines and meta search engines adoption process by University of Isfahan users during 2009-2010 based on the Rogers' diffusion of innovation theory...
Trieschnigg, Rudolf Berend; Tjin-Kam-Jet, Kien; Hiemstra, Djoerd
Building a federated search engine based on a large number existing web search engines is a challenge: implementing the programming interface (API) for each search engine is an exacting and time-consuming job. In this demonstration we present SearchResultFinder, a browser plugin which speeds up
Duarte Torres, Sergio; Weber, Ingmar
The Internet has become an important part of the daily life of children as a source of information and leisure activities. Nonetheless, given that most of the content available on the web is aimed at the general public, children are constantly exposed to inappropriate content, either because the
Maryam Akbari; Mozafar Cheshme Sohrabi; Ebrahim Afshar Zanjani
The present study investigated the analysis of search engines and meta search engines adoption process by University of Isfahan users during 2009-2010 based on the Rogers' diffusion of innovation theory. The main aim of the research was to study the rate of adoption and recognizing the potentials and effective tools in search engines and meta search engines adoption among University of Isfahan users. The research method was descriptive survey study. The cases of the study were all of the post...
Sigurbjörnsson, B.; Kamps, J.; de Rijke, M.
EuroGOV is a multilingual web corpus that was created to serve as the document collection for WebCLEF, the CLEF 2005 web retrieval task. EuroGOV is a collection of web pages crawled from the European Union portal, European Union member state governmental web sites, and Russian government web sites.
Full Text Available Using images of objects as queries is a new approach to search for information on the Web. Image-based information retrieval goes beyond only matching images, as information in other modalities also can be extracted from data collections using an image search. We have developed a new system that uses images to search for web-based information. This paper has a particular focus on exploring users' experience of general mobile image-based web searches to find what issues and phenomena it contains. This was achieved in a multipart study by creating and letting respondents test prototypes of mobile image-based search systems and collect data using interviews, observations, video observations, and questionnaires. We observed that searching for information based only on visual similarity and without any assistance is sometimes difficult, especially on mobile devices with limited interaction bandwidth. Most of our subjects preferred a search tool that guides the users through the search result based on contextual information, compared to presenting the search result as a plain ranked list.
Full Text Available In the traditional deductive approach in teaching any engineering topic, teachers would first expose students to the derivation of the equations that govern the behavior of a physical system and then demonstrate the use of equations through a limited number of textbook examples. This methodology, however, is rarely adequate to unmask the cause-effect and quantitative relationships between the system variables that the equations embody. Web-based simulation, which is the integration of simulation and internet technologies, has the potential to enhance the learning experience by offering an interactive and easily accessible platform for quick and effortless experimentation with physical phenomena.This paper presents the design and development of a web-based platform for teaching basic food engineering phenomena to food technology students. The platform contains a variety of modules (“virtual experiments” covering the topics of mass and energy balances, fluid mechanics and heat transfer. In this paper, the design and development of three modules for mass balances and heat transfer is presented. Each webpage representing an educational module has the following features: visualization of the studied phenomenon through graphs, charts or videos, computation through a mathematical model and experimentation. The student is allowed to edit key parameters of the phenomenon and observe the effect of these changes on the outputs. Experimentation can be done in a free or guided fashion with a set of prefabricated examples that students can run and self-test their knowledge by answering multiple-choice questions.
Ghanate, Avinash; Ramasamy, Sureshkumar; Suresh, C. G.
Engineering protein molecules with desired structure and biological functions has been an elusive goal. Development of industrially viable proteins with improved properties such as stability, catalytic activity and altered specificity by modifying the structure of an existing protein has widely been targeted through rational protein engineering. Although a range of factors contributing to thermal stability have been identified and widely researched, the in silico implementation of these as strategies directed towards enhancement of protein stability has not yet been explored extensively. A wide range of structural analysis tools is currently available for in silico protein engineering. However these tools concentrate on only a limited number of factors or individual protein structures, resulting in cumbersome and time-consuming analysis. The iRDP web server presented here provides a unified platform comprising of iCAPS, iStability and iMutants modules. Each module addresses different facets of effective rational engineering of proteins aiming towards enhanced stability. While iCAPS aids in selection of target protein based on factors contributing to structural stability, iStability uniquely offers in silico implementation of known thermostabilization strategies in proteins for identification and stability prediction of potential stabilizing mutation sites. iMutants aims to assess mutants based on changes in local interaction network and degree of residue conservation at the mutation sites. Each module was validated using an extensively diverse dataset. The server is freely accessible at http://irdp.ncl.res.in and has no login requirements. PMID:26436543
Full Text Available Engineering protein molecules with desired structure and biological functions has been an elusive goal. Development of industrially viable proteins with improved properties such as stability, catalytic activity and altered specificity by modifying the structure of an existing protein has widely been targeted through rational protein engineering. Although a range of factors contributing to thermal stability have been identified and widely researched, the in silico implementation of these as strategies directed towards enhancement of protein stability has not yet been explored extensively. A wide range of structural analysis tools is currently available for in silico protein engineering. However these tools concentrate on only a limited number of factors or individual protein structures, resulting in cumbersome and time-consuming analysis. The iRDP web server presented here provides a unified platform comprising of iCAPS, iStability and iMutants modules. Each module addresses different facets of effective rational engineering of proteins aiming towards enhanced stability. While iCAPS aids in selection of target protein based on factors contributing to structural stability, iStability uniquely offers in silico implementation of known thermostabilization strategies in proteins for identification and stability prediction of potential stabilizing mutation sites. iMutants aims to assess mutants based on changes in local interaction network and degree of residue conservation at the mutation sites. Each module was validated using an extensively diverse dataset. The server is freely accessible at http://irdp.ncl.res.in and has no login requirements.
Splitt, Frank G.
Evidence abounds that we are reaching the carrying capacity of the earth -- engaging in deficit spending. The amount of crops, animals, and other biomatter we extract from the earth each year exceeds wth the earth can replace by an estimated 20%. Additionally, signs of climate change are precursors of things to come. Global industrialization and the new technologies of the 20th century have helped to stretch the capacities of our finite natural system to precarious levels. Taken together, this evidence reflects a fraying web of life. Sustainable development and natural capitalism work to reverse these trends, however, we are often still wedded to the notion that environmental conservation and economic development are the 'players' in a zero-sum game. Engineering and its technological derivatives can also help remedy the problem. The well being of future generations will depend to a large extent on how we educate our future engineers. These engineers will be a new breed -- developing and using sustainable technology, benign manufacturing processes and an expanded array of environmental assessment tools that will simultaneously support and maintain healthy economies and a healthy environment. The importance of environment and sustainable development cosiderations, the need for their widespread inclusion in engineering education, the impediments to change, and the important role played by ABET will be presented.
Wolfe, Jeremy M; Evans, Karla K; Drew, Trafton; Aizenman, Avigael; Josephs, Emilie
Radiologists perform many 'visual search tasks' in which they look for one or more instances of one or more types of target item in a medical image (e.g. cancer screening). To understand and improve how radiologists do such tasks, it must be understood how the human 'search engine' works. This article briefly reviews some of the relevant work into this aspect of medical image perception. Questions include how attention and the eyes are guided in radiologic search? How is global (image-wide) information used in search? How might properties of human vision and human cognition lead to errors in radiologic search? © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: email@example.com.
Yumusak, Semih; Dogdu, Erdogan; Kodaz, Halife; Kamilaris, Andreas; Vandenbussche, Pierre-Yves
In this study, a novel metacrawling method is proposed for discovering and monitoring linked data sources on the Web. We implemented the method in a prototype system, named SPARQL Endpoints Discovery (SpEnD). SpEnD starts with a "search keyword" discovery process for finding relevant keywords for the linked data domain and specifically SPARQL endpoints. Then, these search keywords are utilized to find linked data sources via popular search engines (Google, Bing, Yahoo, Yandex). By using this method, most of the currently listed SPARQL endpoints in existing endpoint repositories, as well as a significant number of new SPARQL endpoints, have been discovered. Finally, we have developed a new SPARQL endpoint crawler (SpEC) for crawling and link analysis.
Lewandowski, Dirk; Mayr, Philipp
Purpose: To provide a critical review of Bergman’s 2001 study on the Deep Web. In addition, we bring a new concept into the discussion, the Academic Invisible Web (AIW). We define the Academic Invisible Web as consisting of all databases and collections relevant to academia but not searchable by the general-purpose internet search engines. Indexing this part of the Invisible Web is central to scientific search engines. We provide an overview of approaches followed thus far. Design/methodol...
Among the search tools currently on the Web, search engines are the most well known thanks to the popularity of major search engines such as Google and Yahoo!. While extremely successful, these major search engines do have serious limitations. This book introduces large-scale metasearch engine technology, which has the potential to overcome the limitations of the major search engines. Essentially, a metasearch engine is a search system that supports unified access to multiple existing search engines by passing the queries it receives to its component search engines and aggregating the returned
Kushniruk, Andre W; Kan, Min-Yem; McKeown, Kathleen; Klavans, Judith; Jordan, Desmond; LaFlamme, Mark; Patel, Vimia L
This paper describes the comparative evaluation of an experimental automated text summarization system, Centrifuser and three conventional search engines - Google, Yahoo and About.com. Centrifuser provides information to patients and families relevant to their questions about specific health conditions. It then produces a multidocument summary of articles retrieved by a standard search engine, tailored to the user's question. Subjects, consisting of friends or family of hospitalized patients, were asked to "think aloud" as they interacted with the four systems. The evaluation involved audio- and video recording of subject interactions with the interfaces in situ at a hospital. Results of the evaluation show that subjects found Centrifuser's summarization capability useful and easy to understand. In comparing Centrifuser to the three search engines, subjects' ratings varied; however, specific interface features were deemed useful across interfaces. We conclude with a discussion of the implications for engineering Web-based retrieval systems.
Jabaley, Craig S; Blum, James M; Groff, Robert F; O'Reilly-Shah, Vikas N
Sepsis is an established global health priority with high mortality that can be curtailed through early recognition and intervention; as such, efforts to raise awareness are potentially impactful and increasingly common. We sought to characterize trends in the awareness of sepsis by examining temporal, geographic, and other changes in search engine utilization for sepsis information-seeking online. Using time series analyses and mixed descriptive methods, we retrospectively analyzed publicly available global usage data reported by Google Trends (Google, Palo Alto, CA, USA) concerning web searches for the topic of sepsis between 24 June 2012 and 24 June 2017. Google Trends reports aggregated and de-identified usage data for its search products, including interest over time, interest by region, and details concerning the popularity of related queries where applicable. Outlying epochs of search activity were identified using autoregressive integrated moving average modeling with transfer functions. We then identified awareness campaigns and news media coverage that correlated with epochs of significantly heightened search activity. A second-order autoregressive model with transfer functions was specified following preliminary outlier analysis. Nineteen significant outlying epochs above the modeled baseline were identified in the final analysis that correlated with 14 awareness and news media events. Our model demonstrated that the baseline level of search activity increased in a nonlinear fashion. A recurrent cyclic increase in search volume beginning in 2012 was observed that correlates with World Sepsis Day. Numerous other awareness and media events were correlated with outlying epochs. The average worldwide search volume for sepsis was less than that of influenza, myocardial infarction, and stroke. Analyzing aggregate search engine utilization data has promise as a mechanism to measure the impact of awareness efforts. Heightened information-seeking about sepsis
Lange, Matthias; Spies, Karl; Colmsee, Christian; Flemming, Steffen; Klapperstück, Matthias; Scholz, Uwe
Efficient and effective information retrieval in life sciences is one of the most pressing challenge in bioinformatics. The incredible growth of life science databases to a vast network of interconnected information systems is to the same extent a big challenge and a great chance for life science research. The knowledge found in the Web, in particular in life-science databases, are a valuable major resource. In order to bring it to the scientist desktop, it is essential to have well performing search engines. Thereby, not the response time nor the number of results is important. The most crucial factor for millions of query results is the relevance ranking. In this paper, we present a feature model for relevance ranking in life science databases and its implementation in the LAILAPS search engine. Motivated by the observation of user behavior during their inspection of search engine result, we condensed a set of 9 relevance discriminating features. These features are intuitively used by scientists, who briefly screen database entries for potential relevance. The features are both sufficient to estimate the potential relevance, and efficiently quantifiable. The derivation of a relevance prediction function that computes the relevance from this features constitutes a regression problem. To solve this problem, we used artificial neural networks that have been trained with a reference set of relevant database entries for 19 protein queries. Supporting a flexible text index and a simple data import format, this concepts are implemented in the LAILAPS search engine. It can easily be used both as search engine for comprehensive integrated life science databases and for small in-house project databases. LAILAPS is publicly available for SWISSPROT data at http://lailaps.ipk-gatersleben.de.
Tsyganov, A; Petit, S; Martel, P; Milenkovic, S; Suwalska, A; Delamare, C; Widegren, D; Amerigo, S Mallon; Pettersson, T [CERN, GS Department, CH-1211 Geneva 23 (Switzerland)
CERN, the European Laboratory for Particle Physics, located in Geneva - Switzerland, has recently started the Large Hadron Collider (LHC), a 27 km particle accelerator. The CERN Engineering and Equipment Data Management Service (EDMS) provides support for managing engineering and equipment information throughout the entire lifecycle of a project. Based on several both in-house developed and commercial data management systems, this service supports management and follow-up of different kinds of information throughout the lifecycle of the LHC project: design, manufacturing, installation, commissioning data, maintenance and more. The data collection phase, carried out by specialists, is now being replaced by a phase during which data will be consulted on an extensive basis by non-experts users. In order to address this change, a Web portal for the EDMS has been developed. It brings together in one space all the aspects covered by the EDMS: project and document management, asset tracking and safety follow-up. This paper presents the EDMS Web portal, its dynamic content management and its 'one click' information search engine.
Knight, Simon; Mercer, Neil
While search engines are commonly used by children to find information, and in classroom-based activities, children are not adept in their information seeking or evaluation of information sources. Prior work has explored such activities in isolated, individual contexts, failing to account for the collaborative, discourse-mediated nature of search…
The purpose of this study is to investigate students' ability to use different Internet search engines (ISEs) in Universities in the Niger Delta Region of Nigeria. To reveal the types of ISEs used, and identify the source through which students acquire the skills. The study adopted a descriptive survey method. Questionnaire and ...
Van Gysel, C.; Kanoulas, E.; de Rijke, M.; Jose, J.M.; Hauff, C.; Altıngovde, I.S.; Song, D.; Albakour, D.; Watt, S.; Tait, J.
We introduce pyndri, a Python interface to the Indri search engine. Pyndri allows to access Indri indexes from Python at two levels: (1) dictionary and tokenized document collection, (2) evaluating queries on the index. We hope that with the release of pyndri, we will stimulate reproducible, open
Moody, Mia; Bates, Elizabeth
Enough evidence is available to support the idea that public relations professionals must possess search engine optimization (SEO) skills to assist clients in a full-service capacity; however, little research exists on how much college students know about the tactic and best practices for incorporating SEO into course curriculum. Furthermore, much…
Clarke, Theresa B.; Clarke, Irvine, III
Despite an increase in ad spending and demand for employees with expertise in search engine optimization (SEO), methods for teaching this important marketing strategy have received little coverage in the literature. Using Bloom's cognitive goals hierarchy as a framework, this experiential assignment provides a process for educators who may be new…
Li, Qing; Zhang, Yujin; Dai, Shengyang
With the growth of Internet and storage capability in recent years, image has become a widespread information format in World Wide Web. However, it has become increasingly harder to search for images of interest, and effective image search engine for the WWW needs to be developed. We propose in this paper a selective filtering process and a novel approach for image classification based on feature element in the image search engine we developed for the WWW. First a selective filtering process is embedded in a general web crawler to filter out the meaningless images with GIF format. Two parameters that can be obtained easily are used in the filtering process. Our classification approach first extract feature elements from images instead of feature vectors. Compared with feature vectors, feature elements can better capture visual meanings of the image according to subjective perception of human beings. Different from traditional image classification method, our classification approach based on feature element doesn't calculate the distance between two vectors in the feature space, while trying to find associations between feature element and class attribute of the image. Experiments are presented to show the efficiency of the proposed approach.
Rasmussen, Edie M.
Explores current research on indexing and ranking as retrieval functions of search engines on the Web. Highlights include measuring search engine stability; evaluation of Web indexing and retrieval; Web crawlers; hyperlinks for indexing and ranking; ranking for metasearch; document structure; citation indexing; relevance; query evaluation;…
Liu, Renyu; García, Paul S.; Fleisher, Lee A.
Background Since current general interest in anesthesia is unknown, we analyzed internet keyword searches to gauge general interest in anesthesia in comparison with surgery and pain. Methods The trend of keyword searches from 2004 to 2010 related to anesthesia and anaesthesia was investigated using Google Insights for Search. The trend of number of peer reviewed articles on anesthesia cited on PubMed and Medline from 2004 to 2010 was investigated. The average cost on advertising on anesthesia, surgery and pain was estimated using Google AdWords. Searching results in other common search engines were also analyzed. Correlation between year and relative number of searches was determined with pengines may provide different total number of searching results (available posts), the ratios of searching results between some common keywords related to perioperative care are comparable, indicating similar trend. The peer reviewed manuscripts on “anesthesia” and the proportion of papers on “anesthesia and outcome” are trending up. Estimates for spending of advertising dollars are less for anesthesia-related terms when compared to that for pain or surgery due to relative smaller number of searching traffic. Conclusions General interest in anesthesia (anaesthesia) as measured by internet searches appears to be decreasing. Pain, preanesthesia evaluation, anesthesia and outcome and side effects of anesthesia are the critical areas that anesthesiologists should focus on to address the increasing concerns. PMID:23853739
There are several semantic sources that can be found in the Web that are either explicit, e.g. Wikipedia, or implicit, e.g. derived from Web usage data. Most of them are related to user generated content (UGC) or what is called today the Web 2.0. In this talk we show how to use these sources of evidence in Flickr, such as tags, visual annotations or clicks, which represent the the wisdom of crowds behind UGC, to improve image search. These results are the work of the multimedia retrieval team at Yahoo! Research Barcelona and they are already being used in Yahoo! image search. This work is part of a larger effort to produce a virtuous data feedback circuit based on the right combination many different technologies to leverage the Web itself.
E Frenkel, F.; Skryabin, K. G.; Korotkov, E. V.
A new web server (http://victoria.biengi.ac.ru/splinter/login.php) was designed and developed to search for periodicity in nucleotide and amino acid sequences. The web server operation is based upon a new mathematical method of searching for multiple alignments, which is founded on the position weight matrices optimization, as well as on implementation of the two-dimensional dynamic programming. This approach allows the construction of multiple alignments of the indistinctly similar amino acid and nucleotide sequences that accumulated more than 1.5 substitutions per a single amino acid or a nucleotide without performing the sequences paired comparisons. The article examines the principles of the web server operation and two examples of studying amino acid and nucleotide sequences, as well as information that could be obtained using the web server.
reported in this paper we did not compute IRF accord- ing to the Spark Jones formulation of IDF — that corresponds to Eq. 1; we instead used the...on the Apache Lucene library and on an XML parser written in Java for extracting the document fields from the sample searches and the sample documents
Full Text Available In this paper we present ASAP - Automated Search with Agents in PubMed, a web-based service aiming to manage and automate scientific literature search in the PubMed database. The system allows the creation and management of web agents, parameterized thematically and functionally, that crawl the PubMed database autonomously and periodically, aiming to search and retrieve relevant results according the requirements provided by the user. The results, containing the publications list retrieved, are emailed to the agent owner on a weekly basis, during the activity period defined for the web agent. The ASAP service is devoted to help researchers, especially from the field of biomedicine and bioinformatics, in order to increase their productivity, and can be accessed at: http://esa.ipb.pt/~agentes.
Fitzpatrick, Roberta Bronson
The Institute for Scientific Information (ISI), part of Thompson Scientific, produces the Web of Science database as part of its Web of Knowledge. Recently, ISI introduced some new features, among them a new Author Finder feature, which allows users to zero in on a specific author in a very guided way. In addition, search results may be analyzed and reports created by users, both at the click of a button. This column focuses on these recently introduced features.
Miranda, João; Gomes, Daniel
Abstract—The Web is permanently changing, with new technologies and publishing behaviors emerging everyday. It is important to track trends on the evolution of the Web to develop efficient tools to process its data. For instance, Web trends influence the design of browsers, crawlers and search engines. This study presents trends on the evolution of the Web derived from the analysis of 3 characterizations performed within an interval of 5 years. The Web portion used as a c...
Full Text Available Ecosia is an Internet search engine that plants trees with the income obtained from advertising. This study explored the factors that affect the adoption of Ecosia.org from the perspective of technology adoption and trust. This was done by using the Unified Theory of Acceptance and Use of Technology (UTAUT2 and then analyzing the results with PLS-SEM (Partial Least Squares-Structural Equation Modeling. Subsequently, a survey was conducted with a structured questionnaire on search engines, which yielded the following results: (1 the idea of a company helping to mitigate the effects of climate change by planting trees is well received by Internet users. However, few people accept the idea of changing their habits from using traditional search engines; (2 Ecosia is a search engine believed to have higher compatibility rates, and needing less hardware resources, and (3 ecological marketing is an appropriate and future strategy that can increase the intention to use a technological product. Based on the results obtained, this study shows that a search engine or other service provided by the Internet, which can be audited (visits, searches, files, etc., can also contribute to curb the effects of deforestation and climate change. In addition, companies, and especially technological start-ups, are advised to take into account that users feel better using these tools. Finally, this study urges foundations and non-governmental organizations to fight against the effects of deforestation by supporting these initiatives. The study also urges companies to support technological services, and follow the behavior of Ecosia.org in order to positively influence user satisfaction by using ecological marketing strategies.
Zhou, Yuchao; De, Suparna; Wang, Wei; Moessner, Klaus
The Web of Things aims to make physical world objects and their data accessible through standard Web technologies to enable intelligent applications and sophisticated data analytics. Due to the amount and heterogeneity of the data, it is challenging to perform data analysis directly; especially when the data is captured from a large number of distributed sources. However, the size and scope of the data can be reduced and narrowed down with search techniques, so that only the most relevant and useful data items are selected according to the application requirements. Search is fundamental to the Web of Things while challenging by nature in this context, e.g., mobility of the objects, opportunistic presence and sensing, continuous data streams with changing spatial and temporal properties, efficient indexing for historical and real time data. The research community has developed numerous techniques and methods to tackle these problems as reported by a large body of literature in the last few years. A comprehensive investigation of the current and past studies is necessary to gain a clear view of the research landscape and to identify promising future directions. This survey reviews the state-of-the-art search methods for the Web of Things, which are classified according to three different viewpoints: basic principles, data/knowledge representation, and contents being searched. Experiences and lessons learned from the existing work and some EU research projects related to Web of Things are discussed, and an outlook to the future research is presented. PMID:27128918
Internet research can be compared to trying to drink from a firehose. Such a wealth of information is available that even the simplest inquiry can sometimes generate tens of thousands of leads, more information than most people can handle, and more burdensome than most can endure. Like everyone else, NASA scientists rely on the Internet as a primary search tool. Unlike the average user, though, NASA scientists perform some pretty sophisticated, involved research. To help manage the Internet and to allow researchers at NASA to gain better, more efficient access to the wealth of information, the Agency needed a search tool that was more refined and intelligent than the typical search engine. Partnership NASA funded Stottler Henke, Inc., of San Mateo, California, a cutting-edge software company, with a Small Business Innovation Research (SBIR) contract to develop the Aware software for searching through the vast stores of knowledge quickly and efficiently. The partnership was through NASA s Ames Research Center.
Azman, Syafiq Kamarul; Anwar, Muhammad Zohaib; Henschel, Andreas
Given the current influx of 16S rRNA profiles of microbiota samples, it is conceivable that large amounts of them eventually are available for search, comparison and contextualization with respect to novel samples. This process facilitates the identification of similar compositional features in microbiota elsewhere and therefore can help to understand driving factors for microbial community assembly. We present Visibiome, a microbiome search engine that can perform exhaustive, phylogeny based similarity search and contextualization of user-provided samples against a comprehensive dataset of 16S rRNA profiles environments, while tackling several computational challenges. In order to scale to high demands, we developed a distributed system that combines web framework technology, task queueing and scheduling, cloud computing and a dedicated database server. To further ensure speed and efficiency, we have deployed Nearest Neighbor search algorithms, capable of sublinear searches in high-dimensional metric spaces in combination with an optimized Earth Mover Distance based implementation of weighted UniFrac. The search also incorporates pairwise (adaptive) rarefaction and optionally, 16S rRNA copy number correction. The result of a query microbiome sample is the contextualization against a comprehensive database of microbiome samples from a diverse range of environments, visualized through a rich set of interactive figures and diagrams, including barchart-based compositional comparisons and ranking of the closest matches in the database. Visibiome is a convenient, scalable and efficient framework to search microbiomes against a comprehensive database of environmental samples. The search engine leverages a popular but computationally expensive, phylogeny based distance metric, while providing numerous advantages over the current state of the art tool.
Focuses on the Deep Web, defined as Web content in searchable databases of the type that can be found only by direct query. Discusses the problems of indexing; inability to find information not indexed in the search engine's database; and metasearch engines. Describes 10 sites created to access online databases or directly search them. Lists ways…
Batches of pharmaceuticals are sometimes recalled from the market when a safety issue or a defect is detected in specific production runs of a drug. Such problems are usually detected when patients or healthcare providers report abnormalities to medical authorities. Here, we test the hypothesis that defective production lots can be detected earlier by monitoring queries to Internet search engines. We extracted queries from the USA to the Bing search engine, which mentioned one of the 5195 pharmaceutical drugs during 2015 and all recall notifications issued by the Food and Drug Administration (FDA) during that year. By using attributes that quantify the change in query volume at the state level, we attempted to predict if a recall of a specific drug will be ordered by FDA in a time horizon ranging from 1 to 40 days in future. Our results show that future drug recalls can indeed be identified with an AUC of 0.791 and a lift at 5% of approximately 6 when predicting a recall occurring one day ahead. This performance degrades as prediction is made for longer periods ahead. The most indicative attributes for prediction are sudden spikes in query volume about a specific medicine in each state. Recalls of prescription drugs and those estimated to be of medium-risk are more likely to be identified using search query data. These findings suggest that aggregated Internet search engine data can be used to facilitate in early warning of faulty batches of medicines.
Full Text Available We propose a Cooperative Question Answering System that takes as input natural language queries and is able to return a cooperative answer based on semantic web resources, more specifically DBpedia represented in OWL/RDF as knowledge base and WordNet to build similar questions. Our system resorts to ontologies not only for reasoning but also to find answers and is independent of prior knowledge of the semantic resources by the user. The natural language question is translated into its semantic representation and then answered by consulting the semantics sources of information. The system is able to clarify the problems of ambiguity and helps finding the path to the correct answer. If there are multiple answers to the question posed (or to the similar questions for which DBpedia contains answers, they will be grouped according to their semantic meaning, providing a more cooperative and clarified answer to the user.
Souer, J.; van de Weerd, I.|info:eu-repo/dai/nl/304836664; Versendaal, J.M.|info:eu-repo/dai/nl/07506104X; Brinkkemper, S.|info:eu-repo/dai/nl/07500707X
Web applications are evolving towards strong content-centered Web applications. The development processes and implementation of these applications are unlike the development and implementation of traditional information systems. In this paper we propose WebEngineering Method; a method for developing
For the last three decades, the engineering higher education and professional environments have been completely transformed by the "electronic/digital information revolution" that has included the introduction of personal computer, the development of email and world wide web, and broadband Internet connections at home. Herein the writer compares…
Luna Ramírez,Enrique; Ambriz Delgadillo,Humberto; Nungaray Ornelas,J. Antonio; Álvarez Rodríguez, Francisco Javier; Jorge N Mondragón Reyes
In this paper, the design of a Web metadata search model with semi-intelligent features is proposed. The search model is oriented to retrieve the metadata associated to a data warehouse in a fast, flexible and reliable way. Our proposal includes a set of distinctive functionalities, which consist of the temporary storage of the frequently used metadata in an exclusive store, different to the global data warehouse metadata store, and of the use of control processes to retrieve information from...
Guyon, Frédéric; Camproux, Anne-Claude; Hochez, Joëlle; Tufféry, Pierre
SA-Search is a web tool that can be used to mine for protein structures and extract structural similarities. It is based on a hidden Markov model derived Structural Alphabet (SA) that allows the compression of three-dimensional (3D) protein conformations into a one-dimensional (1D) representation using a limited number of prototype conformations. Using such a representation, classical methods developed for amino acid sequences can be employed. Currently, SA-Search permits the performance of fast 3D similarity searches such as the extraction of exact words using a suffix tree approach, and the search for fuzzy words viewed as a simple 1D sequence alignment problem. SA-Search is available at http://bioserv.rpbs.jussieu.fr/cgi-bin/SA-Search. PMID:15215446
Park, Young M.; Squizzato, Silvano; Buso, Nicola; Gur, Tamer
Abstract We present an update of the EBI Search engine, an easy-to-use fast text search and indexing system with powerful data navigation and retrieval capabilities. The interconnectivity that exists between data resources at EMBL–EBI provides easy, quick and precise navigation and a better understanding of the relationship between different data types that include nucleotide and protein sequences, genes, gene products, proteins, protein domains, protein families, enzymes and macromolecular structures, as well as the life science literature. EBI Search provides a powerful RESTful API that enables its integration into third-party portals, thus providing ‘Search as a Service’ capabilities, which are the main topic of this article. PMID:28472374
Park, Young M; Squizzato, Silvano; Buso, Nicola; Gur, Tamer; Lopez, Rodrigo
We present an update of the EBI Search engine, an easy-to-use fast text search and indexing system with powerful data navigation and retrieval capabilities. The interconnectivity that exists between data resources at EMBL-EBI provides easy, quick and precise navigation and a better understanding of the relationship between different data types that include nucleotide and protein sequences, genes, gene products, proteins, protein domains, protein families, enzymes and macromolecular structures, as well as the life science literature. EBI Search provides a powerful RESTful API that enables its integration into third-party portals, thus providing 'Search as a Service' capabilities, which are the main topic of this article. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
This chapter outlines how search engine technology can be used in online public access library catalogs (OPACs) to help improve users’ experiences, to identify users’ intentions, and to indicate how it can be applied in the library context, along with how sophisticated ranking criteria can be applied to the online library catalog. A review of the literature and current OPAC developments form the basis of recommendations on how to improve OPACs. Findings were that the major shor...
This paper investigates the usefulness of Web portals in a workbench for assisting student interpreters in the search for and collection of vocabulary. The experiment involved a class of fifteen English as a Foreign Language (EFL) student interpreters, who were required to equip themselves with the appropriate English vocabulary to handle an…
Millennial students make up a large portion of undergraduate students attending colleges and universities, and they have a variety of online resources available to them to complete academically related information searches, primarily Web based and library-based online information retrieval systems. The content, ease of use, and required search…
In recent years, tags have become a standard feature on a diverse range of sites on the Web, accompanying blog posts, photos, videos, and online news stories. Tags are descriptive terms attached to Internet resources. Despite the rapid adoption of tagging, how people use tags during the search process is not well understood. There is little…
Chuklin, A.; Markov, I.; de Rijke, M.
In this tutorial we give an overview of click models for web search. We show how the framework of probabilistic graphical models helps to explain user behavior, build new evaluation metrics and perform simulations. The tutorial discusses foundational aspects alongside experimental details and
Baker, Peter R; Trinidad, Jonathan C; Chalkley, Robert J
Large proteomic data sets identifying hundreds or thousands of modified peptides are becoming increasingly common in the literature. Several methods for assessing the reliability of peptide identifications both at the individual peptide or data set level have become established. However, tools for measuring the confidence of modification site assignments are sparse and are not often employed. A few tools for estimating phosphorylation site assignment reliabilities have been developed, but these are not integral to a search engine, so require a particular search engine output for a second step of processing. They may also require use of a particular fragmentation method and are mostly only applicable for phosphorylation analysis, rather than post-translational modifications analysis in general. In this study, we present the performance of site assignment scoring that is directly integrated into the search engine Protein Prospector, which allows site assignment reliability to be automatically reported for all modifications present in an identified peptide. It clearly indicates when a site assignment is ambiguous (and if so, between which residues), and reports an assignment score that can be translated into a reliability measure for individual site assignments.
Moat, Helen Susannah; Olivola, Christopher Y; Chater, Nick; Preis, Tobias
When making a decision, humans consider two types of information: information they have acquired through their prior experience of the world, and further information they gather to support the decision in question. Here, we present evidence that data from search engines such as Google can help us model both sources of information. We show that statistics from search engines on the frequency of content on the Internet can help us estimate the statistical structure of prior experience; and, specifically, we outline how such statistics can inform psychological theories concerning the valuation of human lives, or choices involving delayed outcomes. Turning to information gathering, we show that search query data might help measure human information gathering, and it may predict subsequent decisions. Such data enable us to compare information gathered across nations, where analyses suggest, for example, a greater focus on the future in countries with a higher per capita GDP. We conclude that search engine data constitute a valuable new resource for cognitive scientists, offering a fascinating new tool for understanding the human decision-making process. Copyright © 2016 The Authors. Topics in Cognitive Science published by Wiley Periodicals, Inc. on behalf of Cognitive Science Society.
Kao, Chia-Pin; Chien, Hui-Min
This study was conducted to explore the relationships between pre-school educators' conceptions of and approaches to learning by web-searching through Internet Self-efficacy. Based on data from 242 pre-school educators who had prior experience of participating in web-searching in Taiwan for path analyses, it was found in this study that…
Full Text Available Occupation is key in socioeconomic research. As in other survey modes, most web surveys use an open-ended question for occupation, though the absence of interviewers elicits unidentifiable or aggregated responses. Unlike other modes, web surveys can use a search tree with an occupation database. They are hardly ever used, but this may change due to technical advancements. This article evaluates a three-step search tree with 1,700 occupational titles, used in the 2010 multilingual WageIndicator web survey for UK, Belgium and Netherlands (22,990 observations. Dropout rates are high; in Step 1 due to unemployed respondents judging the question not to be adequate, and in Step 3 due to search tree item length. Median response times are substantial due to search tree item length, dropout in the next step and invalid occupations ticked. Overall the validity of the occupation data is rather good, 1.7-7.5% of the respondents completing the search tree have ticked an invalid occupation.
Full Text Available To surface the Deep Web, one crucial task is to predict whether a given web page has a search interface (searchable HyperText Markup Language (HTML form or not. Previous studies have focused on supervised classification with labeled examples. However, labeled data are scarce, hard to get and requires tediousmanual work, while unlabeled HTML forms are abundant and easy to obtain. In this research, we consider the plausibility of using both labeled and unlabeled data to train better models to identify search interfaces more effectively. We present a semi-supervised co-training ensemble learning approach using both neural networks and decision trees to deal with the search interface identification problem. We show that the proposed model outperforms previous methods using only labeled data. We also show that adding unlabeled data improves the effectiveness of the proposed model.
Duan, Lixin; Li, Wen; Tsang, Ivor Wai-Hung; Xu, Dong
Given a textual query in traditional text-based image retrieval (TBIR), relevant images are to be reranked using visual features after the initial text-based search. In this paper, we propose a new bag-based reranking framework for large-scale TBIR. Specifically, we first cluster relevant images using both textual and visual features. By treating each cluster as a "bag" and the images in the bag as "instances," we formulate this problem as a multi-instance (MI) learning problem. MI learning methods such as mi-SVM can be readily incorporated into our bag-based reranking framework. Observing that at least a certain portion of a positive bag is of positive instances while a negative bag might also contain positive instances, we further use a more suitable generalized MI (GMI) setting for this application. To address the ambiguities on the instance labels in the positive and negative bags under this GMI setting, we develop a new method referred to as GMI-SVM to enhance retrieval performance by propagating the labels from the bag level to the instance level. To acquire bag annotations for (G)MI learning, we propose a bag ranking method to rank all the bags according to the defined bag ranking score. The top ranked bags are used as pseudopositive training bags, while pseudonegative training bags can be obtained by randomly sampling a few irrelevant images that are not associated with the textual query. Comprehensive experiments on the challenging real-world data set NUS-WIDE demonstrate our framework with automatic bag annotation can achieve the best performances compared with existing image reranking methods. Our experiments also demonstrate that GMI-SVM can achieve better performances when using the manually labeled training bags obtained from relevance feedback.
Full Text Available With the advance of the World-Wide Web (WWW technology, people can easily share content on the Web, including geospatial data and web services. Thus, the “big geospatial data management” issues start attracting attention. Among the big geospatial data issues, this research focuses on discovering distributed geospatial resources. As resources are scattered on the WWW, users cannot find resources of their interests efficiently. While the WWW has Web search engines addressing web resource discovery issues, we envision that the geospatial Web (i.e., GeoWeb also requires GeoWeb search engines. To realize a GeoWeb search engine, one of the first steps is to proactively discover GeoWeb resources on the WWW. Hence, in this study, we propose the GeoWeb Crawler, an extensible Web crawling framework that can find various types of GeoWeb resources, such as Open Geospatial Consortium (OGC web services, Keyhole Markup Language (KML and Environmental Systems Research Institute, Inc (ESRI Shapefiles. In addition, we apply the distributed computing concept to promote the performance of the GeoWeb Crawler. The result shows that for 10 targeted resources types, the GeoWeb Crawler discovered 7351 geospatial services and 194,003 datasets. As a result, the proposed GeoWeb Crawler framework is proven to be extensible and scalable to provide a comprehensive index of GeoWeb.
Full Text Available Recently, a new technology called WebGL shows a lot of potentials for developing games. However since this technology is still new, there are still many potentials in the game development area that are not explored yet. This paper tries to uncover the potential of integrating physics frameworks with WebGL technology in a game engine for developing 2D or 3D games. Specifically we integrated three open source physics frameworks: Bullet, Cannon, and JigLib into a WebGL-based game engine. Using experiment, we assessed these frameworks in terms of their correctness or accuracy, performance, completeness and compatibility. The results show that it is possible to integrate open source physics frameworks into a WebGLbased game engine, and Bullet is the best physics framework to be integrated into the WebGL-based game engine.
Yogya, Resa; Kosala, Raymond
Recently, a new technology called WebGL shows a lot of potentials for developing games. However since this technology is still new, there are still many potentials in the game development area that are not explored yet. This paper tries to uncover the potential of integrating physics frameworks with WebGL technology in a game engine for developing 2D or 3D games. Specifically we integrated three open source physics frameworks: Bullet, Cannon, and JigLib into a WebGL-based game engine. Using experiment, we assessed these frameworks in terms of their correctness or accuracy, performance, completeness and compatibility. The results show that it is possible to integrate open source physics frameworks into a WebGLbased game engine, and Bullet is the best physics framework to be integrated into the WebGL-based game engine.
Weerd, I. van de; Souer, J.; Versendaal, J.M.; Brinkkemper, S.
The development of complex, data-intensive web applications is becoming simpler due to the usage of content management systems. Conventional information systems development methods as well as web application development methods do not cover the needs of a method for web content management
James D. Carswell
Full Text Available Developing mobile geo-information systems for sensor web applications involves technologies that can access linked geographical and semantically related Internet information. Additionally, in tomorrow’s Web 4.0 world, it is envisioned that trillions of inexpensive micro-sensors placed throughout the environment will also become available for discovery based on their unique geo-referenced IP address. Exploring these enormous volumes of disparate heterogeneous data on today’s location and orientation aware smartphones requires context-aware smart applications and services that can deal with “information overload”. 3DQ (Three Dimensional Query is our novel mobile spatial interaction (MSI prototype that acts as a next-generation base for human interaction within such geospatial sensor web environments/urban landscapes. It filters information using “Hidden Query Removal” functionality that intelligently refines the search space by calculating the geometry of a three dimensional visibility shape (Vista space at a user’s current location. This 3D shape then becomes the query “window” in a spatial database for retrieving information on only those objects visible within a user’s actual 3D field-of-view. 3DQ reduces information overload and serves to heighten situation awareness on constrained commercial off-the-shelf devices by providing visibility space searching as a mobile web service. The effects of variations in mobile spatial search techniques in terms of query speed vs. accuracy are evaluated and presented in this paper.
Leonimer Flávio de Melo
Full Text Available This work subject focuses in distance learning (DL modality by the Internet. The use of calculators and simulators software introduces a high level of interactivity in DL systems, such as Matlab software proposed by using in this work. The use of efficient mathematical packages and hypermedia technologies opens the door to a new paradigm of teaching and learning in the dawn of this new millennium. The use of hypertext, graphics, animation, audio, video, efficient calculators and simulators incorporating artificial intelligence techniques and the advance in the broadband networks will pave the way to this new horizon. The contribution of this work, besides the Matlab Web integration, is the developing of an introductory course in traffic engineering in hypertext format. Also, calculators to the most employed expressions of traffic analysis were developed to the Matlab server environment. By the use of the telephone traffic calculator, the user inputs data on his or her Internet browser and the systems returns numerical data, graphics and tables in HTML pages. The system is also very useful for Professional traffic calculations, replacing with advantages the use of the traditional methods by means of static tables and graphics in paper format.
Torrecilla Salinas, Carlos Joaquín
En los últimos años la importancia de la Web y los desarrollos Web en la vida de ciudadanos y empresas ha aumentado significativamente. Desde sus inicios, la comunidad investigadora intentó definir aproximaciones metodológicas para los entornos Web. De ese modo, mientras que los sistemas Web eran desarrollados inicialmente sin seguir una aproximación estructurada que pudiera garantizar resultados de calidad, aparecieron diferentes propuestas que ayudaron a establecer la Ingeniería Web como un...
This article discusses the co-production of search technology and a European identity in the context of the EU data protection reform. The negotiations of the EU data protection legislation ran from 2012 until 2015 and resulted in a unified data protection legislation directly binding for all European member states. I employ a discourse analysis to examine EU policy documents and Austrian media materials related to the reform process. Using the concept 'sociotechnical imaginary', I show how a European imaginary of search engines is forming in the EU policy domain, how a European identity is constructed in the envisioned politics of control, and how national specificities contribute to the making and unmaking of a European identity. I discuss the roles that national technopolitical identities play in shaping both search technology and Europe, taking as an example Austria, a small country with a long history in data protection and a tradition of restrained technology politics.
Jiang, Silis Y.; Weng, Chunhua
Clinical trials are fundamental to the advancement of medicine but constantly face recruitment difficulties. Various clinical trial search engines have been designed to help health consumers identify trials for which they may be eligible. Unfortunately, knowledge of the usefulness and usability of their designs remains scarce. In this study, we used mixed methods, including time-motion analysis, think-aloud protocol, and survey, to evaluate five popular clinical trial search engines with 11 users. Differences in user preferences and time spent on each system were observed and correlated with user characteristics. In general, searching for applicable trials using these systems is a cognitively demanding task. Our results show that user perceptions of these systems are multifactorial. The survey indicated eTACTS being the generally preferred system, but this finding did not persist among all mixed methods. This study confirms the value of mixed-methods for a comprehensive system evaluation. Future system designers must be aware that different users groups expect different functionalities. PMID:25954590
Jiang, Silis Y; Weng, Chunhua
Clinical trials are fundamental to the advancement of medicine but constantly face recruitment difficulties. Various clinical trial search engines have been designed to help health consumers identify trials for which they may be eligible. Unfortunately, knowledge of the usefulness and usability of their designs remains scarce. In this study, we used mixed methods, including time-motion analysis, think-aloud protocol, and survey, to evaluate five popular clinical trial search engines with 11 users. Differences in user preferences and time spent on each system were observed and correlated with user characteristics. In general, searching for applicable trials using these systems is a cognitively demanding task. Our results show that user perceptions of these systems are multifactorial. The survey indicated eTACTS being the generally preferred system, but this finding did not persist among all mixed methods. This study confirms the value of mixed-methods for a comprehensive system evaluation. Future system designers must be aware that different users groups expect different functionalities.
Chuklin, A.; Schuth, A.; Hofmann, K.; Serdyukov, P.; de Rijke, M.
A result page of a modern web search engine is often much more complicated than a simple list of "ten blue links." In particular, a search engine may combine results from different sources (e.g., Web, News, and Images), and display these as grouped results to provide a better user experience. Such a
Full Text Available The present study investigated the analysis of search engines and meta search engines adoption process by University of Isfahan users during 2009-2010 based on the Rogers' diffusion of innovation theory. The main aim of the research was to study the rate of adoption and recognizing the potentials and effective tools in search engines and meta search engines adoption among University of Isfahan users. The research method was descriptive survey study. The cases of the study were all of the post graduate students of the University of Isfahan. 351 students were selected as the sample and categorized by a stratified random sampling method. Questionnaire was used for collecting data. The collected data was analyzed using SPSS 16 in both descriptive and analytic statistic. For descriptive statistic frequency, percentage and mean were used, while for analytic statistic t-test and Kruskal-Wallis non parametric test (H-test were used. The finding of t-test and Kruscal-Wallis indicated that the mean of search engines and meta search engines adoption did not show statistical differences gender, level of education and the faculty. Special search engines adoption process was different in terms of gender but not in terms of the level of education and the faculty. Other results of the research indicated that among general search engines, Google had the most adoption rate. In addition, among the special search engines, Google Scholar and among the meta search engines Mamma had the most adopting rate. Findings also showed that friends played an important role on how students adopted general search engines while professors had important role on how students adopted special search engines and meta search engines. Moreover, results showed that the place where students got the most acquaintance with search engines and meta search engines was in the university. The finding showed that the curve of adoption rate was not normal and it was not also in S-shape. Morover
An online questionnaire “Semantic web searches for geoscience resources” was completed by 35 staff of British Geological Survey (BGS) between 28th July 2015 and 28th August 2015. The questionnaire was designed to better understand current web search habits, preferences, and the reception of semantic search features in order to inform PhD research into the use of domain ontologies for semantic information retrieval. The key findings were that relevance ranking is important in fo...
Drias, Yassine; Kechid, Samir; Pasi, Gabriella
We present in this paper a novel approach based on multi-agent technology for Web information foraging. We proposed for this purpose an architecture in which we distinguish two important phases. The first one is a learning process for localizing the most relevant pages that might interest the user. This is performed on a fixed instance of the Web. The second takes into account the openness and dynamicity of the Web. It consists on an incremental learning starting from the result of the first phase and reshaping the outcomes taking into account the changes that undergoes the Web. The system was implemented using a colony of artificial ants hybridized with tabu search in order to achieve more effectiveness and efficiency. To validate our proposal, experiments were conducted on MedlinePlus, a real website dedicated for research in the domain of Health in contrast to other previous works where experiments were performed on web logs datasets. The main results are promising either for those related to strong Web regularities and for the response time, which is very short and hence complies the real time constraint.
Velázquez Santana Eugenio César
Full Text Available The application of new information technologies such as Google Web Toolkit and App Engine are making a difference in the academic management of Higher Education Institutions (IES, who seek to streamline their processes as well as reduce infrastructure costs. However, they encounter the problems with regard to acquisition costs, the infrastructure necessary for their use, as well as the maintenance of the software; It is for this reason that the present research aims to describe the application of these new technologies in HEIs, as well as to identify their advantages and disadvantages and the key success factors in their implementation. As a software development methodology, SCRUM was used as well as PMBOK as a project management tool. The main results were related to the application of these technologies in the development of customized software for teachers, students and administrators, as well as the weaknesses and strengths of using them in the cloud. On the other hand, it was also possible to describe the paradigm shift that data warehouses are generating with respect to today's relational databases.
Squizzato, Silvano; Park, Young Mi; Buso, Nicola; Gur, Tamer; Cowley, Andrew; Li, Weizhong; Uludag, Mahmut; Pundir, Sangya; Cham, Jennifer A; McWilliam, Hamish; Lopez, Rodrigo
The European Bioinformatics Institute (EMBL-EBI-https://www.ebi.ac.uk) provides free and unrestricted access to data across all major areas of biology and biomedicine. Searching and extracting knowledge across these domains requires a fast and scalable solution that addresses the requirements of domain experts as well as casual users. We present the EBI Search engine, referred to here as 'EBI Search', an easy-to-use fast text search and indexing system with powerful data navigation and retrieval capabilities. API integration provides access to analytical tools, allowing users to further investigate the results of their search. The interconnectivity that exists between data resources at EMBL-EBI provides easy, quick and precise navigation and a better understanding of the relationship between different data types including sequences, genes, gene products, proteins, protein domains, protein families, enzymes and macromolecular structures, together with relevant life science literature. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Woo, Hyekyung; Cho, Youngtae; Shim, Eunyoung; Lee, Jong-Koo; Lee, Chang-Gun; Kim, Seong Hwan
As suggested as early as in 2006, logs of queries submitted to search engines seeking information could be a source for detection of emerging influenza epidemics if changes in the volume of search queries are monitored (infodemiology). However, selecting queries that are most likely to be associated with influenza epidemics is a particular challenge when it comes to generating better predictions. In this study, we describe a methodological extension for detecting influenza outbreaks using search query data; we provide a new approach for query selection through the exploration of contextual information gleaned from social media data. Additionally, we evaluate whether it is possible to use these queries for monitoring and predicting influenza epidemics in South Korea. Our study was based on freely available weekly influenza incidence data and query data originating from the search engine on the Korean website Daum between April 3, 2011 and April 5, 2014. To select queries related to influenza epidemics, several approaches were applied: (1) exploring influenza-related words in social media data, (2) identifying the chief concerns related to influenza, and (3) using Web query recommendations. Optimal feature selection by least absolute shrinkage and selection operator (Lasso) and support vector machine for regression (SVR) were used to construct a model predicting influenza epidemics. In total, 146 queries related to influenza were generated through our initial query selection approach. A considerable proportion of optimal features for final models were derived from queries with reference to the social media data. The SVR model performed well: the prediction values were highly correlated with the recent observed influenza-like illness (r=.956; Psocial media data seems ideal for supporting influenza surveillance based on search query data.
This is the first interdisciplinary exploration of the philosophical foundations of the Web, a new area of inquiry that has important implications across a range of domains. Contains twelve essays that bridge the fields of philosophy, cognitive science, and phenomenologyTackles questions such as the impact of Google on intelligence and epistemology, the philosophical status of digital objects, ethics on the Web, semantic and ontological changes caused by the Web, and the potential of the Web to serve as a genuine cognitive extensionBrings together insightful new scholarship from well-known an
Menachemi, Nir; Rahurkar, Saurabh; Rahurkar, Mandar
Internet search is the most common activity on the World Wide Web and generates a vast amount of user-reported data regarding their information-seeking preferences and behavior. Although this data has been successfully used to examine outbreaks, health care utilization, and outcomes related to quality of care, its value in informing public health policy remains unclear. The aim of this study was to evaluate the role of Internet search query data in health policy development. To do so, we studied the public's reaction to a major societal event in the context of the 2012 Sandy Hook School shooting incident. Query data from the Yahoo! search engine regarding firearm-related searches was analyzed to examine changes in user-selected search terms and subsequent websites visited for a period of 14 days before and after the shooting incident. A total of 5,653,588 firearm-related search queries were analyzed. In the after period, queries increased for search terms related to "guns" (+50.06%), "shooting incident" (+333.71%), "ammunition" (+155.14%), and "gun-related laws" (+535.47%). The highest increase (+1054.37%) in Web traffic was seen by news websites following "shooting incident" queries whereas searches for "guns" (+61.02%) and "ammunition" (+173.15%) resulted in notable increases in visits to retail websites. Firearm-related queries generally returned to baseline levels after approximately 10 days. Search engine queries present a viable infodemiology metric on public reactions and subsequent behaviors to major societal events and could be used by policymakers to inform policy development.
Background Internet search is the most common activity on the World Wide Web and generates a vast amount of user-reported data regarding their information-seeking preferences and behavior. Although this data has been successfully used to examine outbreaks, health care utilization, and outcomes related to quality of care, its value in informing public health policy remains unclear. Objective The aim of this study was to evaluate the role of Internet search query data in health policy development. To do so, we studied the public’s reaction to a major societal event in the context of the 2012 Sandy Hook School shooting incident. Methods Query data from the Yahoo! search engine regarding firearm-related searches was analyzed to examine changes in user-selected search terms and subsequent websites visited for a period of 14 days before and after the shooting incident. Results A total of 5,653,588 firearm-related search queries were analyzed. In the after period, queries increased for search terms related to “guns” (+50.06%), “shooting incident” (+333.71%), “ammunition” (+155.14%), and “gun-related laws” (+535.47%). The highest increase (+1054.37%) in Web traffic was seen by news websites following “shooting incident” queries whereas searches for “guns” (+61.02%) and “ammunition” (+173.15%) resulted in notable increases in visits to retail websites. Firearm-related queries generally returned to baseline levels after approximately 10 days. Conclusions Search engine queries present a viable infodemiology metric on public reactions and subsequent behaviors to major societal events and could be used by policymakers to inform policy development. PMID:28336508
Pavelka, Antonin; Chovancova, Eva; Damborsky, Jiri
HotSpot Wizard is a web server for automatic identification of 'hot spots' for engineering of substrate specificity, activity or enantioselectivity of enzymes and for annotation of protein structures...
Lin, Risheng; Afjeh, Abdollah A.
This paper discusses the detailed design of an XML databinding framework for aircraft engine simulation. The framework provides an object interface to access and use engine data. while at the same time preserving the meaning of the original data. The Language independent representation of engine component data enables users to move around XML data using HTTP through disparate networks. The application of this framework is demonstrated via a web-based turbofan propulsion system simulation using the World Wide Web (WWW). A Java Servlet based web component architecture is used for rendering XML engine data into HTML format and dealing with input events from the user, which allows users to interact with simulation data from a web browser. The simulation data can also be saved to a local disk for archiving or to restart the simulation at a later time.
Arora, V S; Stuckler, D; McKee, M
First, to determine if a cyclical trend is observed for search activity of suicide and three common suicide risk factors in the United Kingdom: depression, unemployment, and marital strain. Second, to test the validity of suicide search data as a potential marker of suicide risk by evaluating whether web searches for suicide associate with suicide rates among those of different ages and genders in the United Kingdom. Cross-sectional. Search engine data was obtained from Google Trends, a publicly available repository of information of trends and patterns of user searches on Google. The following phrases were entered into Google Trends to analyse relative search volume for suicide, depression, job loss, and divorce, respectively: 'suicide'; 'depression + depressed + hopeless'; 'unemployed + lost job'; 'divorce'. Spearman's rank correlation coefficient was employed to test bivariate associations between suicide search activity and official suicide rates from the Office of National Statistics (ONS). Cyclical trends were observed in search activity for suicide and depression-related search activity, with peaks in autumn and winter months, and a trough in summer months. A positive, non-significant association was found between suicide-related search activity and suicide rates in the general working-age population (15-64 years) (ρ = 0.164; P = 0.652). This association is stronger in younger age groups, particularly for those 25-34 years of age (ρ = 0.848; P = 0.002). We give credence to a link between search activity for suicide and suicide rates in the United Kingdom from 2004 to 2013 for high risk sub-populations (i.e. male youth and young professionals). There remains a need for further research on how Google Trends can be used in other areas of disease surveillance and for work to provide greater geographical precision, as well as research on ways of mitigating the risk of internet use leading to suicide ideation in youth. Copyright © 2015 The Royal
Mrozek, Dariusz; Małysiak-Mrozek, Bożena; Siążnik, Artur
Due to the growing number of biomedical entries in data repositories of the National Center for Biotechnology Information (NCBI), it is difficult to collect, manage and process all of these entries in one place by third-party software developers without significant investment in hardware and software infrastructure, its maintenance and administration. Web services allow development of software applications that integrate in one place the functionality and processing logic of distributed software components, without integrating the components themselves and without integrating the resources to which they have access. This is achieved by appropriate orchestration or choreography of available Web services and their shared functions. After the successful application of Web services in the business sector, this technology can now be used to build composite software tools that are oriented towards biomedical data processing. We have developed a new tool for efficient and dynamic data exploration in GenBank and other NCBI databases. A dedicated search GenBank system makes use of NCBI Web services and a package of Entrez Programming Utilities (eUtils) in order to provide extended searching capabilities in NCBI data repositories. In search GenBank users can use one of the three exploration paths: simple data searching based on the specified user's query, advanced data searching based on the specified user's query, and advanced data exploration with the use of macros. search GenBank orchestrates calls of particular tools available through the NCBI Web service providing requested functionality, while users interactively browse selected records in search GenBank and traverse between NCBI databases using available links. On the other hand, by building macros in the advanced data exploration mode, users create choreographies of eUtils calls, which can lead to the automatic discovery of related data in the specified databases. search GenBank extends standard capabilities of the
Lang, Dustin; Hogg, David W.
We performed an image search for "Comet Holmes," using the Yahoo! Web search engine, on 2010 April 1. Thousands of images were returned. We astrometrically calibrated—and therefore vetted—the images using the Astrometry.net system. The calibrated image pointings form a set of data points to which we can fit a test-particle orbit in the solar system, marginalizing over image dates and detecting outliers. The approach is Bayesian and the model is, in essence, a model of how comet astrophotographers point their instruments. In this work, we do not measure the position of the comet within each image, but rather use the celestial position of the whole image to infer the orbit. We find very strong probabilistic constraints on the orbit, although slightly off the Jet Propulsion Lab ephemeris, probably due to limitations of our model. Hyperparameters of the model constrain the reliability of date meta-data and where in the image astrophotographers place the comet; we find that ~70% of the meta-data are correct and that the comet typically appears in the central third of the image footprint. This project demonstrates that discoveries and measurements can be made using data of extreme heterogeneity and unknown provenance. As the size and diversity of astronomical data sets continues to grow, approaches like ours will become more essential. This project also demonstrates that the Web is an enormous repository of astronomical information, and that if an object has been given a name and photographed thousands of times by observers who post their images on the Web, we can (re-)discover it and infer its dynamical properties.
Pando Cerra, Pablo; Suárez González, Jesús M.; Busto Parra, Bernardo; Rodríguez Ortiz, Diana; Álvarez Peñín, Pedro I.
Many current Web-based learning environments facilitate the theoretical teaching of a subject but this may not be sufficient for those disciplines that require a significant use of graphic mechanisms to resolve problems. This research study looks at the use of an environment that can help students learn engineering drawing with Web-based CAD…
What if you could see your experiment’s results in a “page rank” style? How would your workflow change if you could collaborate with your colleagues on a single platform? What if you could search all your event data for certain specifications? All of these ideas (and more) are being explored at the LHCb experiment in collaboration with Internet giant Yandex. An extremely rare B0s → μμ decay candidate event observed in the LHCb detector. As the leading search provider in Russia, with over 60% of the market share, Yandex is to East what Google is to West. Their collaboration with CERN began back in 2011, when Yandex co-founder Ilya Segalovich was approached by then-LHCb spokesperson Andrei Golutvin. “Just as Yandex's search engines sift through thousands of websites to find the right page, our experimentalists apply algorithms to find the best result in our data," says Andrei Golutvin. "Perhaps the techn...
Rajman, M; Boynton, I M; Fridlund, B; Fyhrlund, A; Sundgren, B; Lundquist, P; Thelander, H; Wänerskär, M
In this paper we present the results of the StatSearch case study that aimed at providing an enhanced access to statistical data available on the Web. In the scope of this case study we developed a prototype of an information access tool combining a query-based search engine with semi-automated navigation techniques exploiting the hierarchical structuring of the available data. This tool enables a better control of the information retrieval, improving the quality and ease of the access to statistical information. The central part of the presented StatSearch tool consists in the design of an algorithm for automated navigation through a tree-like hierarchical document structure. The algorithm relies on the computation of query related relevance score distributions over the available database to identify the most relevant clusters in the data structure. These most relevant clusters are then proposed to the user for navigation, or, alternatively, are the support for the automated navigation process. Several appro...
D. Janssen (David); H.W.G.M. van Heck (Eric)
textabstractIn online affiliate marketing networks advertising web sites offer their affiliates revenues based on provided web site traffic and associated leads and sales. Advertising web sites can have a network of thousands of affiliates providing them with web site traffic through hyperlinks on
Wang, Liupu; Wang, Juexin; Wang, Michael; Li, Yong; Liang, Yanchun; Xu, Dong
The Internet has become one of the most important means to obtain health and medical information. It is often the first step in checking for basic information about a disease and its treatment. The search results are often useful to general users. Various search engines such as Google, Yahoo!, Bing, and Ask.com can play an important role in obtaining medical information for both medical professionals and lay people. However, the usability and effectiveness of various search engines for medical information have not been comprehensively compared and evaluated. To compare major Internet search engines in their usability of obtaining medical and health information. We applied usability testing as a software engineering technique and a standard industry practice to compare the four major search engines (Google, Yahoo!, Bing, and Ask.com) in obtaining health and medical information. For this purpose, we searched the keyword breast cancer in Google, Yahoo!, Bing, and Ask.com and saved the results of the top 200 links from each search engine. We combined nonredundant links from the four search engines and gave them to volunteer users in an alphabetical order. The volunteer users evaluated the websites and scored each website from 0 to 10 (lowest to highest) based on the usefulness of the content relevant to breast cancer. A medical expert identified six well-known websites related to breast cancer in advance as standards. We also used five keywords associated with breast cancer defined in the latest release of Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT) and analyzed their occurrence in the websites. Each search engine provided rich information related to breast cancer in the search results. All six standard websites were among the top 30 in search results of all four search engines. Google had the best search validity (in terms of whether a website could be opened), followed by Bing, Ask.com, and Yahoo!. The search results highly overlapped between the
This article argues that search engines, algorithms, and databases can be considered as a way of understanding deep mediatization (Couldry & Hepp, 2016). They are embedded in a variety of social and cultural practices and as such they change our communicative actions to be shaped by their logic...... reviewed recent trends in mediatization research, the argument is discussed and unfolded in-between the material and social constructivist-phenomenological interpretations of mediatization. In conclusion, it is discussed how deep this form of mediatization can be taken to be....
Jones, Andrew R; Siepen, Jennifer A; Hubbard, Simon J; Paton, Norman W
LC-MS experiments can generate large quantities of data, for which a variety of database search engines are available to make peptide and protein identifications. Decoy databases are becoming widely used to place statistical confidence in result sets, allowing the false discovery rate (FDR) to be estimated. Different search engines produce different identification sets so employing more than one search engine could result in an increased number of peptides (and proteins) being identified, if an appropriate mechanism for combining data can be defined. We have developed a search engine independent score, based on FDR, which allows peptide identifications from different search engines to be combined, called the FDR Score. The results demonstrate that the observed FDR is significantly different when analysing the set of identifications made by all three search engines, by each pair of search engines or by a single search engine. Our algorithm assigns identifications to groups according to the set of search engines that have made the identification, and re-assigns the score (combined FDR Score). The combined FDR Score can differentiate between correct and incorrect peptide identifications with high accuracy, allowing on average 35% more peptide identifications to be made at a fixed FDR than using a single search engine.
Wissinger, John; Ristroph, Robert; Diemunsch, Joseph R.; Severson, William E.; Fruedenthal, Eric
The DARPA/AFRL 'Moving and Stationary Target Acquisition and Recognition' (MSTAR) program is developing a model-based vision approach to Synthetic Aperture Radar (SAR) Automatic Target Recognition (ATR). The motivation for this work is to develop a high performance ATR capability that can identify ground targets in highly unconstrained imaging scenarios that include variable image acquisition geometry, arbitrary target pose and configuration state, differences in target deployment situation, and strong intra-class variations. The MSTAR approach utilizes radar scattering models in an on-line hypothesize-and-test operation that compares predicted target signature statistics with features extracted from image data in an attempt to determine a 'best fit' explanation of the observed image. Central to this processing paradigm is the Search algorithm, which provides intelligent control in selecting features to measure and hypotheses to test, as well as in making the decision about when to stop processing and report a specific target type or clutter. Intelligent management of computation performed by the Search module is a key enabler to scaling the model-based approach to the large hypothesis spaces typical of realistic ATR problems. In this paper, we describe the present state of design and implementation of the MSTAR Search engine, as it has matured over the last three years of the MSTAR program. The evolution has been driven by a continually expanding problem domain that now includes 30 target types, viewed under arbitrary squint/depression, with articulations, reconfigurations, revetments, variable background, and up to 30% blocking occlusion. We believe that the research directions that have been inspired by MSTAR's challenging problem domain are leading to broadly applicable search methodologies that are relevant to computer vision systems in many areas.