Bringing the World’s Science to all Corners of the Globe
Walter L. Warnick, Ph.D., Director
United States Department of Energy
Office of Scientific & Technical Information
(Operating Agent for WorldWideScience.org)
Washington, D.C., United States
WorldWideScience.org, a global science gateway, explores new trends in global scientific communication, including how technology opens borders by providing access both to sources and to users located in diverse geographic settings. International collaboration is a key component in WorldWideScience.org’s success in providing scientific information to a global population. WorldWideScience.org was designed to accelerate scientific discovery and progress by accelerating the sharing of scientific knowledge. Through a multilateral partnership, WorldWideScience.org enables anyone with internet access to launch a single-query search of over 40 national scientific databases and portals in many countries. From a user’s perspective, WorldWideScience.org makes the databases act as if they were a unified whole. The revolutionary technology behind WorldWideScience.org will be discussed.
If the sharing of knowledge is accelerated, discovery is accelerated
Profound implications for all of us in the information business!
WorldWideScience.org is a global science gateway designed to accelerate scientific discovery and progress by accelerating the sharing of scientific knowledge. Every scientist, regardless of discipline, will agree that science progresses only if knowledge is shared among colleagues.
A key piece of science discovery
Thus, if the sharing of knowledge is accelerated, discovery is accelerated. Discovery powers the growth of world prosperity, discovery improves people’s lives, and discovery accelerates science.
The Spread of Knowledge about Feynman Diagrams
Discovery path of US and UK authors
From: The Power of a Good Idea: quantitative modeling of the spread of ideas from epidemiological models, Luis M. A. Bettencourt, Ariel Cintron-Arias, Carlos Castillo-Chavez, David Kaiser, May 2005
The spread of knowledge can actually be measured. Studies have indicated that knowledge spreads in much the same way as diseases. Of course, the spread of knowledge is a more desirable and appealing phenomenon to study. Epidemiological diffusion models created to track the spread of disease can also be applied to track the spread of knowledge.
Knowledge is contagious.
Increasing the contact rate means researchers “catch” an idea faster.
A key parameter of the model is the “contact rate.” By increasing the contact rate, the spread of knowledge is accelerated. To that end, products like WorldWideScience.org help researchers “catch” ideas faster.
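The effect of the contact rate can be illustrated with a small simulation. This is only a minimal sketch, assuming a simple two-group (not-yet-aware and aware) diffusion model; it is not the model used in the cited study, and the function names, population size, and rate values are hypothetical choices made for illustration.

```python
def simulate_idea_spread(contact_rate, population=1000, initial_adopters=1, steps=100):
    """Discrete-time diffusion: at each step, new adopters are proportional to
    contact_rate times the fraction of contacts between adopters and non-adopters."""
    adopters = float(initial_adopters)
    history = [adopters]
    for _ in range(steps):
        susceptible = population - adopters
        new_adopters = contact_rate * susceptible * adopters / population
        adopters = min(population, adopters + new_adopters)
        history.append(adopters)
    return history


def steps_to_reach(history, target):
    """Return the first step at which at least `target` researchers know the idea."""
    for step, value in enumerate(history):
        if value >= target:
            return step
    return None


# Hypothetical contact rates: the second is double the first.
slow = simulate_idea_spread(contact_rate=0.2)
fast = simulate_idea_spread(contact_rate=0.4)
print("steps until half the population has the idea:",
      steps_to_reach(slow, 500), "versus", steps_to_reach(fast, 500))
```

Under these assumptions, the run with the higher contact rate reaches half of the population in fewer steps, which is the sense in which a higher contact rate lets researchers “catch” an idea faster.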
In January 2007, Dr. Raymond Orbach, DOE Under Secretary for Science, and Lynne Brindley, Chief Executive of the British Library, signed a Statement of Intent to partner in the development of a searchable global science gateway.
First envisioned by a member of the International Council for Scientific and Technical Information (ICSTI) at the June 2006 meeting in Bethesda, Maryland, U.S., the concept was strongly endorsed by the British Library, which offered to collaborate with the U.S. Department of Energy (DOE). A bilateral partnership was formalized in January 2007 when the British Library and DOE signed a Statement of Intent to partner in the development of a Global Science Gateway. Dr. Raymond Orbach, DOE Under Secretary for Science, and Lynne Brindley, Chief Executive of the British Library, participated in the signing ceremony at the British Library. The Statement of Intent invited other nations to join the collaboration.
Now searches over 40 portals from more than 50 countries
Later officially named “WorldWideScience.org,” the gateway was developed by the U.S. Department of Energy’s Office of Scientific and Technical Information (OSTI) and was unveiled to ICSTI members and the public at the June 2007 ICSTI meeting in Nancy, France. ICSTI members also endorsed the proposal to serve as the umbrella organization for the future long-term governance of WorldWideScience.org. A Terms of Reference document was developed to define this governance structure, and the Terms of Reference were accepted by ICSTI members at the 2008 Winter meeting in Paris.
• A federation of the leading science portals sponsored by the governments and national institutions of over 50 countries
• A quantity of science (more than 200 million pages from every inhabited continent)
• A breakthrough in content enabled by breakthrough technology
Many popular search engines rely on crawler-based technology
WorldWideScience.org implements federated searching to provide its encompassing coverage of global science and research results. It is a little-known fact that many of the popular search engines overlook a large portion of the web. Their technology relies upon crawlers, which find and visit websites one at a time by following hyperlinks. Each time a crawler finds a page, it indexes it. The index is then merged with the master index, and when the user does a search, the query is actually applied against the master index. When there is a match, the results returned are hyperlinks to pages that were indexed at some time in the past.
Federated search systems
Probe the deep web
Federated search drills down to the deep web where scientific databases reside
The bulk of science information, especially scholarly science information, resides in databases. Crawlers can get to the first page of a database, but, typically, they cannot get past the front page. The database’s search box is often the only systematic way to see the contents of the database, and crawlers are unable to process the search box. This part of the web that is off-limits to crawlers is called the deep web. It is possible for database owners to take special steps to expose their database content to crawlers; however, many organizations have resource constraints and do not pursue these options.
Federated search is a different kind of web search architecture. When the user places a query on a federated search application, like WorldWideScience.org, the query is transmitted to all the servers that host the databases and portals. Each server then translates the query into the syntax of its own database and executes the search. Each remote database reports its results back to the WorldWideScience.org server, which combines the hits from all the databases and sorts them in relevance-ranked order. Finally, the ranked list is returned to the user. The whole process can take anywhere from about a second to around twenty seconds, depending on the complexity of the search and the speed of the source databases. Thus, WorldWideScience.org allows the user to search multiple data sources with a single query in real time.
For example, if a user searches on the term “nanotechnology,” the WorldWideScience.org interface sends the query to over 40 source databases and portals, which independently run this search and begin returning results.
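The fan-out step of a federated search can be sketched in a few lines. This is a hedged illustration only, not the WorldWideScience.org implementation: the three sources are hypothetical in-memory stand-ins for remote databases, and search_source stands in for whatever query each real source would run through its own search interface.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical in-memory stand-ins for remote databases and portals.
SOURCES = {
    "source_a": [{"title": "Nanotechnology in medicine"}, {"title": "Polar ice cores"}],
    "source_b": [{"title": "Advances in nanotechnology"}],
    "source_c": [{"title": "Rice genetics"}],
}


def search_source(name, query):
    """Stand-in for one source running the query against its own database."""
    term = query.lower()
    return [dict(record, source=name)
            for record in SOURCES[name]
            if term in record["title"].lower()]


def federated_search(query, source_names, timeout=20.0):
    """Send one query to every source in parallel and combine whatever comes back."""
    hits = []
    with ThreadPoolExecutor(max_workers=len(source_names)) as pool:
        futures = {pool.submit(search_source, name, query): name
                   for name in source_names}
        for future in as_completed(futures, timeout=timeout):
            try:
                hits.extend(future.result())
            except Exception:
                # One failing source should not sink the whole search.
                continue
    return hits


results = federated_search("nanotechnology", list(SOURCES))
for hit in results:
    print(hit["source"], "-", hit["title"])
```

Running the sources in parallel means the total wait is governed by the slowest source rather than by the sum of all of them, which is consistent with the response times described above depending on the speed of the source databases.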
As results are returned to WorldWideScience.org, the combined results from all sources are run against WorldWideScience.org’s relevance-ranking algorithm and presented to the user based on the prevalence of the search term in the title, metadata, and any other snippet information provided by the source. The user also has the flexibility to reorder the results by source, date, title, or author.
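A ranking of this kind can be sketched as follows. The actual WorldWideScience.org relevance-ranking algorithm is not described here, so this is an assumption-laden illustration: it simply counts occurrences of the search term, and the field names and weights (a title match counting three times as much as a snippet match) are hypothetical.

```python
def relevance_score(record, query, title_weight=3.0, snippet_weight=1.0):
    """Score a record by how often the search term appears in its title and snippet."""
    term = query.lower()
    title_hits = record.get("title", "").lower().count(term)
    snippet_hits = record.get("snippet", "").lower().count(term)
    return title_weight * title_hits + snippet_weight * snippet_hits


def rank_results(records, query):
    """Return the combined results with the highest-scoring records first."""
    return sorted(records, key=lambda r: relevance_score(r, query), reverse=True)


def reorder(records, field):
    """Let the user reorder results by another field, such as source, date, or title."""
    return sorted(records, key=lambda r: r.get(field, ""))


# Hypothetical combined results from three sources.
combined = [
    {"title": "Polymer coatings", "snippet": "Uses nanotechnology for surface treatment.",
     "source": "B", "date": "2007-03-01"},
    {"title": "Nanotechnology for energy storage",
     "snippet": "A review of nanotechnology in batteries.",
     "source": "A", "date": "2008-01-15"},
    {"title": "Soil chemistry", "snippet": "No mention of the term.",
     "source": "C", "date": "2006-11-20"},
]

ranked = rank_results(combined, "nanotechnology")
by_date = reorder(combined, "date")
print([r["title"] for r in ranked])
```

Because only the title and snippet supplied by each source are scored, a ranking like this can be computed without fetching the full documents.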
A recent enhancement, added in June 2008, is the results clustering feature. As shown on the left side, results are grouped into similar topics, as well as date ranges. The patron can then “narrow” the search by clicking on one of the clusters, or a specified date range.
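The idea of clustering and narrowing can be sketched as below. How WorldWideScience.org actually computes its clusters is not described in this paper; the sketch assumes, hypothetically, that each result carries an ISO-format date and a list of keywords, and groups on those.

```python
from collections import defaultdict


def cluster_by_year(records):
    """Group records by publication year (first four characters of an ISO date)."""
    clusters = defaultdict(list)
    for record in records:
        clusters[record["date"][:4]].append(record)
    return dict(clusters)


def cluster_by_keyword(records):
    """Group records by keyword; a record may appear in more than one cluster."""
    clusters = defaultdict(list)
    for record in records:
        for keyword in record.get("keywords", []):
            clusters[keyword].append(record)
    return dict(clusters)


def narrow(clusters, label):
    """Return only the results in the cluster the user clicked on."""
    return clusters.get(label, [])


# Hypothetical search results.
sample = [
    {"title": "Nanotube sensors", "date": "2008-02-10", "keywords": ["nanotubes", "sensors"]},
    {"title": "Nanotube growth", "date": "2007-09-01", "keywords": ["nanotubes"]},
    {"title": "Quantum dot displays", "date": "2008-05-22", "keywords": ["quantum dots"]},
]

by_year = cluster_by_year(sample)
by_topic = cluster_by_keyword(sample)
narrowed = narrow(by_topic, "nanotubes")
print([r["title"] for r in narrowed])
```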
Users can view the complete record for each result, and the full-text document if it is available.
With a large number of open access sources, WorldWideScience.org provides a single point of access to vast quantities of full-text science literature.
• African Journals Online
• Article@INIST (France)
• Australian Antarctic Data Centre
• Bangladesh Journals Online (BanglaJOL)
• Canada Institute for Scientific and Technical Information
• Catalogue of the TIB, German National Library of Science & Technology
• CSIR Research Space (South Africa)
• Czech Academy of Sciences Manuscriptorium
• Czech Academy of Sciences Repository
• Defence Research and Development Canada (Canada)
• DEFF Global E Prints (Denmark)
• DEFF Research Database (Denmark)
• Digital Repository Service at National Institute of Oceanography (India)
• Directory of Open Access Journals (Sweden)
• Electronic Table of Contents (ETOC) (United Kingdom)
• Indian Academy of Sciences
• Indian Institute of Science Eprints
• Indian Institute of Science Theses & Dissertations
• Indian Medlars Centre
• J-EAST (Japan)
• J-STAGE (Japan)
• J-STORE (Japan)
• Journal@rchive (Japan)
• Korea Science (Korea)
• NARCIS (Netherlands)
• Nepal Journals Online (NepJOL)
• Norwegian Open Research Archives (NORA)
• Philippines Journals Online (PhilJOL)
• Science.gov (United States)
• Scientific Electronic Library Online (Argentina, Brazil, Chile, Colombia, Cuba, Mexico, Portugal, Spain, Venezuela)
• Transactions and Proceedings of the Royal Society of New Zealand 1868-1961 (New Zealand)
• UK PubMed Central (United Kingdom)
• Vascoda (Germany)
• Vietnam Journals Online (VJOL)
• VTT Technical Research Centre of Finland Publications Register
• VTT Technical Research Centre of Finland Research Register
Since its release in June 2007, WorldWideScience.org has more than tripled the number of data sources searched, along with greatly increasing the number of countries participating as information providers. It provides access to large prominent collections such as Science.gov (the U.S. contribution) in addition to less well-known sources of highly valuable science. It is estimated that WorldWideScience.org covers more than 200 million pages of scholarly scientific content.
• African Journals Online (AJOL)
• British Library United Kingdom
• Canada Institute for Scientific and Technical Information (CISTI)
• Council for Scientific and Industrial Research (CSIR) South Africa
• German National Library of Science and Technology (TIB)
• Institut de l’Information Scientifique et Technique (INIST) France
• International Council for Scientific and Technical Information (ICSTI)
• International Network for the Availability of Scientific Publications (INASP)
• Japan Science and Technology Agency (JST)
• Korea Institute of Science and Technology Information (KISTI)
• Science.gov Alliance United States
• Scientific Electronic Library Online (SciELO)
• VTT Technical Research Centre of Finland (VTT)
Along with vastly increasing its content since its inception, WorldWideScience.org has also transitioned from bilateral management to a multilateral governance structure, called the WorldWideScience Alliance. The Alliance consists of 13 founding member organizations representing 38 countries. In addition to member countries, ICSTI also serves as an Alliance member and primary sponsor.
• Chair Richard Boulderstone, British Library
• Deputy Chair Pam Bjornson, Canada Institute for Scientific and Technical Information
• Treasurer Tae-Sul Seo, Korea Institute of Science and Technology Information
• Ex-Officio Member Walter Warnick, WorldWideScience.org Operating Agent, U.S. DOE Office of Scientific and Technical Information
• Ex-Officio Member Herbert Gruttemeier, ICSTI President, French Institut de l’Information Scientifique et Technique
• At-Large Member Yvonne Halland, Council for Scientific and Industrial Research, South Africa
An election for the Alliance’s Executive Board was held in early April 2008. Richard Boulderstone, Director, E-Strategy and Information Systems, British Library, was elected Chair. Pam Bjornson, Director General, Canada Institute for Scientific and Technical Information, was elected Deputy Chair. Tae-Sul Seo, Senior Researcher, Korea Institute of Science and Technology Information, was elected Treasurer. The Executive Board also includes two ex-officio members, WorldWideScience.org Operating Agent Walter Warnick, Office of Scientific and Technical Information, U.S. Department of Energy, and ICSTI President Herbert Gruttemeier, French Institut de l’Information Scientifique et Technique. The At-Large Member is Yvonne Halland, Strategic Information Resources Coordinator at the Council for Scientific and Industrial Research in South Africa.
A formal ceremony commemorating the establishment of the Alliance was held in Seoul, Korea on June 12, 2008. Founding members of the Alliance participated in the events. The Alliance welcomes new members and is particularly interested in engaging the participation of main science-producing nations, such as Russia and China.
• Results Clustering Added June 2008
• Personalized Alert Service
• Translation Capabilities
Along with increasing the number and diversity of scientific sources searched by WorldWideScience.org, the Alliance has several other near-term goals. Web 2.0 functionality, such as Alert services, will be added later in 2008. The Alert service will allow users to set up a profile and then generate automatic queries against the WorldWideScience.org sources on a routine basis. So, a user who is interested in seeing all new documents on a particular subject can have those results delivered to his or her email account weekly. Clustering of results, just deployed a few months ago in June, presents clusters based on similar keywords and concepts, as well as clusters based on date ranges. The integration of translation tools is also being considered for later in 2008 and 2009.
Through such efforts, WorldWideScience.org is well aligned with other trends in global scientific communication. National research organizations, even in very small countries, recognize the importance of increasing the visibility of their R&D outputs. At the same time, full-text information accessibility has increased. Through the innovative combination of federated search and other technologies, scientists and citizens across the globe now have unprecedented access to scientific knowledge.