WorldWideScience.org Alliance


Multilingual WorldWideScience:
Accelerating Discovery through Multilingual Translations



Slide 1: Multilingual WorldWideScience:
Accelerating Discovery through Multilingual Translations

International Council for Scientific and Technical Information (ICSTI) Annual Conference
June 2010, Helsinki, Finland

Walter L. Warnick, Ph.D.
Director
Office of Scientific & Technical Information
U. S. Department of Energy

 



Slide 2: Science Advances Only if Knowledge is Shared


"If I have seen further it is only by standing on the shoulders of giants." Sir Isaac Newton

Corollary 1: Scientific discovery can be accelerated by accelerating access to worldwide scientific information.

The case for WorldWideScience.org.

Corollary 2: Multilingual translations of science will further accelerate scientific discovery.

The case for Multilingual WorldWideScience.org

 

Slide 3: The "Accelerating" Power of WorldWideScience.org

Overcoming the researcher’s practical limitations:

  1. Not knowing "what’s out there." (examples: Korean medical journals, Australian Antarctic data, South African scientific research database)
  2. Inadequate time to search scientific databases one by one. (examples: UK PubMed Central, Ginsparg’s arXiv.org)
  3. Inability to sort compiled results by relevance.

By filling these gaps, WorldWideScience.org has accelerated access to scientific information.

 

Slide 4: Brief History: Federated Search and WorldWideScience.org

 


Deep Web

  • where science is
  • hundreds of times larger than the "surface web"
  • generally not "googleable," or searchable, by major search engines

Slide 5: Deep Web Solution: Federated Searching

  • A single user query simultaneously sent to multiple deep web databases.
  • Federated search engine sorts and presents results in relevance-ranked order.
  • Overcomes the 3 practical limitations.
  • No burden on individual database "owners."
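
The federated search pattern described on this slide can be sketched roughly as follows. This is a minimal illustration, not the WorldWideScience.org implementation: the two in-memory "databases" in SOURCES, the source names, and the word-overlap relevance score are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for separate deep web databases.
SOURCES = {
    "source_a": ["solar cell efficiency", "antarctic ice cores"],
    "source_b": ["solar wind measurements", "perovskite solar cell stability"],
}


def score(query, title):
    """Toy relevance score (an assumption): number of query words found in the title."""
    terms = set(query.lower().split())
    return len(terms & set(title.lower().split()))


def search_source(name, query):
    """Search one source and return (source, title, score) for each matching record."""
    return [
        (name, title, score(query, title))
        for title in SOURCES[name]
        if score(query, title) > 0
    ]


def federated_search(query):
    """Send one query to every source at the same time, then merge and rank the hits."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(search_source, name, query) for name in SOURCES]
        hits = [hit for future in futures for hit in future.result()]
    # Present the combined results in relevance-ranked order, best first.
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


for source, title, relevance in federated_search("solar cell"):
    print(relevance, source, title)
```

The user issues one query and receives one ranked list; each source is only asked to answer an ordinary search request.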

Slide 6: Federated Search Examples

  • Science.gov – searches across all U.S. federal science agencies' databases (200 million pages)
  • Similar – but different – experiences outside science:
      • Kayak.com – "compare hundreds of travel sites at once"
      • Pricegrabber.com – comparison shopping across multiple merchants

Slide 7: Global Federated Search

  • Taking the Science.gov model global – WorldWideScience.org
  • Initial partnership between U.S. Department of Energy and the British Library (2007)

 

Slide 8: Global Federated Search

  • Transition to multilateral governance (WorldWideScience Alliance) and ICSTI sponsorship (2008)

 

Slide 9: WorldWideScience – Facts and Figures

  • Tremendous growth in search content: from 10 nations to 65 nations in 3 years
  • > 400 million pages
  • From well-known sources: e.g., PubMed, CERN, KoreaScience
  • To more obscure sources: e.g., Bangladesh Journals Online

 

Slide 10: WorldWideScience – Fills Key Niche in Scientific Discovery


  • In comparison of search results from identical queries on WWS, Google, and Google Scholar, only 3.5% overlap (i.e., WorldWideScience is 96.5% unique)
Accelerated access → Accelerated discovery: the case for WorldWideScience.org
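
The slide does not say how the overlap figure was measured. As one possible reading, the sketch below assumes that overlap means the share of WorldWideScience results that also appear in the result sets of the other search engines; the function name and the sample result identifiers are hypothetical.

```python
def unique_share(wws_results, *other_result_sets):
    """Fraction of WWS results that appear in none of the other result sets."""
    wws = set(wws_results)
    if not wws:
        return 0.0
    # Everything returned by any of the comparison search engines.
    seen_elsewhere = set().union(*other_result_sets)
    overlap = wws & seen_elsewhere
    return 1 - len(overlap) / len(wws)


# Made-up identifiers, for illustration only.
wws = ["doc1", "doc2", "doc3", "doc4"]
google = ["doc1", "web9"]
scholar = ["paper7"]
print(unique_share(wws, google, scholar))  # one of four WWS results overlaps
```

Under this reading, a 3.5% overlap corresponds directly to the 96.5% uniqueness stated on the slide.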

 

Slide 11: Now, the case for Multilingual WorldWideScience.org …


 

 

Slide 12: Consider this …
While English is the lingua franca for science, these are the world's most widely spoken languages:


Rank  Language          Estimated Number of Speakers
1     Mandarin Chinese  1,051,000,000
2     English           510,000,000
3     Hindi/Urdu        490,000,000
4     Spanish           429,000,000
5     Arabic            280,000,000
6     Russian           255,000,000
7     Portuguese        230,000,000
8     German            229,000,000
9     Bengali           215,000,000
10    French            130,000,000
11    Japanese          127,000,000
 
(Source: Wikipedia)


Slide 13: Increasing Globalization of Science Calls for Multilingual Search Capabilities …


  • Is there Science beyond English? Initiatives to increase the quality and visibility of non-English publications might help to break down language barriers in scientific communication (Meneghini and Packer, Nature, 2007)

  • Science's Language Problem: As globalization increases, communication between linguistic communities could become a serious stumbling block (Barany, Business Week, 2005)

  • Science on the Rise in Developing Countries (Holmgren and Schnitzer, PLoS Biology, 2004)

Slide 14: Of the world’s "top 400" institutional repositories, 250, or 63%, have some or all non-English content.



Examples:

  • HAL CNRS – French
  • Kyoto University Research Repository – Japanese
  • Leiden University Digital Repository – Dutch
  • CSIC (Spanish National Research Council)
(Source: Cybermetrics Lab, Spain)

Slide 15: Major Non-English Science "Producers"





Slide 16: Screen capture of http://www.istic.ac.cn/ (China)



 

 

 

 

Slide 17: Screen capture of http://science.viniti.ru/index.php?option=com_search&Itemid=27/ eLIBRARY.RU (Russia)



 

 

Slide 18:


  • Japan
  • France
  • Germany
  • Brazil
  • … and many other countries.

 

Slide 19: To further accelerate access to science, multilingual translations are needed in both directions:


  • Translation of English content for non-English speakers … and …

  • Translation of non-English content for English speakers

 

Slide 20:

  • Up until now, real-time translation of science has been limited.
  • Generally restricted to translating from one language into a single other language at a time.
  • Not deployed on deep web scientific databases.
  • Results less than perfect with complex scientific language (translation is still imperfect but is constantly improving).

 

Slide 21:

Now, we have the essential ingredients for real-time translation of science

  • National science databases in multiple languages
  • Federated search
  • Multilingual translation on both front and back end of the user experience

A public-private partnership, introduced as Multilingual WorldWideScience.org Beta

WorldWideScience Alliance

Translations powered by Microsoft® Translator

by Deep Web Technologies

Enabling Science and Innovation ICSTI International Council for Scientific and Technical Information

 

Slide 22:

Here’s how it works …
  1. A Chinese scientist submits a query in Chinese to Multilingual WorldWideScience.org.
  2. MWWS.org uses Microsoft Translator to translate the Chinese query into the individual languages of the source databases (English, French, Portuguese, Russian, etc.).
  3. MWWS.org sends the translated queries to the corresponding databases, which search their contents and return results in their native languages to MWWS.org.
  4. MWWS.org uses Microsoft Translator to translate the native-language results into Chinese and presents the results to the user in relevance-ranked order.

Conversely, an English-speaking user could have a query translated into languages of non-English databases and then get results back in English.
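
The four steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the tiny LEXICON lookup stands in for a real machine translation service (it is not the Microsoft Translator API), the DATABASES contents and the function names are hypothetical, and relevance ranking is omitted because every hit in this toy example is an exact match.

```python
# Toy phrase lexicon standing in for a machine translation service (assumption).
LEXICON = [
    {"zh": "太阳能", "en": "solar energy", "fr": "énergie solaire"},
    {"zh": "风能", "en": "wind power", "fr": "énergie éolienne"},
]

# Hypothetical source databases, each holding records in its own language.
DATABASES = {
    "en": ["solar energy", "wind power"],
    "fr": ["énergie solaire", "énergie éolienne"],
}


def translate(text, src, dst):
    """Look the phrase up in the toy lexicon; return it unchanged if unknown."""
    if src == dst:
        return text
    for entry in LEXICON:
        if entry.get(src) == text:
            return entry.get(dst, text)
    return text


def multilingual_search(query, user_lang):
    """Translate the query per database, search, and translate the hits back."""
    results = []
    for db_lang, records in DATABASES.items():
        native_query = translate(query, user_lang, db_lang)       # step 2
        hits = [r for r in records if native_query in r]          # step 3
        results += [(db_lang, translate(h, db_lang, user_lang))   # step 4
                    for h in hits]
    return results


print(multilingual_search("太阳能", "zh"))      # query submitted in Chinese
print(multilingual_search("wind power", "en"))  # the converse, English-speaking case
```

The same function serves both directions: only the user's language changes, while each database is always queried in its own language.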

 

Slide 23: Demonstration

 

 

 

 

Slide 24:

With the launch of Multilingual WorldWideScience.org, we are …

  • Opening vast reservoirs of heretofore under-utilized scientific knowledge
  • Providing equal access to science for anyone on the Internet
  • Promoting scientific collaboration, participation, and transparency

    … and accelerating scientific discovery!