WorldWideScience

Sample records for web usage mining

  1. Association and Sequence Mining in Web Usage

    Directory of Open Access Journals (Sweden)

    Claudia Elena DINUCA

    2011-06-01

    Full Text Available. Web servers worldwide generate a vast amount of information on web users’ browsing activities. Several researchers have studied these so-called clickstream or web access log data to better understand and characterize web users. Clickstream data can be enriched with information about the content of visited pages and the origin (e.g., geographic, organizational) of the requests. The goal of this project is to analyse user behaviour by mining enriched web access log data. With the continued growth and proliferation of e-commerce, Web services, and Web-based information systems, the volumes of clickstream and user data collected by Web-based organizations in their daily operations have reached astronomical proportions. This information can be exploited in various ways, such as enhancing the effectiveness of websites or developing directed web marketing campaigns. The discovered patterns are usually represented as collections of pages, objects, or resources that are frequently accessed by groups of users with common needs or interests. This paper gives an overview of how to use frequent pattern techniques to discover different types of patterns in a Web log database, focusing on association mining as a data mining technique for extracting potentially useful knowledge from web usage data. A program that identifies page associations from sessions was implemented in Java using the NetBeans IDE. The log files of a commercial web site serve as the example.
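
    The idea of finding page associations from sessions can be illustrated with a minimal Python sketch. It is not the Java program described in the abstract: the function name `page_pair_rules`, the thresholds, and the sample sessions are all assumptions made for illustration, and only rules between pairs of pages are considered.

```python
from collections import Counter
from itertools import combinations

def page_pair_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) rules for page pairs.

    support    = share of sessions that contain both pages
    confidence = share of sessions containing the antecedent that also
                 contain the consequent
    """
    n = len(sessions)
    page_counts = Counter()
    pair_counts = Counter()
    for session in sessions:
        pages = sorted(set(session))  # ignore repeats and visit order
        page_counts.update(pages)
        pair_counts.update(combinations(pages, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for x, y in ((a, b), (b, a)):
            confidence = count / page_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)

# Hypothetical sessions: each is the list of pages one visitor requested.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/contact"],
    ["/products", "/cart"],
]
rules = page_pair_rules(sessions)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}  support={support:.2f}  confidence={confidence:.2f}")
```

    With these sample sessions, every visitor who requested /cart also requested /products, so that rule has confidence 1.0, while the reverse rule is weaker.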

  2. COMPARISON ANALYSIS OF WEB USAGE MINING USING PATTERN RECOGNITION TECHNIQUES

    Directory of Open Access Journals (Sweden)

    Nanhay Singh

    2013-07-01

    Full Text Available. Web usage mining is the application of data mining techniques to better serve the needs of web-based applications on the web site. In this paper, we analyze web usage mining by applying pattern recognition techniques to web log data. Pattern recognition is defined as the act of taking in raw data and taking an action based on the ‘category’ of the pattern. Web usage mining is divided into three parts: preprocessing, pattern discovery, and pattern analysis. The paper also presents experimental work using web log data. We have taken the web log data from the “NASA” web server and analyzed it with “Web Log Explorer”, a web usage mining tool that plays a vital role in carrying out this work.

  3. The Descriptive Study of Knowledge Discovery from Web Usage Mining

    Directory of Open Access Journals (Sweden)

    Yogish H K

    2011-09-01

    Full Text Available. The World Wide Web serves as a huge, widely distributed, global information service centre for news, advertisements, consumer information, financial management, education, government, e-commerce and many other information services. The web also contains a rich and dynamic collection of hyperlink information and web page access and usage information, providing rich sources of data for data mining. Web usage mining is the area of data mining that deals with the discovery and analysis of usage patterns from web logs in order to improve web-based applications. Web usage mining consists of three phases: preprocessing, pattern discovery, and pattern analysis. After completing these three phases, the user can find the required usage patterns and use this information for specific needs.
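
    The preprocessing phase can be sketched in Python as follows. This is an illustration under stated assumptions, not a method taken from the paper: the log is assumed to be in Common Log Format, visitors are identified by host address only, image, stylesheet, and script requests are dropped, and a 30-minute inactivity timeout closes a session. The names `parse_line` and `sessionize` and the sample log lines are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Common Log Format: host ident user [timestamp] "METHOD path protocol" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")

def parse_line(line):
    """Return (host, timestamp, path), or None for lines that should be cleaned out."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    # Data cleaning: keep successful requests for pages only.
    if not m["status"].startswith("2") or m["path"].lower().endswith(IGNORED_SUFFIXES):
        return None
    when = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
    return m["host"], when, m["path"]

def sessionize(lines, timeout=timedelta(minutes=30)):
    """Group cleaned requests into (host, [paths]) sessions using an inactivity timeout."""
    records = sorted(r for r in map(parse_line, lines) if r is not None)
    sessions = []
    last_seen = {}  # host -> (time of last request, index of its open session)
    for host, when, path in records:
        previous = last_seen.get(host)
        if previous is None or when - previous[0] > timeout:
            sessions.append((host, [path]))  # start a new session
            index = len(sessions) - 1
        else:
            index = previous[1]
            sessions[index][1].append(path)  # continue the open session
        last_seen[host] = (when, index)
    return sessions

# Hypothetical log lines for illustration.
log_lines = [
    '10.0.0.1 - - [01/Jul/1995:00:00:01 -0400] "GET /index.html HTTP/1.0" 200 1024',
    '10.0.0.1 - - [01/Jul/1995:00:00:02 -0400] "GET /logo.gif HTTP/1.0" 200 512',
    '10.0.0.1 - - [01/Jul/1995:00:05:00 -0400] "GET /news.html HTTP/1.0" 200 2048',
    '10.0.0.2 - - [01/Jul/1995:00:06:00 -0400] "GET /index.html HTTP/1.0" 200 1024',
    '10.0.0.1 - - [01/Jul/1995:01:00:00 -0400] "GET /index.html HTTP/1.0" 200 1024',
    '10.0.0.2 - - [01/Jul/1995:00:07:00 -0400] "GET /missing.html HTTP/1.0" 404 -',
]
sessions = sessionize(log_lines)
for host, pages in sessions:
    print(host, pages)
```

    The resulting sessions are the input that the pattern discovery phase would work on. Identifying visitors by host alone is a simplification, since several users can share one address.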

  4. Semantic Session Analysis for Web Usage Mining

    Institute of Scientific and Technical Information of China (English)

    ZHANG Hui; SONG Hantao; XU Xiaomei

    2007-01-01

    A semantic session analysis method for partitioning Web usage logs is presented. A semantic Web usage log preparation model enhances usage logs with semantics. A Markov chain model based on ontology semantic measurement is used to identify which active session a request should belong to. The competitive method is applied to determine the end of the sessions. Compared with other algorithms, more successful sessions are additionally detected by semantic outlier analysis.

  5. Study on online community user motif using web usage mining

    Science.gov (United States)

    Alphy, Meera; Sharma, Ajay

    2016-04-01

    Web usage mining is an application of data mining that is used to extract useful information from the online community. The World Wide Web contained at least 4.73 billion pages according to the Indexed Web and at least 228.52 million pages according to the Dutch Indexed Web on Thursday, 6 August 2015. It is difficult to get the needed data from these billions of web pages, and this is where web usage mining becomes important. Personalizing the search engine helps the web user to identify the most used data easily. It reduces time consumption through automatic site search and automatic restoration of useful sites. This study surveys the techniques, from the oldest to the latest, used in pattern discovery and analysis in web usage mining from 1996 to 2015. Analyzing user motifs helps to improve business, e-commerce, personalisation, and websites.

  6. Semantically Enriched Web Usage Mining for Predicting User Future Movements

    Directory of Open Access Journals (Sweden)

    Suresh Shirgave

    2013-10-01

    Full Text Available. The explosive growth of the World Wide Web has resulted in intricate Web sites, demanding enhanced user skills and sophisticated tools to help the Web user find the desired information. Finding desired information on the Web has become a critical ingredient of everyday personal, educational, and business life. Thus, there is a demand for more sophisticated tools to help the user navigate a Web site and find the desired information. Users must be provided with information and services specific to their needs, rather than an undifferentiated mass of information. Many Web usage mining techniques have been applied to discover interesting and frequent navigation patterns from Web server logs. The recommendation accuracy of solely usage-based techniques can be improved by integrating Web site content and site structure into the personalization process. Herein, we propose the Semantically enriched Web Usage Mining method (SWUM), which combines the fields of Web Usage Mining and the Semantic Web. In the proposed method, the undirected graph derived from usage data is enriched with rich semantic information extracted from the Web pages and the Web site structure. The experimental results show that SWUM generates accurate recommendations by integrating usage data, semantic data, and Web site structure. The results show that the proposed method achieves 10-20% better accuracy than the solely usage-based model, and 5-8% better than an ontology-based model.

  7. IMPROVING THE INTERESTINGNESS OF WEB USAGE MINING

    Institute of Scientific and Technical Information of China (English)

    杨怡玲; 管旭东; 尤晋元

    2002-01-01

    Improvements to mining frequently visited groups of web pages were studied. First, in the data preprocessing phase, we introduce an extra frame-filtering step that reduces the negative influence of frame pages on the resulting page groups. By recognizing the frame pages in the site documents and constructing the frame-subframe relation set, the subframe pages that influence the final mining result can be efficiently filtered. Second, we enhance the mining algorithm by considering both the site topology and the content of the web pages. By introducing the normalized content-link ratio of a web page and the group interlink degree of a page group, the enhanced algorithm concentrates more on content pages that are less interlinked. The experiments show that the new approach can effectively reveal more interesting page groups, which would not be found without these enhancements.

  8. RECOMMENDATION FOR WEB SERVICE COMPOSITION BY MINING USAGE LOGS

    Directory of Open Access Journals (Sweden)

    Vivek R

    2016-03-01

    Full Text Available. Web service composition has been one of the most researched topics of the past decade. Novel methods of web service composition proposed in the literature include semantics-based composition and WSDL-based composition. Although these methods provide promising results for composition, search, and discovery of web services based on network QoS parameters and on the semantics or ontology associated with WSDL, they do not address composition based on the usage of web services. Web service usage logs capture time series data of web service invocations by business objects, which innately captures patterns or workflows associated with business operations. Web service composition based on such patterns and workflows can greatly streamline business operations. In this research work, we explore and implement methods of mining web service usage logs. The main objectives include identifying usage associations among services, linking one service invocation with another, and evaluating the causal relationship between associations of services.

  9. Incremental Web Usage Mining Based on Active Ant Colony Clustering

    Institute of Scientific and Technical Information of China (English)

    SHEN Jie; LIN Ying; CHEN Zhimin

    2006-01-01

    To alleviate the scalability problem caused by increasing Web usage and changing user interests, this paper presents a novel Web usage mining algorithm: an incremental Web usage mining algorithm based on Active Ant Colony Clustering. First, an active movement strategy for direction selection and speed, different from the positive strategy employed by other Ant Colony Clustering algorithms, is proposed to construct an Active Ant Colony Clustering algorithm, which avoids idle movement and the "flying over the plane" phenomenon and effectively improves the quality and speed of clustering on large datasets. Then a mechanism for decomposing clusters based on the above methods is introduced to form new clusters when user interests change. Empirical studies on a real Web dataset show that the active ant colony clustering algorithm performs better than previous algorithms, and that the incremental approach based on the proposed mechanism can efficiently implement incremental Web usage mining.

  10. Evaluating The Markov Assumption For Web Usage Mining

    DEFF Research Database (Denmark)

    Jespersen, S.; Pedersen, Torben Bach; Thorhauge, J.

    2003-01-01

    Web usage mining concerns the discovery of common browsing patterns, i.e., pages requested in sequence, from web logs. To cope with the enormous amounts of data, several aggregated structures based on statistical models of web surfing have appeared, e.g., the Hypertext Probabilistic Grammar (HPG...... knowledge there has been no systematic study of the validity of the Markov assumption with respect to web usage mining and the resulting quality of the mined browsing patterns. In this paper we systematically investigate the quality of browsing patterns mined from structures based on the Markov assumption. Formal...... measures of quality, based on the closeness of the mined patterns to the true traversal patterns, are defined and an extensive experimental evaluation is performed, based on two substantial real-world data sets. The results indicate that a large number of rules must be considered to achieve high quality...

  11. A Model for Web Page Usage Mining Based on Segmentation

    CERN Document Server

    Kuppusamy, K S

    2012-01-01

    Web page usage mining plays a vital role in enriching a page's content and structure based on the feedback received from users' interactions with the page. This paper proposes a model for micro-managing the tracking activities by fine-tuning the mining from the page level to the segment level. The proposed model enables the webmaster to identify the segments that receive more focus from users than others. Segment-level analytics of user actions provides an important metric for analysing the factors that increase traffic to the page. The model is validated empirically through a prototype implementation.

  12. A Model for Web Page Usage Mining Based on Segmentation

    OpenAIRE

    Kuppusamy, K. S.; Aghila, G.

    2012-01-01

    Web page usage mining plays a vital role in enriching a page's content and structure based on the feedback received from users' interactions with the page. This paper proposes a model for micro-managing the tracking activities by fine-tuning the mining from the page level to the segment level. The proposed model enables the webmaster to identify the segments that receive more focus from users than others. The segment level analytics of user actions provides an importan...

  13. Web Usage Mining, Pattern Discovery and Log File

    Directory of Open Access Journals (Sweden)

    Tri Suratno

    2014-01-01

    Full Text Available. Analysis of server access data can provide significant and useful information for improving performance, restructuring, and improving the effectiveness of a web site. Data mining is one of the most effective ways to detect patterns of information in large amounts of data. The application of data mining to Internet use, called web mining, is a set of data mining techniques used for the web. Web mining combines data mining and web technologies, integrating information extracted from World Wide Web resources of interest and applying data mining algorithms to a variety of observational data to identify patterns in web resources. Web mining analyzes data from the web, such as visitor access data, web page structure, and web page format. This study applies web usage mining with association rules to the log file of the website www.faperta.unja.ac.id in order to discover navigation patterns and association rules between combinations of items. The aim is to determine visit patterns and to identify the pages and links that visitors use most often, which can be used to improve the website's design and layout so that the services of www.faperta.unja.ac.id can be provided effectively and efficiently. Keywords: Web mining; Web usage mining; Data mining; Log file; Website; World Wide Web

  14. Evaluating The Markov Assumption For Web Usage Mining

    DEFF Research Database (Denmark)

    Jespersen, S.; Pedersen, Torben Bach; Thorhauge, J.

    2003-01-01

    Web usage mining concerns the discovery of common browsing patterns, i.e., pages requested in sequence, from web logs. To cope with the enormous amounts of data, several aggregated structures based on statistical models of web surfing have appeared, e.g., the Hypertext Probabilistic Grammar (HPG......) model [borges99data]. These techniques typically rely on the Markov assumption with history depth n, i.e., it is assumed that the next requested page is only dependent on the last n pages visited. This is not always valid, i.e. false browsing patterns may be discovered. However, to our...
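
    The Markov assumption with history depth n can be made concrete with a short Python sketch. It is a plain n-th order transition count model written for illustration, not the Hypertext Probabilistic Grammar or the authors' evaluation method. The function names and the sample sessions are assumptions.

```python
from collections import Counter, defaultdict

def train_markov(sessions, n=1):
    """Count how often each page follows each history of the last n pages."""
    if n < 1:
        raise ValueError("history depth n must be at least 1")
    transitions = defaultdict(Counter)
    for session in sessions:
        for i in range(n, len(session)):
            history = tuple(session[i - n:i])
            transitions[history][session[i]] += 1
    return transitions

def next_page_probabilities(transitions, recent_pages, n=1):
    """Estimate P(next page | last n pages); empty dict if the history was never seen."""
    if n < 1:
        raise ValueError("history depth n must be at least 1")
    history = tuple(recent_pages[-n:])
    counts = transitions.get(history)
    if not counts:
        return {}
    total = sum(counts.values())
    return {page: count / total for page, count in counts.items()}

# Hypothetical sessions: visitors arriving from A always continue to C,
# visitors arriving from D always continue to E.
sessions = [
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["D", "B", "E"],
    ["D", "B", "E"],
]

depth_one = train_markov(sessions, n=1)
depth_two = train_markov(sessions, n=2)
print(next_page_probabilities(depth_one, ["A", "B"], n=1))
print(next_page_probabilities(depth_two, ["A", "B"], n=2))
```

    With depth 1 the model only sees that B was the last page, so it gives the path A, B, E a probability of one half even though no visitor ever took it. This is the kind of false pattern the abstract warns about. With depth 2 the history (A, B) leads only to C.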

  15. Challenges and Usage of Link Mining to Semantic Web

    Directory of Open Access Journals (Sweden)

    Zaved Akhtar

    2012-03-01

    Full Text Available. An emerging challenge for data mining is the problem of mining richly structured datasets, where the objects are linked in some way. Links among the objects may demonstrate certain patterns, which can be helpful for many data mining tasks and are usually hard to capture with traditional statistical models. Many datasets of interest today are best described as a linked collection of interrelated objects. These may represent homogeneous networks, in which there is a single object type and link type (e.g., people connected by friendship links, or the WWW, a collection of linked web pages), or richer, heterogeneous networks, in which there may be multiple object and link types (and possibly other semantic information). Examples of heterogeneous networks include those in medical domains describing patients, diseases, treatments and contacts, or in bibliographic domains describing publications, authors, and venues. Link mining refers to data mining techniques that explicitly consider these links when building predictive or descriptive models of the linked data. Commonly addressed link mining tasks include object ranking, group detection, collective classification, link prediction and subgraph discovery. This is an exciting and rapidly expanding area. In this article we review some of the common emerging themes, discuss ongoing link mining challenges and open issues, and suggest ideas that could be opportunities for solutions. The main conclusion of this article is an idea for using link mining techniques to help construct the Semantic Web.

  16. Web Usage Mining Applied to Records of Navigation by Internet

    OpenAIRE

    Darian Horacio Grass Boada; Alejandro Rosete Suárez; Jesús Eladio Sánchez García; Valia Guerra Ones

    2013-01-01

    This paper presents a Knowledge Discovery in Databases (KDD) process applied to the internet surfing logs at the University of Informatics Sciences. In this context, it describes a Web Usage Mining process using as data sources the internet surfing logs stored by the proxy server, and also descriptive information regarding the users of such surfing service, which was provided by the institution’s personnel management systems. Statistical, numerical and clustering techniques were combined seeking ...

  17. Web Usage Mining Process Influenced by Web Site Structure and Content

    Institute of Scientific and Technical Information of China (English)

    刘丽珍; 宋瀚涛; 陆玉昌

    2003-01-01

    The paper emphasizes the relationship between Web usage mining and the use of Web site structure and content. It shows that the effort involved in processing and quantifying the structure and content of a Web site is well worthwhile when performing Web usage mining. The necessity of combining Web site structure and content with the Web usage mining process is further demonstrated.

  18. Implementation of Web Usage Mining Using APRIORI and FP Growth Algorithms

    Directory of Open Access Journals (Sweden)

    B.Santhosh Kumar

    2010-05-01

    Full Text Available. Web Usage Mining is the application of data mining techniques to discover interesting usage patterns from Web data, in order to understand and better serve the needs of Web-based applications. Usage data captures the identity or origin of Web users along with their browsing behavior at a Web site. Web usage mining itself can be classified further depending on the kind of usage data considered: web server data, application server data, and application level data. Web server data correspond to the user logs that are collected at the Web server. Typical data collected at a Web server include IP addresses, page references, and access times of the users, and these are the main input to the present research. This research work concentrates on web usage mining and in particular focuses on discovering the web usage patterns of websites from the server log files. The memory usage and running time of the Apriori algorithm and the Frequent Pattern Growth algorithm are compared.
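
    As a reference point for the comparison, the level-wise candidate generation of Apriori can be sketched in Python. This is a minimal textbook-style version, not the implementation evaluated in the paper. The function name, the absolute support threshold, and the sample sessions are assumptions. FP-Growth is not shown; it avoids generating candidates by compressing the transactions into a prefix tree.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    def count_frequent(candidates):
        # One pass over the data per level: count each candidate, keep the frequent ones.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: k for c, k in counts.items() if k >= min_support}

    items = {frozenset([item]) for t in transactions for item in t}
    current = count_frequent(items)
    frequent = dict(current)
    size = 2
    while current:
        # Join step: merge frequent (size-1)-itemsets whose union has exactly `size` items.
        candidates = {a | b for a, b in combinations(list(current), 2) if len(a | b) == size}
        # Prune step: every (size-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, size - 1))
        }
        current = count_frequent(candidates)
        frequent.update(current)
        size += 1
    return frequent

# Hypothetical sessions, each reduced to the set of pages visited.
sessions = [
    ["home", "products", "cart"],
    ["home", "products"],
    ["home", "contact"],
    ["products", "cart"],
    ["home", "products", "cart"],
]
frequent = apriori(sessions, min_support=3)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

    In this example the three-page candidate is pruned before any counting, because its subset of home and cart is not frequent. The repeated passes over the data at each level are the main cost that pattern-growth methods are designed to avoid.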

  19. Research on Web Usage Patterns Mining

    Institute of Scientific and Technical Information of China (English)

    陈新中; 李岩; 杨炳儒

    2003-01-01

    With the rapid development of the Internet, Web usage mining plays a very important role in many fields, including personalizing information services, improving the design and service of Web sites, developing personalized electronic commerce, building adaptive Web sites, and promoting the reputation and income of Web sites. The paper first introduces the definition and classification of Web mining. The main technologies and methods of Web log preprocessing, the primary algorithms of Web usage mining, and the evaluation methods and important applications of Web usage mining are then discussed in detail. Finally, the trends and research directions in Web usage mining are summarized.

  20. The Applied Research of Web Usage Mining

    Institute of Scientific and Technical Information of China (English)

    刘丽珍; 宋瀚涛; 陆玉昌

    2003-01-01

    Effective and efficient knowledge patterns can be gained through searching, integrating, mining, and analyzing on the Web. These useful knowledge patterns can help us build Web sites efficient enough that the WWW can serve people well. In this paper we point out that the Web Usage Mining process is influenced by Web site structure and content, and introduce the application of Web Usage Mining in e-commerce. Finally, an example of Web Usage Mining is given.

  1. Web Usage Mining Applied to Records of Navigation by Internet

    Directory of Open Access Journals (Sweden)

    Darian Horacio Grass Boada

    2013-06-01

    Full Text Available. This paper presents a Knowledge Discovery in Databases (KDD) process applied to the internet surfing logs at the University of Informatics Sciences. In this context, it describes a Web Usage Mining process using as data sources the internet surfing logs stored by the proxy server, and also descriptive information regarding the users of such surfing service, which was provided by the institution’s personnel management systems. Statistical, numerical and clustering techniques were combined seeking to identify user groups with similar internet surfing account usage, in hopes of providing important information for decision making processes carried out by the Network Management and Security Office or other areas of the institution. This paper describes the methods and techniques used, and the procedure utilized for performing the descriptive clustering task. This procedure proposes the use of the CUR matrix decomposition to determine the possible number of groups to be identified by the k-medoids clustering algorithm. Lastly, the experiments carried out and the evaluations of the groups obtained are described, and examples of some of the patterns obtained are presented.

  2. What Can Instructors and Policy Makers Learn about Web-Supported Learning through Web-Usage Mining

    Science.gov (United States)

    Cohen, Anat; Nachmias, Rafi

    2011-01-01

    This paper focuses on a Web-log based tool for evaluating pedagogical processes occurring in Web-supported academic instruction and students' attitudes. The tool consists of computational measures which demonstrate what instructors and policy makers can learn about Web-supported instruction through Web-usage mining. The tool can provide different…

  3. Analysis of Server Log by Web Usage Mining for Website Improvement

    Directory of Open Access Journals (Sweden)

    Navin Kumar Tyagi

    2010-07-01

    Full Text Available. Web server logs store clickstream data that can be useful for mining purposes. The data is stored as a result of users' access to a website. Web usage mining, an application of data mining, can be used to discover user access patterns from web log data. The results obtained are used in different applications such as site modification, business intelligence, system improvement, and personalization. In this study, we have analyzed the log files of the smart sync software web server to obtain information about visitors and top errors, which system administrators and web designers can use to increase the effectiveness of the web site.

  4. Applying Web Usage Mining for Personalizing Hyperlinks in Web-Based Adaptive Educational Systems

    Science.gov (United States)

    Romero, Cristobal; Ventura, Sebastian; Zafra, Amelia; de Bra, Paul

    2009-01-01

    Nowadays, the application of Web mining techniques in e-learning and Web-based adaptive educational systems is increasing exponentially. In this paper, we propose an advanced architecture for a personalization system to facilitate Web mining. A specific Web mining tool is developed and a recommender engine is integrated into the AHA! system in…

  5. Constructing a web recommender system using web usage mining and user’s profiles

    Directory of Open Access Journals (Sweden)

    T. Mombeini

    2014-12-01

    Full Text Available. The World Wide Web is a great source of information, which is now widely used because of the availability of useful, dynamically changing information. However, the large number of webpages often confuses users, and it is hard for them to find information that matches their interests. Therefore, it is necessary to provide a system capable of guiding users towards their desired choices and services. Recommender systems search among a large collection of user interests and recommend those that are likely to be favored the most by the user. Web usage mining was designed to function on web server records, which are included in user search results. Therefore, recommender servers use the web usage mining technique to predict users’ browsing patterns and recommend those patterns in the form of a suggestion list. In this article, a recommender system based on web usage mining phases (online and offline) was proposed. In the offline phase, the first step is to analyze user access records to identify user sessions. Next, user profiles are built using data from server records based on the frequency of access to pages, the time spent by the user on each page, and the date of page view. The date is important because users are more likely to request new pages than old ones, as they mostly look for new information. Following the creation of user profiles, users are categorized in clusters using the Fuzzy C-means clustering algorithm and S(c criterion based on their similarities. In the online phase, a neural network is offered to identify the suggested model while online suggestions are generated using the suggestion module for the active user. Search engines analyze suggestion lists based on rate of user interest in pages and page rank and finally suggest appropriate pages to the active user. Experiments show that the proposed method of predicting user recent requested pages has more accuracy and

  6. A Novel Incremental Mining Algorithm of Frequent Patterns for Web Usage Mining

    Institute of Scientific and Technical Information of China (English)

    DONG Yihong; ZHUANG Yueting; TAI Xiaoying

    2007-01-01

    Because a data warehouse changes frequently, incremental data can make previously mined knowledge invalid. In order to maintain the discovered knowledge and patterns dynamically, this study presents IPARUC, a novel algorithm for updating global frequent patterns. First, a rapid clustering method is introduced in IPARUC to divide the database into n parts, where the data within each part are similar. Then, the nodes in the tree are adjusted dynamically during insertion by "pruning and laying back" to keep the descending frequency order, so that they can be shared in a near-optimal way. Finally, the local frequent itemsets mined from each local dataset are merged into global frequent itemsets. The results of the experimental study are very encouraging: IPARUC is more effective and efficient than the two other methods it was compared with. Furthermore, it has significant application potential in a prototype Web log analyzer for web usage mining, which can help discover useful knowledge effectively and even help managers make decisions.

  7. An Application for Data Preprocessing and Models Extractions in Web Usage Mining

    Directory of Open Access Journals (Sweden)

    Claudia Elena DINUCA

    2011-11-01

    Full Text Available. Web servers worldwide generate a vast amount of information on web users’ browsing activities. Several researchers have studied these so-called clickstream or web access log data to better understand and characterize web users. The goal of this application is to analyze user behaviour by mining enriched web access log data. With the continued growth and proliferation of e-commerce, Web services, and Web-based information systems, the volumes of clickstream and user data collected by Web-based organizations in their daily operations have reached astronomical proportions. This information can be exploited in various ways, such as enhancing the effectiveness of websites or developing directed web marketing campaigns. The discovered patterns are usually represented as collections of pages, objects, or resources that are frequently accessed by groups of users with common needs or interests. This paper describes how the application for data preprocessing and for extracting different data models from web log data was implemented, using association mining as a data mining technique to extract potentially useful knowledge from web usage data. We find different data models and navigation patterns by analysing the log files of the web site. I implemented the application in Java using the NetBeans IDE. As an example, I used the log file data from a commercial web site, www.nice-layouts.com.

  8. A new algorithm to create a profile for users of web site benefiting from web usage mining

    Directory of Open Access Journals (Sweden)

    Masomeh Khabazfazli

    2015-11-01

    Full Text Available. With the spread of the internet and its various applications and the growing number of web pages, finding information through search engines becomes difficult. Web page recommendation systems are used to solve this problem. In this paper, a recommender engine is improved using web usage mining methods. In the recommendation system, clustering was used to classify users’ behavior. We applied usage mining to the data related to each user in order to build that user's movement pattern. Web pages were then recommended using a neural network and a Markov model. The performance of the recommendation engine was thus improved using users’ movement patterns, clustering, the neural network, and the Markov model, and it obtained better results than other methods. Two factors, accuracy and coverage, were used to evaluate the quality of data retrieval on the web.

  9. U-Sem: Semantic Enrichment, User Modeling and Mining of Usage Data on the Social Web

    CERN Document Server

    Abel, Fabian; Hauff, Claudia; Hollink, Laura; Houben, Geert-Jan

    2011-01-01

    With the growing popularity of Social Web applications, more and more user data is published on the Web every day. Our research focuses on investigating ways of mining data from such platforms that can be used for modeling users and for semantically augmenting user profiles. This process can enhance adaptation and personalization in various adaptive Web-based systems. In this paper, we present the U-Sem people modeling service, a framework for the semantic enrichment and mining of people's profiles from usage data on the Social Web. We explain the architecture of our people modeling service and describe its application in an adult e-learning context as an example.

  10. A Dynamic Recommender System for Improved Web Usage Mining and CRM Using Swarm Intelligence

    Directory of Open Access Journals (Sweden)

    Anna Alphy

    2015-01-01

    Full Text Available In modern days, to enrich e-business, the websites are personalized for each user by understanding their interests and behavior. The main challenges of online usage data are information overload and their dynamic nature. In this paper, to address these issues, a WebBluegillRecom-annealing dynamic recommender system that uses web usage mining techniques in tandem with software agents developed for providing dynamic recommendations to users that can be used for customizing a website is proposed. The proposed WebBluegillRecom-annealing dynamic recommender uses swarm intelligence from the foraging behavior of a bluegill fish. It overcomes the information overload by handling dynamic behaviors of users. Our dynamic recommender system was compared against traditional collaborative filtering systems. The results show that the proposed system has higher precision, coverage, F1 measure, and scalability than the traditional collaborative filtering systems. Moreover, the recommendations given by our system overcome the overspecialization problem by including variety in recommendations.

  11. A Dynamic Recommender System for Improved Web Usage Mining and CRM Using Swarm Intelligence.

    Science.gov (United States)

    Alphy, Anna; Prabakaran, S

    2015-01-01

    In modern days, to enrich e-business, the websites are personalized for each user by understanding their interests and behavior. The main challenges of online usage data are information overload and their dynamic nature. In this paper, to address these issues, a WebBluegillRecom-annealing dynamic recommender system that uses web usage mining techniques in tandem with software agents developed for providing dynamic recommendations to users that can be used for customizing a website is proposed. The proposed WebBluegillRecom-annealing dynamic recommender uses swarm intelligence from the foraging behavior of a bluegill fish. It overcomes the information overload by handling dynamic behaviors of users. Our dynamic recommender system was compared against traditional collaborative filtering systems. The results show that the proposed system has higher precision, coverage, F1 measure, and scalability than the traditional collaborative filtering systems. Moreover, the recommendations given by our system overcome the overspecialization problem by including variety in recommendations.

  12. Discovering More Accurate Frequent Web Usage Patterns

    CERN Document Server

    Bayir, Murat Ali; Cosar, Ahmet; Fidan, Guven

    2008-01-01

    Web usage mining is a type of web mining, which exploits data mining techniques to discover valuable information from navigation behavior of World Wide Web users. As in classical data mining, data preparation and pattern discovery are the main issues in web usage mining. The first phase of web usage mining is the data processing phase, which includes the session reconstruction operation from server logs. Session reconstruction success directly affects the quality of the frequent patterns discovered in the next phase. In reactive web usage mining techniques, the source data is web server logs and the topology of the web pages served by the web server domain. Other kinds of information collected during the interactive browsing of web site by user, such as cookies or web logs containing similar information, are not used. The next phase of web usage mining is discovering frequent user navigation patterns. In this phase, pattern discovery methods are applied on the reconstructed sessions obtained in the first phas...
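
    The session reconstruction step described above can be illustrated with a plain time-oriented heuristic. This is a minimal sketch, not the method proposed in the paper: the function name `reconstruct_sessions`, the `(user, timestamp, page)` input layout and the 30-minute timeout are all assumptions made for illustration.

```python
from collections import defaultdict

def reconstruct_sessions(requests, timeout=1800):
    """Group (user, timestamp_seconds, page) requests into sessions.

    A new session starts when the gap between two consecutive requests
    of the same user exceeds `timeout` seconds (assumed: 30 minutes).
    """
    by_user = defaultdict(list)
    for user, ts, page in requests:
        by_user[user].append((ts, page))

    sessions = []
    for user, hits in by_user.items():
        hits.sort()  # order each user's requests by timestamp
        current = [hits[0][1]]
        for (prev_ts, _), (ts, page) in zip(hits, hits[1:]):
            if ts - prev_ts > timeout:
                # gap too long: close the running session
                sessions.append((user, current))
                current = []
            current.append(page)
        sessions.append((user, current))
    return sessions
```

    The reconstructed sessions are what a later pattern discovery phase would consume, which is why errors made here carry over into the frequent patterns.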

  13. Web Mining

    Institute of Scientific and Technical Information of China (English)

    王实; 高文; 李锦涛

    2000-01-01

    Web Mining is an important branch of Data Mining. It is attracting growing research interest as the Internet develops rapidly. Web Mining includes: (1) Web Content Mining; (2) Web Usage Mining; (3) Web Structure Mining. In this paper we define Web Mining and present an overview of the various research issues, techniques and development efforts.

  14. Automated web usage data mining and recommendation system using K-Nearest Neighbor (KNN) classification method

    Directory of Open Access Journals (Sweden)

    D.A. Adeniyi

    2016-01-01

    Full Text Available The major problem of many on-line web sites is the presentation of many choices to the client at a time; this usually results in a strenuous and time-consuming task of finding the right product or information on the site. In this work, we present a study of an automatic web usage data mining and recommendation system based on the current user's behavior, captured through his/her click stream data on a newly developed Really Simple Syndication (RSS) reader website, in order to provide relevant information to the individual without explicitly asking for it. The K-Nearest-Neighbor (KNN) classification method has been trained to be used on-line and in real time to identify clients'/visitors' click stream data, match it to a particular user group and recommend a tailored browsing option that meets the need of the specific user at a particular time. To achieve this, the web users' RSS address file was extracted, cleansed, formatted and grouped into meaningful sessions, and a data mart was developed. Our results show that the K-Nearest Neighbor classifier is transparent, consistent, straightforward, simple to understand, tends to possess desirable qualities, and is easier to implement than most other machine learning techniques, specifically when there is little or no prior knowledge about the data distribution.
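
    As an illustration of the classification step, the following is a minimal K-Nearest-Neighbor sketch. It assumes each session has already been turned into a numeric feature vector (for example, visit counts per page category) with a known user-group label; the function name `knn_predict` and the data layout are hypothetical and not taken from the paper.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Return the majority label among the k training vectors closest to query.

    train: list of (feature_vector, label) pairs
    query: feature vector of the session to classify
    """
    # rank training samples by Euclidean distance to the query
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

    The predicted group would then select which tailored browsing options to recommend. No model is fitted in advance; all the work happens at query time, which is part of why the method is simple to implement.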

  15. Web Mining: An Overview

    Directory of Open Access Journals (Sweden)

    P. V. G. S. Mudiraj; B. Jabber; K. David Raju

    2011-12-01

    Full Text Available Web usage mining is a main research area in Web mining focused on learning about Web users and their interactions with Web sites. The motive of mining is to find users’ access models automatically and quickly from the vast Web log data, such as frequent access paths, frequent access page groups and user clustering. Through web usage mining, the server log, registration information and other related information left by users provide a foundation for decision making in organizations. This article provides a survey and analysis of current Web usage mining systems and technologies. There are generally three tasks in Web Usage Mining: Preprocessing, Pattern analysis and Knowledge discovery. Preprocessing cleans the server log file by removing log entries such as errors or failures and repeated requests for the same URL from the same host. The main task of Pattern analysis is to filter out uninteresting information and to visualize and interpret the interesting patterns for users. The statistics collected from the log file can help to discover knowledge. This knowledge can be used to classify users as Excellent, Medium or Weak, and web pages as Excellent, Medium or Weak, based on the hit counts of the web pages in the web site. The design of the website is restructured based on users’ behavior or hit counts, which provides a quick response to web users, saves memory space on servers, and thus reduces HTTP requests and bandwidth utilization. This paper addresses challenges in the three phases of Web Usage Mining along with Web Structure Mining. This paper also discusses an application of WUM, an online Recommender System that dynamically generates links to pages that have not yet been visited by a user and might be of potential interest to him. Unlike the recommender systems proposed so far, ONLINE MINER does not make use of any off-line component, and is able to manage Web sites made up of dynamically generated pages.
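
    The preprocessing task described above (dropping error entries and repeated requests for the same URL from the same host) can be sketched as follows. This is an illustrative sketch only: the dictionary keys `host`, `url` and `status`, and the choice to treat any HTTP status of 400 or above as an error, are assumptions rather than details from the article.

```python
def clean_log(entries):
    """Filter parsed log entries.

    entries: list of dicts with 'host', 'url' and 'status' keys (assumed layout).
    Drops error responses and repeated (host, url) requests, keeping the first.
    """
    seen = set()
    cleaned = []
    for entry in entries:
        if entry["status"] >= 400:
            continue  # error or failure entry
        key = (entry["host"], entry["url"])
        if key in seen:
            continue  # repeated request for the same URL from the same host
        seen.add(key)
        cleaned.append(entry)
    return cleaned
```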

  16. Applying Web Usage Mining Techniques to Design Effective Web Recommendation Systems: A Case Study

    Directory of Open Access Journals (Sweden)

    Maryam Jafari

    Full Text Available Recommender systems are helpful tools which provide an adaptive Web environment for Web users. Recently, a number of Web page recommender systems have been developed to extract the user behavior from the user’s navigational path and predict the next reque ...

  17. A Novel Semantically-Time-Referrer based Approach of Web Usage Mining for Improved Sessionization in Pre-Processing of Web Log

    Directory of Open Access Journals (Sweden)

    Navjot Kaur

    2017-01-01

    Full Text Available Web usage mining (WUM), also known as Web Log Mining, is the application of Data Mining techniques to large volumes of data to extract useful and interesting user behaviour patterns from web logs, in order to improve web based applications. This paper aims to improve data discovery by mining the usage data from log files. In this paper the work is done in three phases. The first and second phases, data cleaning and user identification respectively, are completed using traditional methods. The third phase, session identification, is done using three different methods. The main focus of this paper is on sessionization of the log file, which is a critical step for extracting usage patterns. The proposed referrer-time and semantically-time-referrer methods overcome the limitations of traditional methods. The main advantage of the pre-processing model presented in this paper over other methods is that it can process text or Excel log files of any format. The experiments are performed on three different log files and indicate that the proposed semantically-time-referrer based heuristic approach achieves better results than the traditional time and referrer-time based methods. The proposed methods are not complex to use. Web log files are collected from different servers and contain the public information of visitors. In addition, this paper also discusses different types of web log formats.
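
    To make the idea of combining time and referrer information concrete, here is a minimal sketch of one possible referrer-time heuristic for a single user. It is not the authors' algorithm: the rule (start a new session when the time gap exceeds a threshold or when the referrer is not a page already in the current session), the 30-minute timeout and the name `referrer_time_sessions` are assumptions chosen for illustration.

```python
def referrer_time_sessions(hits, timeout=1800):
    """Split one user's requests into sessions.

    hits: time-ordered list of (timestamp_seconds, url, referrer) tuples.
    """
    sessions = []
    current = []
    last_ts = None
    for ts, url, referrer in hits:
        timed_out = last_ts is not None and ts - last_ts > timeout
        unlinked = bool(current) and referrer not in current
        if current and (timed_out or unlinked):
            # close the running session and start a new one
            sessions.append(current)
            current = []
        current.append(url)
        last_ts = ts
    if current:
        sessions.append(current)
    return sessions
```

    A semantic variant would add a further test on the content similarity of consecutive pages; that part is left out here because the abstract does not specify how it is computed.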

  18. The Web Object Store: an infrastructure for mining semantics from web resources and their usage

    OpenAIRE

    Nanni, Mirco; Silvestri, Fabrizio; Giannotti, Fosca; Pedreschi, Dino

    2005-01-01

    The development of methods for an effective and efficient access to the information contained in large masses of digital documents is a long-standing objective in computer science research, and its importance is emphasized by the growing availability of large information repositories. With the advent of the web, the methods for content delivery evolved in the services offered by search engines, categorization and topic search services, related pages services, etc.: the main innovation needed ...

  19. Is Toscana A Formal Concept Analysis Based Solution In Web Usage Mining?

    Directory of Open Access Journals (Sweden)

    Dan-Andrei SITAR-TĂUT

    2012-01-01

    Full Text Available Analyzing the large amounts of data that come from web logs is a complex but challenging present-day problem with implications in various fields, which opens the way to a theoretically unlimited number of approaches and implementations. The main goal of our paper is to examine the possibility of applying formal concept analysis as a viable solution for supporting the web mining process, based on an open-source technological solution called TOSCANA.

  20. Key Issues and Solution Strategy in R&D of Web Usage Mining Tools

    Institute of Scientific and Technical Information of China (English)

    张锋; 常会友

    2003-01-01

    With the rapid development of WWW, Web Usage Mining, as well as Web Mining, has become a hot direction in academic and industrial circles. It is generally believed that there are three tasks, preprocessing, knowledge discovery and pattern analysis, in Web Usage Mining. Though Web Usage Mining is still ranged in the application of traditional data mining techniques, in view of changes in application environment and operated data concerned, some new difficulties have arisen accordingly. This paper takes efforts to address such challenges in the three phases and introduces some proposed solutions simultaneously.

  1. Web Miner: A Tool for Discovery of Usage Patterns From Web Data

    Directory of Open Access Journals (Sweden)

    Roop Ranjan

    2013-05-01

    Full Text Available As there is a huge amount of data available online, the World Wide Web is a fertile area for data mining research. In recent years, various surveys have been performed on static data of web sites to perform web usage mining. This paper deals with the Web usage mining of a website which is hosted on an IIS web server. Web usage mining is the area of data mining which deals with the discovery and analysis of usage patterns from Web data, specifically web logs, in order to improve web based applications. Web usage mining consists of three phases: pre-processing, pattern discovery, and pattern analysis. After the completion of these three phases the user can find the required usage patterns and use this information for specific needs. The research is being performed on a log file using Log Parser.

  2. Data pre-processing for web log mining: Case study of commercial bank website usage analysis

    Directory of Open Access Journals (Sweden)

    Jozef Kapusta

    2013-01-01

    Full Text Available We use data cleaning, integration, reduction and data conversion methods at the pre-processing level of data analysis. Data processing techniques improve the overall quality of the patterns mined. The paper describes the use of standard pre-processing methods for preparing data from a commercial bank website in the form of the log file obtained from the web server. Data cleaning, usually the simplest step of data pre-processing, is non-trivial here because the analysed content is highly specific. We had to deal with frequent changes of the content and even frequent changes of the structure. Regular changes in the structure make use of the sitemap impossible. We present approaches for dealing with this problem. We were able to create the sitemap dynamically, based only on the content of the log file. In this case study, we also examined just one part of the website rather than performing the standard analysis of an entire website, as we did not have access to all log files for security reasons. As a result, the traditional practices had to be adapted for this special case. Analysing just a small fraction of the website resulted in short session times for regular visitors. We were not able to use the recommended methods to determine the optimal value of the session time. Therefore, in this paper we propose new methods based on outlier identification for raising the accuracy of the session length.
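
    The abstract does not say which outlier rule the authors use. As one hedged illustration of the general idea, the sketch below applies the common interquartile-range fence (Q3 + 1.5 × IQR) to observed durations and discards values above it; the function names and the choice of rule are assumptions, not the paper's method.

```python
import statistics

def session_time_cutoff(durations):
    """Upper fence (Q3 + 1.5 * IQR) for observed durations in seconds."""
    q1, _, q3 = statistics.quantiles(durations, n=4)
    return q3 + 1.5 * (q3 - q1)

def drop_outliers(durations):
    """Keep only the durations at or below the upper fence."""
    cutoff = session_time_cutoff(durations)
    return [d for d in durations if d <= cutoff]
```

    A cutoff derived from the data in this way can stand in for a fixed session timeout when only a small part of a site is logged and visits are unusually short.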

  3. Research on Application of Web Usage Mining to Optimize Web Design

    Institute of Scientific and Technical Information of China (English)

    沈红超; 冉文江

    2009-01-01

    This paper describes the process of Web usage mining and the data mining techniques it adopts. By analyzing the results of Web usage pattern mining, it explores the application of Web usage mining to optimizing the design of e-commerce websites, so that website design better matches user needs and thereby promotes the development of enterprise e-commerce activities.

  4. A semantically enriched web usage based recommendation model

    CERN Document Server

    Ramesh, C; Govardhan, A

    2011-01-01

    With the rapid growth of internet technologies, Web has become a huge repository of information and keeps growing exponentially under no editorial control. However the human capability to read, access and understand Web content remains constant. This motivated researchers to provide Web personalized online services such as Web recommendations to alleviate the information overload problem and provide tailored Web experiences to the Web users. Recent studies show that Web usage mining has emerged as a popular approach in providing Web personalization. However conventional Web usage based recommender systems are limited in their ability to use the domain knowledge of the Web application. The focus is only on Web usage data. As a consequence the quality of the discovered patterns is low. In this paper, we propose a novel framework integrating semantic information in the Web usage mining process. Sequential Pattern Mining technique is applied over the semantic space to discover the frequent sequential patterns. Th...

  5. Web Usage Mining: Application to an Online Educational Digital Library Service

    Science.gov (United States)

    Palmer, Bart C.

    2012-01-01

    This dissertation was situated in the crossroads of educational data mining (EDM), educational digital libraries (such as the National Science Digital Library; http://nsdl.org), and examination of teacher behaviors while creating online learning resources in an end-user authoring system, the Instructional Architect (IA; http://ia.usu.edu). The…

  7. A Survey on Preprocessing Methods for Web Usage Data

    Directory of Open Access Journals (Sweden)

    V.Chitraa

    2010-03-01

    Full Text Available The World Wide Web is a huge repository of web pages and links. It provides an abundance of information for Internet users. The growth of the web is tremendous, as approximately one million pages are added daily. Users’ accesses are recorded in web logs. Because of the tremendous usage of the web, web log files are growing at a fast rate and their size is becoming huge. Web data mining is the application of data mining techniques to web data. Web Usage Mining applies mining techniques to log data to extract the behavior of users, which is used in various applications such as personalized services, adaptive web sites, customer profiling, prefetching and creating attractive web sites. Web usage mining consists of three phases: preprocessing, pattern discovery and pattern analysis. Web log data is usually noisy and ambiguous, and preprocessing is an important process before mining. For discovering patterns, sessions have to be constructed efficiently. This paper reviews existing work done in the preprocessing stage. A brief overview of various data mining techniques for discovering patterns and of pattern analysis is given. Finally, a glimpse of various applications of web usage mining is also presented.

  8. Towards semantic web mining

    OpenAIRE

    Berendt, Bettina; Hotho, Andreas; Stumme, Gerd

    2002-01-01

    Semantic Web Mining aims at combining the two fast-developing research areas Semantic Web and Web Mining. The idea is to improve, on the one hand, the results of Web Mining by exploiting the new semantic structures in the Web; and to make use of Web Mining, on the other hand, for building up the Semantic Web. This paper gives an overview of where the two areas meet today, and sketches ways of how a closer integration could be profitable.

  9. Generating dynamic higher-order Markov models in web usage mining

    OpenAIRE

    Borges, J; Levene, Mark

    2005-01-01

    Markov models have been widely used for modelling users’ web navigation behaviour. In previous work we have presented a dynamic clustering-based Markov model that accurately represents second-order transition probabilities given by a collection of navigation sessions. Herein, we propose a generalisation of the method that takes into account higher-order conditional probabilities. The method makes use of the state cloning concept together with a clustering technique to separate the navigation ...
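
    For readers unfamiliar with higher-order models, the sketch below estimates order-k transition probabilities directly by counting, where the next page is conditioned on the previous k pages. This is only the plain counting baseline: it does not implement the state cloning or clustering technique the paper proposes, and the function name `kth_order_transitions` is hypothetical.

```python
from collections import Counter, defaultdict

def kth_order_transitions(sessions, k=2):
    """Estimate P(next page | previous k pages) from navigation sessions."""
    counts = defaultdict(Counter)
    for session in sessions:
        for i in range(len(session) - k):
            context = tuple(session[i:i + k])  # the k pages seen just before
            counts[context][session[i + k]] += 1

    probs = {}
    for context, following in counts.items():
        total = sum(following.values())
        probs[context] = {page: n / total for page, n in following.items()}
    return probs
```

    With k = 1 this reduces to the usual first-order model; the number of contexts grows quickly with k, which is the cost that more compact representations try to avoid.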

  10. Semantic Web Mining: Benefits, Challenges and Opportunities

    Directory of Open Access Journals (Sweden)

    Syeda Farha Shazmeen, Etyala Ramyasree

    2012-12-01

    Full Text Available Semantic Web Mining aims at combining the two areas Semantic Web and Web Mining by using semantics to improve mining and using mining to create semantics. Web Mining aims at discovering insights about the meaning of Web resources and their usage. In the Semantic Web, semantic information is represented by relations with other resources and is recorded in RDF. RDF is a Semantic Web technology that can be used to build efficient and scalable systems for the Cloud. The Semantic Web enriches the World Wide Web with machine-processable information, which supports the user in his tasks and also helps users get exact search results. In this paper, we discuss the interplay of the Semantic Web with Web Mining and list the benefits. Challenges and opportunities of the Semantic Web are also discussed.

  11. Association Rule Mining for Web Recommendation

    Directory of Open Access Journals (Sweden)

    R. Suguna

    2012-10-01

    Full Text Available Web usage mining is the application of web mining to discover useful patterns from the web in order to understand and analyze the behavior of web users and web based applications. It is an emerging research trend for today’s researchers. It deals entirely with web log files, which contain users' website access information. Analyzing and understanding user behavior with respect to web access is an interesting problem. Web usage mining normally has three phases: 1. Preprocessing, 2. Pattern Discovery and 3. Pattern Analysis. This paper proposes association rule mining algorithms for better Web Recommendation and Web Personalization. Web recommendation systems play an important role in understanding customers’ behavior and interests, improving customer convenience, increasing service provider profits and anticipating future needs.
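
    As a rough illustration of how association rules can be derived from sessions, the sketch below computes support and confidence for single-page rules of the form "visitors of A also visit B". It is a simplified stand-in, not the algorithm proposed in the paper; the function name `page_rules` and the thresholds are assumptions.

```python
from collections import Counter
from itertools import permutations

def page_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Return sorted (antecedent, consequent, support, confidence) rules."""
    n = len(sessions)
    page_sets = [set(s) for s in sessions]
    single = Counter(page for s in page_sets for page in s)
    pair = Counter()
    for s in page_sets:
        # count every ordered pair of distinct pages seen in the same session
        for a, b in permutations(sorted(s), 2):
            pair[(a, b)] += 1

    rules = []
    for (a, b), together in pair.items():
        support = together / n             # share of sessions with both pages
        confidence = together / single[a]  # share of a-sessions that also have b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return sorted(rules)
```

    A recommender could then suggest the consequent pages of every rule whose antecedent appears in the active session.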

  12. A SEMANTICALLY ENRICHED WEB USAGE BASED RECOMMENDATION MODEL

    Directory of Open Access Journals (Sweden)

    C.Ramesh

    2011-11-01

    Full Text Available With the rapid growth of internet technologies, Web has become a huge repository of information and keeps growing exponentially under no editorial control. However the human capability to read, access and understand Web content remains constant. This motivated researchers to provide Web personalized online services such as Web recommendations to alleviate the information overload problem and provide tailored Web experiences to the Web users. Recent studies show that Web usage mining has emerged as a popular approach in providing Web personalization. However conventional Web usage based recommender systems are limited in their ability to use the domain knowledge of the Web application. The focus is only on Web usage data. As a consequence the quality of the discovered patterns is low. In this paper, we propose a novel framework integrating semantic information in the Web usage mining process. Sequential Pattern Mining technique is applied over the semantic space to discover the frequent sequential patterns. The frequent navigational patterns are extracted in the form of Ontology instances instead of Web page views and the resultant semantic patterns are used for generating Web page recommendations to the user. Experimental results shown are promising and proved that incorporating semantic information into Web usage mining process can provide us with more interesting patterns which consequently make the recommendation system more functional, smarter and comprehensive.

  13. Binary Particle Swarm Optimization based Biclustering of Web usage Data

    CERN Document Server

    Rathipriya, R; Thangavel, K; Bagyamani, J

    2011-01-01

    Web mining is the nontrivial process to discover valid, novel, potentially useful knowledge from web data using the data mining techniques or methods. It may give information that is useful for improving the services offered by web portals and information access and retrieval tools. With the rapid development of biclustering, more researchers have applied the biclustering technique to different fields in recent years. When biclustering approach is applied to the web usage data it automatically captures the hidden browsing patterns from it in the form of biclusters. In this work, swarm intelligent technique is combined with biclustering approach to propose an algorithm called Binary Particle Swarm Optimization (BPSO) based Biclustering for Web Usage Data. The main objective of this algorithm is to retrieve the global optimal bicluster from the web usage data. These biclusters contain relationships between web users and web pages which are useful for the E-Commerce applications like web advertising and marketin...

  14. WEB STRUCTURE MINING USING PAGERANK, IMPROVED PAGERANK – AN OVERVIEW

    OpenAIRE

    V.Lakshmi Praba; T. Vasantha

    2011-01-01

    Web Mining is the extraction of interesting and potentially useful patterns and information from Web. It includes Web documents, hyperlinks between documents, and usage logs of web sites. The significant task for web mining can be listed out as Information Retrieval, Information Selection / Extraction, Generalization and Analysis. Web information retrieval tools consider only the text on pages and ignore information in the links. The goal of Web structure mining is to explore structural summa...

  15. Web Page Recommendation Using Web Mining

    Directory of Open Access Journals (Sweden)

    Modraj Bhavsar

    2014-07-01

    Full Text Available On the World Wide Web, various kinds of content are generated in huge amounts, so giving relevant results to users through web recommendation has become an important part of web applications. Different kinds of web recommendations are made available to users every day, including image, video, audio, query suggestion and web page recommendations. In this paper we aim to provide a framework for web page recommendation. (1) First we describe the basics of web mining and the types of web mining. (2) We give details of each web mining technique. (3) We propose the architecture for personalized web page recommendation.

  16. A Research Framework of Web Search Engine Usage Mining

    Institute of Scientific and Technical Information of China (English)

    王继民; 李雷明子; 孟涛

    2011-01-01

    Log files of search engines record the complete interaction between users and the system. Mining the logs can help us discover the characteristics of user behavior and improve the performance of search systems. This paper gives a framework for Web search engine usage mining, which includes the choice of data collections, the methods of data preprocessing, and an analysis and comparison of search behaviors from different countries. We also explore its applications to improving the effectiveness and efficiency of search engines.

  17. Web Mining and Social Networking

    DEFF Research Database (Denmark)

    Xu, Guandong; Zhang, Yanchun; Li, Lin

    This book examines the techniques and applications involved in the Web Mining, Web Personalization and Recommendation and Web Community Analysis domains, including a detailed presentation of the principles, developed algorithms, and systems of the research in these areas. The applications of web ...... sense of individuals or communities. The volume will benefit both academic and industry communities interested in the techniques and applications of web search, web data management, web mining and web knowledge discovery, as well as web community and social network analysis....

  18. Users’ recognition in web using web mining techniques

    Directory of Open Access Journals (Sweden)

    Hamed Ghazanfaripoor

    2013-06-01

    Full Text Available The rapid growth of the web and the lack of structure or an integrated schema create various difficulties for users in accessing information. All user accesses to web information are saved in the related server log files. These files are used as a resource for finding patterns of user behavior. Web mining is a subset of data mining and refers to the mining of related data from the WWW. It is categorized into three parts, web content mining, web structure mining and web usage mining, based on the part of the data that is mined. A technique is needed that is capable of learning users’ interests and, based on those interests, either filtering out unrelated information automatically or offering related information to the user in a reasonable amount of time. Web usage mining builds profiles of users in order to recognize them, and it is directly related to web personalization. The primary objective of personalization systems is to provide what users require without asking them explicitly. Formal models, in turn, make it possible to model a system's behavior. Petri nets and queueing networks are examples of such models and can be used to analyze user behavior on the web. The primary objective of this paper is to present a colored Petri net that models users' interactions in order to offer them a list of recommended pages. Predicting user behavior is applied in cases such as offering suitable pages for continued browsing, e-commerce and targeted advertising. The preliminary results indicate that the proposed method is able to improve the accuracy criterion by 8.3% compared with the static method.

  19. Web Mining and Social Networking

    DEFF Research Database (Denmark)

    Xu, Guandong; Zhang, Yanchun; Li, Lin

    This book examines the techniques and applications involved in the Web Mining, Web Personalization and Recommendation and Web Community Analysis domains, including a detailed presentation of the principles, developed algorithms, and systems of the research in these areas. The applications of web mining, and the issue of how to incorporate web mining into web personalization and recommendation systems are also reviewed. Additionally, the volume explores web community mining and analysis to find the structural, organizational and temporal developments of web communities and reveal the societal sense of individuals or communities. The volume will benefit both academic and industry communities interested in the techniques and applications of web search, web data management, web mining and web knowledge discovery, as well as web community and social network analysis.

  20. Research on the Approaches of Session Identification in Web Usage Mining

    Institute of Scientific and Technical Information of China (English)

    边鹏

    2012-01-01

    This essay is a theoretical study of the session identification approaches that affect the results of Web usage mining. According to their assumptions about user behavior, session identification approaches are divided into three kinds of heuristics: time-based, navigation-based and semantics-based. Each heuristic is further subdivided and studied, the theoretical approaches are summarized, and the advantages and problems of each of the three are discussed. On the basis of a comprehensive comparison of the approaches, the essay points out two trends in research on session identification: one is representing the semantics of the access requests in Web logs, and the other is analyzing user behavior.

  1. WEB STRUCTURE MINING USING PAGERANK, IMPROVED PAGERANK – AN OVERVIEW

    Directory of Open Access Journals (Sweden)

    V. Lakshmi Praba

    2011-03-01

    Full Text Available Web Mining is the extraction of interesting and potentially useful patterns and information from Web. It includes Web documents, hyperlinks between documents, and usage logs of web sites. The significant task for web mining can be listed out as Information Retrieval, Information Selection / Extraction, Generalization and Analysis. Web information retrieval tools consider only the text on pages and ignore information in the links. The goal of Web structure mining is to explore structural summary about web. Web structure mining focusing on link information is an important aspect of web data. This paper presents an overview of the PageRank, Improved Page Rank and its working functionality in web structure mining.
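
    For reference, the standard PageRank computation can be sketched with a simple power iteration. This covers only the classic algorithm, not the "Improved PageRank" variant the paper surveys, whose details are not given in the abstract; the damping factor 0.85, the fixed iteration count and the even redistribution of rank from pages without outgoing links are conventional choices assumed here.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank scores.

    links: dict mapping each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # split this page's rank evenly among its outgoing links
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # page without outgoing links: spread its rank over all pages
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank
```

    Only the link structure enters the computation, which is the contrast with text-only retrieval that the abstract draws.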

  2. Study And Implementation Of LCS Algorithm For Web Mining

    Directory of Open Access Journals (Sweden)

    Vrishali P. Sonavane

    2012-03-01

    Full Text Available The Internet is the roads and highways of the information world, the content providers are the road workers, and the visitors are the drivers. As in the real world, there can be traffic jams, wrong signs, blind alleys, and so on. The content providers, as the road workers, need information about their users to make possible Web site adjustments. Web logs store every motion on the provider's Web site, so the providers need only a tool to analyze these logs. This tool is called Web Usage Mining. Web Usage Mining is a part of Web Mining. It is the foundation for Web site analysis. It employs various knowledge discovery methods to obtain Web usage patterns. In this paper we used the LCS algorithm to improve the accuracy of recommendation. The experimental results show that the approach can improve the accuracy of classification in the architecture. Using the LCS algorithm we can predict users' future requests more accurately.
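
    The abstract does not spell out how LCS is applied, so the following is only a sketch of the longest common subsequence computation itself, using the textbook dynamic-programming table. Comparing an active session with a stored one, as done in the example, is an assumed use; the page paths are made up.

```python
def lcs(a, b):
    """Return one longest common subsequence of sequences a and b."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back through the table to recover the subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]

# Hypothetical page sequences.
past_session = ["/home", "/catalog", "/item/7", "/cart", "/checkout"]
active_session = ["/home", "/search", "/item/7", "/cart"]
common = lcs(past_session, active_session)
```

    One plausible way to use the result is to suggest the page that follows the matched part in the stored session (here `/checkout`), although the paper may proceed differently.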

  3. Mining usage patterns for the Android API

    Directory of Open Access Journals (Sweden)

    Hudson S. Borges

    2015-07-01

    Full Text Available API methods are not used alone, but in groups and following patterns. However, despite being key information for API users, most usage patterns are not described in official API documents. In this article, we report a study that evaluates the feasibility of automatically enriching API documents with information on usage patterns. For this purpose, we mine and analyze 1,952 usage patterns from a set of 396 Android applications. As part of our findings, we report that the Android API has many undocumented and non-trivial usage patterns, which can be inferred using association rule mining algorithms. We also describe a field study in which a version of the original Android documentation is instrumented with the extracted usage patterns. Over 17 months, this documentation received 77,863 visits from professional Android developers.
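
    To make the idea of association rules over API calls concrete, here is a small sketch restricted to rules between pairs of methods. It is not the mining algorithm used in the study: the function name, the thresholds, and the four example "applications" are assumptions for illustration. Support is the fraction of applications containing both methods, and confidence is the fraction of applications containing the left-hand method that also contain the right-hand one.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Find rules lhs -> rhs between single items.

    transactions: list of collections of method names, one per application.
    Returns sorted tuples (lhs, rhs, support, confidence).
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

# Made-up call sets; the method names are only illustrative labels.
apps = [
    ["beginTransaction", "setTransactionSuccessful", "endTransaction"],
    ["beginTransaction", "endTransaction"],
    ["beginTransaction", "setTransactionSuccessful", "endTransaction", "query"],
    ["query"],
]
rules = pair_rules(apps)
```

    With these inputs the sketch reports, for example, that every application calling `beginTransaction` also calls `endTransaction`. Full miners such as Apriori or FP-growth generalize this to larger item sets.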

  4. Web Usage Analysis: New Science Indicators and Co-usage

    CERN Document Server

    Polanco, Xavier; Besagni, Dominique

    2008-01-01

    A new type of statistical analysis of the science and technical information (STI) in the Web context is produced. We propose a set of indicators about Web users, visualized bibliographic records, and e-commercial transactions. In addition, we introduce two Web usage factors. Finally, we give an overview of the co-usage analysis. For these tasks, we introduce a computer based system, called Miri@d, which produces descriptive statistical information about the Web users' searching behaviour, and what is effectively used from a free access digital bibliographical database. The system is conceived as a server of statistical data which are carried out beforehand, and as an interactive server for online statistical work. The results will be made available to analysts, who can use this descriptive statistical information as raw data for their indicator design tasks, and as input for multivariate data analysis, clustering analysis, and mapping. Managers also can exploit the results in order to improve management and d...

  5. Study on Web usage mining in search engine of university library

    Institute of Scientific and Technical Information of China (English)

    赵静

    2013-01-01

    Because the hit rate of information resource retrieval in universities is low, a university library search engine that applies Web usage mining is proposed. Web usage mining technology and Clementine are used to mine the Web access log records of a university library website. Within the Web usage mining process, a new algorithm is proposed that identifies individual users based on user IP, login time, the site topology graph, the referring page, and the agent, and the experimental results show that it effectively improves user identification. Finally, path analysis is used to mine patterns and optimize the website structure, thereby raising the hit rate of the university library search engine.

  6. WEB MINING BASED FRAMEWORK FOR ONTOLOGY LEARNING

    Directory of Open Access Journals (Sweden)

    C.Ramesh

    2015-07-01

    Full Text Available Today, the notion of the Semantic Web has emerged as a prominent solution to the problem of organizing the immense information provided by the World Wide Web, and its focus on supporting better co-operation between humans and machines is noteworthy. Ontology forms the major component in the realization of the Semantic Web. However, manual ontology construction is time-consuming, costly, error-prone and inflexible to change, and in addition it requires the complete participation of a knowledge engineer or domain expert. To address this issue, researchers hoped that a semi-automatic or automatic process would result in faster and better ontology construction and enrichment. Ontology learning has recently become a major area of research, whose goal is to facilitate the construction of ontologies, reducing the effort of developing an ontology for a new domain. However, there are few research studies that attempt to construct ontologies from semi-structured Web pages. In this paper, we present a complete framework for ontology learning that facilitates the semi-automation of constructing and enriching a web site ontology from semi-structured Web pages. The proposed framework employs Web Content Mining and Web Usage Mining to extract conceptual relationships from the Web. The main idea behind this concept is to incorporate the web author's ideas as well as web users' intentions in the ontology development and its evolution.

  7. A Web Mining Approach for Personalized E-Learning System

    Directory of Open Access Journals (Sweden)

    Manasi Chakurkar

    2014-01-01

    Full Text Available Web Mining plays a very important role in E-learning systems. In a personalized E-Learning system, users customize the learning environment based on personal choices. In a general search process, the hyperlink with the maximum number of hits is displayed first. To make a personalized system, the history of every user needs to be saved in the form of user logs. In this paper we present an architecture that uses Web mining for Web personalization. The proposed system provides a new approach that combines web usage mining, the HITS algorithm and web content mining. It combines HITS results on user logs and web page contents with a clustering algorithm called the Lingo clustering algorithm. The proposed system with this combined approach gives better performance than a usage-based system. Further, the results are computed according to matrices computed from the previous and the proposed methods.
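
    For readers unfamiliar with HITS, the sketch below shows the usual iterative hub and authority update on a small link graph. It is a generic illustration under assumed names and data, not the combined system described in the abstract, and it leaves out the Lingo clustering step.

```python
import math

def hits(links, iterations=50):
    """Compute hub and authority scores with the iterative HITS update.

    links: dict mapping each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority of a page: sum of hub scores of pages linking to it.
        auth = {p: 0.0 for p in pages}
        for p, outs in links.items():
            for q in outs:
                auth[q] += hub[p]
        # Hub score of a page: sum of authority scores of pages it links to.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        # Normalize both vectors so the scores stay bounded.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth

# Hypothetical e-learning site: two entry pages pointing to lessons.
graph = {
    "portal": ["lesson1", "lesson2"],
    "index": ["lesson1"],
    "lesson1": [],
    "lesson2": [],
}
hub, auth = hits(graph)
```

    Here `lesson1` receives the highest authority score because both entry pages link to it, and `portal` is the strongest hub because it links to both lessons.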

  8. Using Open Web APIs in Teaching Web Mining

    Science.gov (United States)

    Chen, Hsinchun; Li, Xin; Chau, M.; Ho, Yi-Jen; Tseng, Chunju

    2009-01-01

    With the advent of the World Wide Web, many business applications that utilize data mining and text mining techniques to extract useful business information on the Web have evolved from Web searching to Web mining. It is important for students to acquire knowledge and hands-on experience in Web mining during their education in information systems…

  10. Web Personalization Using Web Mining

    Directory of Open Access Journals (Sweden)

    Ms.Kavita D.Satokar,

    2010-03-01

    Full Text Available The information on the web is growing dramatically. Users have to spend a lot of time on the web finding the information they are interested in. Today, the traditional search engines do not give users enough personalized help but provide them with a lot of irrelevant information. In this paper, we present a personalized Web search system, which can help users get the relevant web pages based on their selection from the domain list. Thus, users can obtain a set of domains of interest and the web pages from the system. The system is based on features extracted from hyperlinks, such as anchor terms or URL tokens. Our methodology uses an innovative weighted URL Rank algorithm based on the domains the user is interested in and the user query.

  11. Construction of Community Web Directories based on Web usage Data

    CERN Document Server

    Sandhyarani, Ramancha; Gyani, Jayadev; 10.5121/acij.2012.3205

    2012-01-01

    This paper supports the concept of a community Web directory, that is, a Web directory that is constructed according to the needs and interests of particular user communities. Furthermore, it presents the complete method for the construction of such directories by using web usage data. User community models take the form of thematic hierarchies and are constructed by employing a clustering approach. We applied our methodology to the ODP directory and also to an artificial Web directory, which was generated by clustering Web pages that appear in the access log of an Internet Service Provider. For the discovery of the community models, we introduced a new criterion that combines the a priori thematic informativeness of the Web directory categories with the level of interest observed in the usage data. In this context, we introduced and evaluated a new clustering method. We have tested the methodology using access log files which are collected from the proxy servers of an Internet Service Provider and provided results that ind...

  12. An Object Oriented Approach to Mining Web Graphs for Recommendations

    Directory of Open Access Journals (Sweden)

    T Murali Mohan

    2012-06-01

    Full Text Available Web mining is the application of data mining techniques to extract knowledge from the Web. Web mining has been explored to a vast degree, and different techniques have been proposed for a variety of applications that include music, image, and book recommendations, query suggestions, etc. In this paper, we highlight the significance of studying the evolving nature of Web personalization. Web usage mining is used to discover interesting user navigation patterns and can be applied to many real-world problems, such as improving Web sites/pages, making additional topic or product recommendations, user/customer behavior studies, etc. The proposed framework can be utilized in many recommendation tasks on the World Wide Web, including query suggestions, image recommendations, etc. The experimental analysis on large datasets shows the promising future of our work.

  13. Applying WebMining on KM system

    Science.gov (United States)

    Shimazu, Keiko; Ozaki, Tomonobu; Furukawa, Koichi

    KM (Knowledge Management) systems have recently been adopted within the realm of enterprise management. On the other hand, data mining technology is widely acknowledged within information systems' R&D divisions. In particular, the acquisition of meaningful information from Web usage data has become one of the most exciting areas. In this paper, we employ a Web-based KM system and propose a framework for applying Web Usage Mining technology to KM data. As it turns out, task duration varies according to different user operations such as referencing a table-of-contents page, downloading a target file, and writing to a bulletin board. This in turn makes it possible to easily predict the purpose of the user's task. Taking these observations into account, we segmented access log data manually. These results were compared with results obtained by applying the constant interval method. Next, we obtained a segmentation rule for Web access logs by applying a machine-learning algorithm to the manually segmented access logs as training data. Then, the newly obtained segmentation rule was compared with other known methods, including the time interval method, by evaluating their segmentation results in terms of recall and precision rates, and it was shown that our rule attained the best results in both measures. Furthermore, the segmented data were fed to an association rule miner, and the obtained association rules were utilized to modify the Web structure.
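
    The abstract does not state how recall and precision were computed for the segmentation results. One plausible reading, sketched below purely as an assumption, treats each segmentation as a set of boundary positions in the log and scores an automatic segmentation against the manual one. The function name and the boundary positions are made up.

```python
def boundary_precision_recall(predicted, reference):
    """Score predicted segment boundaries against reference boundaries.

    Both arguments are collections of log positions at which a new segment starts.
    """
    predicted, reference = set(predicted), set(reference)
    matched = len(predicted & reference)
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(reference) if reference else 0.0
    return precision, recall

# Hypothetical boundaries: manual segmentation versus a constant-interval method.
manual = {3, 7, 12}
constant_interval = {3, 5, 7, 10}
p, r = boundary_precision_recall(constant_interval, manual)
```

    In this toy case half of the predicted boundaries are correct (precision 0.5) and two of the three manual boundaries are found (recall 2/3).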

  14. SEMANTIC WEB MINING: ISSUES AND CHALLENGES

    OpenAIRE

    Karan Singh*, Anil kumar, Arun Kumar Yadav

    2016-01-01

    The combination of the two fast-evolving scientific research areas “Semantic Web” and “Web Mining” is known as “Semantic Web Mining” in computer science. These two areas pave the way for the mining of related and meaningful information from the web, thereby giving rise to the term “Semantic Web Mining”. The “Semantic Web” makes mining easy, and “Web Mining” can construct new structures of the Web. Web Mining applies ...

  15. Efficient Web Log Mining using Doubly Linked Tree

    CERN Document Server

    Jain, Ratnesh Kumar; Jain, Dr Suresh

    2009-01-01

    The World Wide Web is a huge data repository and is growing at the explosive rate of about 1 million pages a day. As the information available on the World Wide Web grows, the usage of web sites is also growing. The web log records each access to a web page, and the number of entries in web logs is increasing rapidly. These web logs, when mined properly, can provide useful information for decision-making. The designers of web sites, analysts and management executives are interested in extracting this hidden information from web logs for decision making. The web access pattern, which is a frequently used sequence of accesses, is one of the important pieces of information that can be mined from the web logs. This information can be used to gather business intelligence to improve sales and advertisement, to personalize for a user, to analyze system performance and to improve the web site organization. There exist many techniques to mine access patterns from the web logs. This paper describes the powerful algorithm that mine...

  16. Web-based pathology practice examination usage

    Directory of Open Access Journals (Sweden)

    Edward C Klatt

    2014-01-01

    Full Text Available Context: General and subject specific practice examinations for students in health sciences studying pathology were placed onto a free public internet web site entitled web path and were accessed four clicks from the home web site menu. Subjects and Methods: Multiple choice questions were coded into .html files with JavaScript functions for web browser viewing in a timed format. A Perl programming language script with common gateway interface for web page forms scored examinations and placed results into a log file on an internet computer server. The four general review examinations of 30 questions each could be completed in up to 30 min. The 17 subject specific examinations of 10 questions each with accompanying images could be completed in up to 15 min each. The results of scores and user educational field of study from log files were compiled from June 2006 to January 2014. Results: The four general review examinations had 31,639 accesses with completion of all questions, for a completion rate of 54% and average score of 75%. A score of 100% was achieved by 7% of users, ≥90% by 21%, and ≥50% score by 95% of users. In top to bottom web page menu order, review examination usage was 44%, 24%, 17%, and 15% of all accessions. The 17 subject specific examinations had 103,028 completions, with completion rate 73% and average score 74%. Scoring at 100% was 20% overall, ≥90% by 37%, and ≥50% score by 90% of users. The first three menu items on the web page accounted for 12.6%, 10.0%, and 8.2% of all completions, and the bottom three accounted for no more than 2.2% each. Conclusions: Completion rates were higher for the shorter 10-question subject examinations. Users identifying themselves as MD/DO scored higher than other users, averaging 75%. Usage was higher for examinations at the top of the web page menu. Scores achieved suggest that a cohort of serious users fully completing the examinations had sufficient preparation to use them to support

  17. Earth Science Mining Web Services

    Science.gov (United States)

    Pham, Long; Lynnes, Christopher; Hegde, Mahabaleshwa; Graves, Sara; Ramachandran, Rahul; Maskey, Manil; Keiser, Ken

    2008-01-01

    To allow scientists further capabilities in the area of data mining and web services, the Goddard Earth Sciences Data and Information Services Center (GES DISC) and researchers at the University of Alabama in Huntsville (UAH) have developed a system to mine data at the source without the need of network transfers. The system has been constructed by linking together several pre-existing technologies: the Simple Scalable Script-based Science Processor for Measurements (S4PM), a processing engine at the GES DISC; the Algorithm Development and Mining (ADaM) system, a data mining toolkit from UAH that can be configured in a variety of ways to create customized mining processes; ActiveBPEL, a workflow execution engine based on BPEL (Business Process Execution Language); XBaya, a graphical workflow composer; and the EOS Clearinghouse (ECHO). XBaya is used to construct an analysis workflow at UAH using ADaM components, which are also installed remotely at the GES DISC, wrapped as Web Services. The S4PM processing engine searches ECHO for data using space-time criteria, staging them to cache, allowing the ActiveBPEL engine to remotely orchestrate the processing workflow within S4PM. As mining is completed, the output is placed in an FTP holding area for the end user. The goals are to give users control over the data they want to process, while mining data at the data source using the server's resources rather than transferring the full volume over the internet. These diverse technologies have been infused into a functioning, distributed system with only minor changes to the underlying technologies. The key to the infusion is the loosely coupled, Web-Services based architecture: all of the participating components are accessible (one way or another) through SOAP (Simple Object Access Protocol)-based Web Services.

  18. Earth Science Mining Web Services

    Science.gov (United States)

    Pham, L. B.; Lynnes, C. S.; Hegde, M.; Graves, S.; Ramachandran, R.; Maskey, M.; Keiser, K.

    2008-12-01

    To allow scientists further capabilities in the area of data mining and web services, the Goddard Earth Sciences Data and Information Services Center (GES DISC) and researchers at the University of Alabama in Huntsville (UAH) have developed a system to mine data at the source without the need of network transfers. The system has been constructed by linking together several pre-existing technologies: the Simple Scalable Script-based Science Processor for Measurements (S4PM), a processing engine at the GES DISC; the Algorithm Development and Mining (ADaM) system, a data mining toolkit from UAH that can be configured in a variety of ways to create customized mining processes; ActiveBPEL, a workflow execution engine based on BPEL (Business Process Execution Language); XBaya, a graphical workflow composer; and the EOS Clearinghouse (ECHO). XBaya is used to construct an analysis workflow at UAH using ADaM components, which are also installed remotely at the GES DISC, wrapped as Web Services. The S4PM processing engine searches ECHO for data using space-time criteria, staging them to cache, allowing the ActiveBPEL engine to remotely orchestrate the processing workflow within S4PM. As mining is completed, the output is placed in an FTP holding area for the end user. The goals are to give users control over the data they want to process, while mining data at the data source using the server's resources rather than transferring the full volume over the internet. These diverse technologies have been infused into a functioning, distributed system with only minor changes to the underlying technologies. The key to this infusion is the loosely coupled, Web-Services based architecture: all of the participating components are accessible (one way or another) through SOAP (Simple Object Access Protocol)-based Web Services.

  19. Implementation of E-Service Intelligence in the Field of Web Mining

    Directory of Open Access Journals (Sweden)

    PROF. MS. S. P. SHINDE,

    2011-05-01

    Full Text Available The World Wide Web is a popular and interactive medium for disseminating information today. The web is a huge, diverse, dynamic, widely distributed global information service centre. We are familiar with terms like e-commerce, e-governance, e-market, e-finance, e-learning, e-banking, etc. These terms come under online services called e-service applications. E-services involve various types of delivery systems, advanced information technologies, methodologies and applications of online services. The keyword intelligence will be the next paradigm shift in e-services, thanks to internet technological advances. Intelligence is closely related to Artificial Intelligence. Web Mining is the technique used to crawl through various web resources to collect required information, which enables an individual to promote business and to understand marketing dynamics, new promotions floating on the Internet, etc. The taxonomy of web mining can be broadly divided into three distinct categories according to the kinds of data to be mined: Web Content Mining, Web Structure Mining and Web Usage Mining. There are many web mining techniques; however, artificial intelligence techniques and algorithms are being used by almost all web mining tasks for their efficiency. This paper discusses two main AI techniques, Multi-Agent Systems and Swarm Intelligence, with some of their applications in web mining. Intelligent Web Mining techniques can be combined with traditional web mining approaches to improve the quality of mining.

  20. An Intuitive Approach for Web Scale Mining using W-Miner for Web Personalization

    Directory of Open Access Journals (Sweden)

    R.Lokeshkumar

    2014-08-01

    Full Text Available Web usage mining performs mining on web usage data or web logs. It is now possible to perform data mining on web log records collected from the web page history. A web log is a listing of page reference data/click stream data. The behavior of web page readers is imprinted in the web server log files. By looking at the sequence of pages a user accesses, a user profile can be developed, thus aiding in personalization. With personalization, web access or the contents of a web page are modified to better fit the desires of the user. Identifying the browsing behavior of the user can also improve system performance, enhance the quality and delivery of Internet information services to the end user, and identify the population of potential customers. With clustering, the desires are determined based on similarities. In this study, a Fuzzy clustering algorithm is designed and implemented. In the proposed algorithm, meaningful behavior patterns are extracted by applying an efficient Fuzzy clustering algorithm to log data. It is shown that the performance of the proposed system is better than that of the existing best algorithm. The proposed Fuzzy clustering w-miner algorithm can provide popular information to web page visitors.

  1. Efficient Preprocessing technique using Web log mining

    Science.gov (United States)

    Raiyani, Sheetal A.; jain, Shailendra

    2012-11-01

    Web Usage Mining can be described as the discovery and analysis of user access patterns through the mining of log files and associated data from a particular website. Large numbers of visitors interact daily with web sites around the world. Enormous amounts of data are being generated, and this information could be very valuable to a company for understanding customers' behavior. This paper presents a complete preprocessing approach comprising data cleaning, user identification, and session identification activities to improve the quality of the data. User identification, a key issue in the preprocessing phase, aims to identify unique web users. Traditional user identification is based on the site structure, supported by heuristic rules, which reduces the efficiency of user identification. To solve this difficulty, we introduce the proposed technique DUI (Distinct User Identification), based on IP address, agent, session time, and the pages referred within the desired session time. It can be used in counter-terrorism, fraud detection, and the detection of unusual access to secure data; in addition, detecting the regular access behavior of users can improve the overall design and performance of future access.
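
    The data cleaning and user identification steps named in this abstract can be sketched as follows. This is a simplified illustration and not the DUI technique itself: it uses only the IP address and the agent, while the session-time and referred-page criteria are omitted. The field names, the list of static file suffixes, and the log entries are assumptions.

```python
# File suffixes treated as embedded resources rather than page views (assumed list).
STATIC_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")

def clean(entries):
    """Keep successful page requests; drop embedded resources and failed requests."""
    return [
        e for e in entries
        if 200 <= e["status"] < 300
        and not e["url"].lower().endswith(STATIC_SUFFIXES)
    ]

def identify_users(entries):
    """Give every distinct (IP address, agent) pair its own user number."""
    user_ids = {}
    labelled = []
    for e in entries:
        key = (e["ip"], e["agent"])
        uid = user_ids.setdefault(key, len(user_ids) + 1)
        labelled.append((uid, e["url"]))
    return labelled

# Made-up log entries.
log = [
    {"ip": "10.0.0.1", "agent": "Firefox", "url": "/index.html", "status": 200},
    {"ip": "10.0.0.1", "agent": "Firefox", "url": "/logo.png", "status": 200},
    {"ip": "10.0.0.1", "agent": "Chrome", "url": "/index.html", "status": 200},
    {"ip": "10.0.0.2", "agent": "Firefox", "url": "/missing.html", "status": 404},
    {"ip": "10.0.0.1", "agent": "Firefox", "url": "/news.html", "status": 200},
]
users = identify_users(clean(log))
```

    Two requests from the same IP address but different agents are treated as two users here, which is the usual reason for including the agent in the key.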

  2. Fuzzification of Web Objects: A Semantic Web Mining Approach

    Directory of Open Access Journals (Sweden)

    Tasawar Hussain

    2012-03-01

    Full Text Available Web Mining is becoming essential for supporting web administrators and web users in multiple ways, such as information retrieval, website performance management, web personalization, web marketing and website design. Due to the uncontrolled exponential growth in web data, knowledge base retrieval has become a very challenging task. One viable solution to the problem is the merging of conventional web mining with semantic web technologies. This merging process will be more beneficial to web users by reducing the search space and by providing information that is more relevant. Key web objects play a significant role in this process. The extraction of key web objects from a website is a challenging task. In this paper, we propose a framework that extracts the key web objects from the web log file and applies the semantic web to mine actionable intelligence. The proposed framework can be applied to the non-semantic web for the extraction of key web objects. We have also defined an objective function to calculate key web objects from the user's perspective. We named this function the key web object (KWO) function. The KWO function helps to fuzzify the extracted key web objects into three categories: Most Interested, Interested, and Least Interested. Fuzzification of web objects helps us to accommodate the uncertainty about how attractive the web objects are to users. We have also validated the proposed scheme with the help of a case study.

  3. Web Usage, Advertising, and Shopping: Relationship Patterns.

    Science.gov (United States)

    Korgaonkar, Pradeep; Wolin, Lori D.

    2002-01-01

    Discusses Web sales and explores the differences between heavy, medium, and light Web users in terms of their beliefs about Web advertising, attitudes toward Web advertising, purchasing patterns, and demographics. Suggests marketers need to target Web advertising to particular Web users. (Author/LRW)

  4. Usage reporting on recorded lectures using educational data mining

    NARCIS (Netherlands)

    Gorissen, Pierre; Van Bruggen, Jan; Jochems, Wim

    2012-01-01

    Gorissen, P., Van Bruggen, J., & Jochems, W. M. G. (2012). Usage reporting on recorded lectures using educational data mining. International Journal of Learning Technology, 7, 23-40. doi:10.1504/IJLT.2012.046864

  5. Performance Based Novel Techniques for Semantic Web Mining

    Directory of Open Access Journals (Sweden)

    Mahendra Thakur

    2012-01-01

    Full Text Available The explosive growth in the size and use of the World Wide Web continuously creates new great challenges and needs. The need to predict users' preferences in order to expedite and improve browsing through a site can be met by personalizing web sites. Most of the research efforts in web personalization correspond to the evolution of extensive research in web usage mining, i.e. the exploitation of the navigational patterns of the web site visitors. When a personalization system relies solely on usage-based results, however, valuable information conceptually related to what is finally recommended may be missed. Moreover, the structural properties of the web site are often disregarded. In this paper, we propose novel techniques that use the content semantics and the structural properties of a web site in order to improve the effectiveness of web personalization. In the first part of our work we present Semantic Web Personalization, a personalization system that integrates usage data with content semantics, expressed in ontology terms, in order to compute semantically enhanced navigational patterns and effectively generate useful recommendations. To the best of our knowledge, our proposed technique is the only semantic web personalization system that may be used by non-semantic web sites. In the second part of our work, we present a novel approach for enhancing the quality of recommendations based on the underlying structure of a web site. We introduce UPR (Usage-based PageRank), a PageRank-style algorithm that relies on the recorded usage data and link analysis techniques. Overall, we demonstrate that our proposed hybrid personalization framework results in more objective and representative predictions than existing techniques.

  6. DAMEWARE - Data Mining & Exploration Web Application Resource

    CERN Document Server

    Brescia, Massimo; Esposito, Francesco; Fiore, Michelangelo; Garofalo, Mauro; Guglielmo, Magda; Longo, Giuseppe; Manna, Francesco; Nocella, Alfonso; Vellucci, Civita

    2016-01-01

    Astronomy is undergoing a methodological revolution triggered by an unprecedented wealth of complex and accurate data. DAMEWARE (DAta Mining & Exploration Web Application REsource) is a general purpose, Web-based, Virtual Observatory compliant, distributed data mining framework specialized in the exploration of massive data sets with machine learning methods. We present DAMEWARE, which allows the scientific community to perform data mining and exploratory experiments on massive data sets by using a simple web browser. DAMEWARE offers several tools which can be seen as working environments in which to choose data analysis functionalities such as clustering, classification, regression, feature extraction etc., together with models and algorithms.

  7. Evaluation Method of Web Site Based on Web Structure Mining

    Institute of Scientific and Technical Information of China (English)

    LiJun-e; ZhouDong-ru

    2003-01-01

    The structure of Web sites has become more complex than before. During the design period of a Web site, the lack of models and methods results in improper Web structure, which depends on the designer's experience. From the point of view of software engineering, every period in the software life cycle must be evaluated before starting the next period's work. It is very important and essential to find relevant methods for evaluating Web structure before the site is completed. In this work, after studying the related work on Web structure mining and analyzing the major structure mining methods (Page-rank and Hub/Authority), a method based on Page-rank for Web structure evaluation in the design stage is proposed. A Web structure modeling language, WSML, is designed, and the implementation strategies for the evaluation system of Web site structure are given. Web structure mining has previously been used mainly in search engines. This is the first time that Web structure mining technology has been employed to evaluate a Web structure in the design period of a Web site. It contributes to the formalization of the design documents for Web sites and the improvement of software engineering for large-scale Web sites, and the evaluation system is a practical tool for Web site construction.

  8. A Plausible Comprehensive Web Intelligent System for Investigation of Web User Behaviour Adaptable to Incremental Mining

    Directory of Open Access Journals (Sweden)

    V.V.R. Maheswara Rao

    2010-08-01

    Full Text Available With the continued increase in the usage of the World Wide Web (WWW), Web mining has been established as an important area of research. The WWW is a vast repository of unstructured information, in the form of interrelated files, distributed on numerous web servers over wide geographical regions. Web mining deals with the discovering and analyzing of useful information from the WWW. Web usage mining focuses on investigating the potential knowledge from the browsing patterns of users and on finding the correlation between the pages on analysis. To proceed towards web intelligence, obviating the need for human interaction, it is necessary to incorporate and embed artificial intelligence into web tools. Before applying mining techniques, the data in the web log has to be pre-processed, integrated and transformed. The data pre-processing stage is the most important phase in the process of web mining and is critical and complex in the successful extraction of useful data. The web log is non-scalable, impractical and distributed in nature; thus conventional data pre-processing techniques prove unsuitable, as they assume that the data is static. Hence an intelligent system is required that is capable of pre-processing the web log efficiently. Due to the incremental nature of the web log, it is necessary for web miners to use incremental mining techniques to extract the usage patterns and study the visiting characteristics of users; hence one requires a comprehensive algorithm which reduces the computing cost significantly. This paper introduces an Intelligent System, IPS, for pre-processing of the web log; in addition, a learning algorithm, the IFP-tree model, is proposed for pattern recognition. The Intelligent Pre-processing System (IPS) can differentiate human user and web search engine accesses intelligently in less time, and discards search engine accesses. The present system reduces the error rate and significantly improves the learning performance of the algorithm. 
The Incremental Frequent Pattern Tree

  9. Journey from Data Mining to Web Mining to Big Data

    OpenAIRE

    Gupta, Richa

    2014-01-01

    This paper describes the journey of big data, starting from data mining to web mining to big data. It discusses each of these methods in brief and also provides their applications. It states the importance of mining big data today using fast and novel approaches.

  10. Bridging data mining and semantic web

    OpenAIRE

    Aman, Edris

    2016-01-01

    Nowadays the Semantic Web is a widely adopted standard of knowledge representation. Hence, knowledge engineers are applying sophisticated methods to capture, discover and represent knowledge in Semantic Web form. Studies show that, to represent knowledge in the Semantic Web standard, data mining techniques such as Decision Trees, Association Rules, etc., play an important role. These techniques are implemented in publicly available Data Mining tools. These tools represent knowledge discovered in human ...

  11. Semantic Web Mining and its application in Human Resource Mgt

    OpenAIRE

    Radhika Malik; Udayan Ghose

    2011-01-01

    The Semantic Web is a project and vision of the World Wide Web Consortium to extend the current Web, so that information is given a well-defined meaning and structure, enabling computers and people to work in cooperation. Semantic web mining is the combination of web mining and semantic web. The knowledge of semantic web makes web mining easier to achieve and can also improve the effectiveness of web mining. Semantic web mining technologies are being added to enterprise solutions to accommodate new ...

  12. Semantic web mining and the representation, analysis, and evolution of web space

    OpenAIRE

    Berendt, Bettina; Hotho, Andreas; Stumme, Gerd

    2005-01-01

    Semantic Web Mining aims at combining the two fast-developing research areas Semantic Web and Web Mining. This survey analyzes the convergence of trends from both areas: Growing numbers of researchers work on improving the results of Web Mining by exploiting semantic structures in the Web, and they use Web Mining techniques for building the Semantic Web. Last but not least, these techniques can be used for mining the Semantic Web itself. The second aim of this paper is to...

  13. Analysis of Web Logs And Web User In Web Mining

    Directory of Open Access Journals (Sweden)

    L.K. Joshila Grace

    2011-01-01

    Full Text Available Log files contain information about User Name, IP Address, Time Stamp, Access Request, number of Bytes Transferred, Result Status, URL that Referred and User Agent. The log files are maintained by the web servers. Analysing these log files gives a good idea about the user. This paper gives a detailed discussion of these log files, their formats, their creation, access procedures, their uses, the various algorithms used and the additional parameters that can be used in the log files, which in turn give way to effective mining. It also provides the idea of creating an extended log file and learning the user behaviour.
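
    The fields listed in this abstract can be pulled out of a log line with a short parser. The sketch below assumes an Apache-style "combined" log layout, which the abstract does not name; the pattern, the function name `parse_log_line`, and the sample line are all illustrative.

```python
import re

# Assumed layout: Apache-style "combined" log format. The abstract does not
# specify a format, so this pattern is an illustration only.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_log_line(line):
    """Return a dict of the fields in one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A "-" in the bytes column means that no response body was sent.
    record["bytes"] = 0 if record["bytes"] == "-" else int(record["bytes"])
    return record


# Made-up sample line (not taken from any real server).
sample = (
    '192.0.2.10 - alice [10/Oct/2011:13:55:36 +0000] '
    '"GET /index.html HTTP/1.1" 200 2326 '
    '"http://example.com/start.html" "Mozilla/5.0"'
)
record = parse_log_line(sample)
```

    Returning `None` for lines that do not match keeps malformed entries from silently entering later mining steps.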

  14. Analysis of Web Logs and Web User in Web Mining

    CERN Document Server

    Grace, L K Joshila; Nagamalai, Dhinaharan

    2011-01-01

    Log files contain information about User Name, IP Address, Time Stamp, Access Request, number of Bytes Transferred, Result Status, URL that Referred and User Agent. The log files are maintained by the web servers. Analysing these log files gives a good idea about the user. This paper gives a detailed discussion of these log files, their formats, their creation, access procedures, their uses, the various algorithms used and the additional parameters that can be used in the log files, which in turn give way to effective mining. It also provides the idea of creating an extended log file and learning the user behaviour.

  15. Privacy for Semantic Web Mining using Advanced DSA – Spatial LBS Case Study in mining

    Directory of Open Access Journals (Sweden)

    S.Nagaprasad Sri

    2010-09-01

    Full Text Available The Web Services paradigm promises to enable rich, flexible and dynamic interoperation of highly distributed, heterogeneous network-enabled services. The idea of Web Services Mining is that it makes use of the findings in the field of data mining and applies them to the world of Web Services. The emerging concept of Semantic Web Services aims at more sophisticated Web Services technologies: on the basis of Semantic Description Frameworks, intelligent mechanisms are envisioned for discovery, composition, and contracting of Web Services. The aim of the semantic web is not only to support access to information on the web but also to support its usage. Geospatial Semantic Web is an augmentation to the Semantic Web that adds geospatial abstractions, as well as related reasoning, representation and query mechanisms. Web Service Security represents a key requirement for today’s distributed interconnected digital world and for the new generations, Web 2.0 and Semantic Web. To date, the problem of security has been investigated very much in the context of standardization efforts; personal judgments are usually made based on the sensitivity of the information and the reputation of the party to which the information is to be disclosed. On the privacy front, this means that privacy invasion would net more quality and sensitive personal information. In this paper, we implemented a case study on integrated privacy issues of Spatial Semantic Web Services Mining. Initially we improved the privacy of the Geospatial Semantic Layer. Finally, we implemented a Location Based System and improved its digital signature capability, using advanced Digital Signature standards.

  16. [WEB-based medical data mining integration].

    Science.gov (United States)

    Yao, Gang; Zhang, Xiaoxiang; Wang, Huoming

    2014-06-01

    An integration of a WEB-based medical data management system and data mining tools is reported in this paper. In the application process of this system, a web-based medical data mining user sends requests to the server through a client browser over the HTTP protocol; the commands are received by the server, which calls the remote object of the data mining tools for data processing, and the results are sent back to the client browser through the HTTP protocol and presented to the user. In order to prove the feasibility of the proposed solution, a test was done under the NET platform using SAS and SPSS, and the detailed steps are given. The practical test showed that the web-based data mining tool integration solution proposed in this paper has broad prospects for development and opens up a new route for the development of medical data mining.

  17. Automatic generation of Web mining environments

    Science.gov (United States)

    Cibelli, Maurizio; Costagliola, Gennaro

    1999-02-01

    The main problem related to the retrieval of information from the world wide web is the enormous number of unstructured documents and resources, i.e., the difficulty of locating and tracking appropriate sources. This paper presents a web mining environment (WME), which is capable of finding, extracting and structuring information related to a particular domain from web documents, using general purpose indices. The WME architecture includes a web engine filter (WEF), to sort and reduce the answer set returned by a web engine, a data source pre-processor (DSP), which processes html layout cues in order to collect and qualify page segments, and a heuristic-based information extraction system (HIES), to finally retrieve the required data. Furthermore, we present a web mining environment generator, WMEG, that allows naive users to generate a WME specific to a given domain by providing a set of specifications.

  18. Semantic Web Mining and its application in Human Resource Mgt

    Directory of Open Access Journals (Sweden)

    Radhika Malik

    2011-08-01

    Full Text Available The Semantic Web is a project and vision of the World Wide Web Consortium to extend the current Web, so that information is given a well-defined meaning and structure, enabling computers and people to work in cooperation. Semantic web mining is the combination of web mining and semantic web. The knowledge of semantic web makes web mining easier to achieve and can also improve the effectiveness of web mining. Semantic web mining technologies are being added to enterprise solutions to accommodate new techniques for discovering relationships across different databases, business applications and Web services. Since this is an interdisciplinary concept in both engineering and management, we first review web mining, semantic web, and semantic web mining, and finally propose an application of semantic web mining in human resource management.

  19. Use of web mining in studying innovation.

    Science.gov (United States)

    Gök, Abdullah; Waterworth, Alec; Shapira, Philip

    As enterprises expand and post increasing information about their business activities on their websites, website data promises to be a valuable source for investigating innovation. This article examines the practicalities and effectiveness of web mining as a research method for innovation studies. We use web mining to explore the R&D activities of 296 UK-based green goods small and mid-size enterprises. We find that website data offers additional insights when compared with other traditional unobtrusive research methods, such as patent and publication analysis. We examine the strengths and limitations of enterprise innovation web mining in terms of a wide range of data quality dimensions, including accuracy, completeness, currency, quantity, flexibility and accessibility. We observe that far more companies in our sample report undertaking R&D activities on their web sites than would be suggested by looking only at conventional data sources. While traditional methods offer information about the early phases of R&D and invention through publications and patents, web mining offers insights that are more downstream in the innovation process. Handling website data is not as easy as alternative data sources, and care needs to be taken in executing search strategies. Website information is also self-reported and companies may vary in their motivations for posting (or not posting) information about their activities on websites. Nonetheless, we find that web mining is a significant and useful complement to current methods, as well as offering novel insights not easily obtained from other unobtrusive sources.

  20. Improving query services of web map by web mining

    Science.gov (United States)

    Huang, Maojun

    2007-11-01

    A web map is the hybrid of a map and the World Wide Web (known as the Web). It is usually created with WebGIS techniques. With rapid social development, web maps oriented to the public are under pressure because they fail to satisfy increasing demand. The geocoding database plays a key role in supporting query services effectively. The traditional geocoding method is laborious and time-consuming, while there is much online spatial information that could serve as a supplementary information source for geocoding. Therefore, this paper discusses how to improve query services by web mining. The improvement can be described from three facets: first, improving location query by discovering and extracting address information from the Web to extend the geocoding database. Second, enhancing the ability of optimum path query of public traffic and buffer query by spatial analysis and reasoning on the extended geocoding database. Third, adjusting data collection strategies according to patterns discovered by web map query mining. Finally, this paper presents the design of the application system and experimental results.

  1. Graph Mining Meets the Semantic Web

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Sangkeun (Matt) [ORNL; Sukumar, Sreenivas R [ORNL; Lim, Seung-Hwan [ORNL

    2015-01-01

    The Resource Description Framework (RDF) and SPARQL Protocol and RDF Query Language (SPARQL) were introduced about a decade ago to enable flexible schema-free data interchange on the Semantic Web. Today, data scientists use the framework as a scalable graph representation for integrating, querying, exploring and analyzing data sets hosted at different sources. With increasing adoption, the need for graph mining capabilities for the Semantic Web has emerged. We address that need through implementation of three popular iterative Graph Mining algorithms (Triangle count, Connected component analysis, and PageRank). We implement these algorithms as SPARQL queries, wrapped within Python scripts. We evaluate the performance of our implementation on 6 real world data sets and show graph mining algorithms (that have a linear-algebra formulation) can indeed be unleashed on data represented as RDF graphs using the SPARQL query interface.
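
    To make one of the three algorithms named in this abstract concrete, the sketch below counts triangles over RDF-style (subject, predicate, object) triples. It uses plain Python sets instead of the SPARQL queries the authors describe, treats every triple as an undirected edge, and runs on made-up example triples; all of these are assumptions for illustration.

```python
from itertools import combinations


def count_triangles(triples):
    """Count triangles in the undirected graph induced by (s, p, o) triples."""
    neighbours = {}
    for subject, _predicate, obj in triples:
        if subject == obj:
            continue  # self-loops cannot be part of a triangle
        neighbours.setdefault(subject, set()).add(obj)
        neighbours.setdefault(obj, set()).add(subject)
    total = 0
    for node, adjacent in neighbours.items():
        # A triangle exists when two neighbours of a node are also connected.
        for u, v in combinations(sorted(adjacent), 2):
            if v in neighbours[u]:
                total += 1
    return total // 3  # each triangle is found once per corner


# Hypothetical triples; the predicate name is invented for the example.
triples = [
    ("a", "linksTo", "b"),
    ("b", "linksTo", "c"),
    ("c", "linksTo", "a"),
    ("c", "linksTo", "d"),
]
triangle_count = count_triangles(triples)
```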

  2. Web Mining in Soft Computing Relevance and Future Directions

    Directory of Open Access Journals (Sweden)

    Amandeep Kour

    2013-01-01

    Full Text Available This paper summarizes the different characteristics of web data, the basic components of web mining and its different types. Web mining combines two active research areas: Data Mining and the World Wide Web. Web mining research relates to several research communities such as Database, Knowledge Discovery, Information Retrieval and Artificial Intelligence. The limitations of some of the existing web mining and knowledge discovery methods and tools are enunciated, and the significance of soft computing (comprising fuzzy logic (FL), artificial neural networks (ANNs), genetic algorithms (GAs), and rough sets (RSs)) is highlighted. A survey of the existing literature on “soft web mining” is provided along with the commercially available systems. The prospective areas of web mining where the application of soft computing needs immediate attention are outlined with justification. Scope for future research in developing “soft web mining” systems is explained. An extensive bibliography is also provided.

  3. ANALYSIS OF WEB MINING APPLICATIONS AND BENEFICIAL AREAS

    Directory of Open Access Journals (Sweden)

    Khaleel Ahmad

    2011-10-01

    Full Text Available The main purpose of this paper is to study the process of Web mining techniques, features, applications (e-commerce and e-business) and its beneficial areas. Web mining has become more popular and is widely used in various application areas (such as business intelligence systems, e-commerce and e-business). E-commerce and e-business results are improved by the application of mining techniques such as data mining and text mining; among all the mining techniques, web mining is better.

  4. A Novel Technique for Web Log mining with Better Data Cleaning and Transaction Identification

    Directory of Open Access Journals (Sweden)

    J. Vellingiri

    2011-01-01

    Full Text Available Problem statement: In the internet era, web sites are a useful source of information for almost every activity, so the World Wide Web is developing rapidly in its volume of traffic and in the size and complexity of web sites. Web mining is the application of data mining, artificial intelligence, chart technology and so on to web data; it traces users’ visiting behaviors and extracts their interests using patterns. Because of its direct application in e-commerce, Web analytics, e-learning and information retrieval, web mining has become one of the important areas in computer and information science. Several techniques, such as web usage mining, exist, but all possess their own disadvantages. This study focuses on providing techniques for better data cleaning and transaction identification from the web log. Approach: Log data is usually noisy and ambiguous, and preprocessing is an important step for an efficient mining process. In the preprocessing, the data cleaning process includes removal of records of graphics, videos and format information, of records with a failed HTTP status code, and robots cleaning. Sessions are reconstructed and paths are completed by appending missing pages in preprocessing. In addition, the transactions which depict the behavior of users are constructed accurately in preprocessing by calculating the Reference Lengths of user access by considering byte rate. Results: When the number of records is considered, for example, for 1000 records, only 350 records remain after data cleaning. When the execution time is considered, the initial log takes 119 seconds for execution, whereas only 52 seconds are required by the proposed technique. Conclusion: The experimental results show the performance of the proposed algorithm, which gives good results for web usage mining compared to existing approaches.
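
    The cleaning rules named in this abstract (dropping media and format files, failed requests, and robot traffic) can be sketched as a simple filter. The extension list, the robot markers, the choice to treat every non-2xx status as failed, and the record layout are assumptions made for illustration, not the authors' implementation.

```python
# Illustrative lists; a real deployment would tune both of them.
MEDIA_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico", ".mp4", ".avi")
ROBOT_MARKERS = ("bot", "crawler", "spider")


def is_relevant(record):
    """Keep only successful page requests that do not look like robot traffic."""
    path = record["url"].split("?", 1)[0].lower()
    if path.endswith(MEDIA_EXTENSIONS):
        return False  # graphics, video, and style/format files
    if not 200 <= record["status"] < 300:
        return False  # assumption: anything outside 2xx counts as failed
    agent = record["user_agent"].lower()
    if path == "/robots.txt" or any(marker in agent for marker in ROBOT_MARKERS):
        return False  # crawler traffic
    return True


def clean_log(records):
    """Return the records that survive all cleaning rules."""
    return [r for r in records if is_relevant(r)]


# Made-up records using a hypothetical dict layout.
raw = [
    {"url": "/index.html", "status": 200, "user_agent": "Mozilla/5.0"},
    {"url": "/logo.png", "status": 200, "user_agent": "Mozilla/5.0"},
    {"url": "/missing.html", "status": 404, "user_agent": "Mozilla/5.0"},
    {"url": "/products.html", "status": 200, "user_agent": "ExampleBot/1.0"},
    {"url": "/robots.txt", "status": 200, "user_agent": "Mozilla/5.0"},
]
cleaned = clean_log(raw)
```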

  5. A Two-Tiered Model for Analyzing Library Web Site Usage Statistics, Part 1: Web Server Logs.

    Science.gov (United States)

    Cohen, Laura B.

    2003-01-01

    Proposes a two-tiered model for analyzing web site usage statistics for academic libraries: one tier for library administrators that analyzes measures indicating library use, and a second tier for web site managers that analyzes measures aiding in server maintenance and site design. Discusses the technology of web site usage statistics, and…

  6. Visual Based Retrieval Systems and Web Mining--Introduction.

    Science.gov (United States)

    Iyengar, S. S.

    2001-01-01

    Briefly discusses Web mining and image retrieval techniques, and then presents a summary of articles in this special issue. Articles focus on Web content mining, artificial neural networks as tools for image retrieval, content-based image retrieval systems, and personalizing the Web browsing experience using media agents. (AEF)

  7. Research on Web Mining: A Survey

    Institute of Scientific and Technical Information of China (English)

    韩家炜; 孟小峰; 王静; 李盛恩

    2001-01-01

    The Internet is currently a huge, widely distributed, global information service center, covering news, advertising, consumer information, financial management, education, government, e-commerce and many other information services. The Web contains a rich and dynamic collection of hyperlink information as well as Web page access and usage information, providing rich sources for data mining. The goal of Web mining is to extract interesting, potentially useful patterns and hidden information from Web documents and Web activities. This paper gives an overview of the latest Web mining techniques and trends, mainly involving Web structure mining, multilayered Web data warehouse methods, and Web Log mining.

  8. The design and implementation of web mining in web sites security

    Institute of Scientific and Technical Information of China (English)

    ZHANG Guo-yin; GU Guo-chang; LI Jian-li

    2003-01-01

    The backdoor or information leak of Web servers can be detected by using Web Mining techniques on some abnormal Web log and Web application log data. The security of Web servers can be enhanced and the damage of illegal access can be avoided. Firstly, the system for discovering the patterns of information leakages in CGI scripts from Web log data was proposed. Secondly, those patterns for system administrators to modify their codes and enhance their Web site security were provided. The following aspects were described: one is to combine web application log with web log to extract more information,so web data mining could be used to mine web log for discovering the information that firewall and Information Detection System cannot find. Another approach is to propose an operation module of web site to enhance Web site security. In cluster server session, Density-Based Clustering technique is used to reduce resource cost and obtain better efficiency.

  9. Mining Sequential Access Pattern with Low Support From Large Pre-Processed Web Logs

    Directory of Open Access Journals (Sweden)

    S. Vijayalakshmi

    2010-01-01

    Full Text Available Problem statement: To find frequently occurring sequential patterns from a web log file on the basis of a given minimum support. Web usage mining is the application of sequential pattern mining techniques to discover usage patterns from Web data, in order to understand and better serve the needs of Web-based applications; we introduce an efficient strategy for discovering such patterns. Approach: The approach adopts a divide-and-conquer pattern-growth principle. Our proposed method combines tree projection and prefix growth features from the pattern-growth category with the position-coded feature from the early-pruning category. All of these features are key characteristics of their respective categories, so we consider our proposed method a pattern-growth, early-pruning hybrid algorithm. Results: Our proposed hybrid algorithm eliminates the need to store numerous intermediate WAP trees during mining. Since only the original tree is stored, it drastically cuts huge memory access costs, which may include disk I/O cost in a virtual memory environment, especially when mining very long sequences with millions of records. Conclusion: Our approach is an attempt to improve efficiency. The proposed method totally eliminates reconstruction of intermediate WAP-trees during mining and considerably reduces execution time.
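
    The pattern-growth principle mentioned in this abstract can be shown with a plain prefix-projection sketch in the style of PrefixSpan. It does not build a WAP-tree or use position codes, so it is a simpler stand-in for the hybrid algorithm described; the sessions and the function name `frequent_sequences` are made up for illustration.

```python
def frequent_sequences(sessions, min_support):
    """Return {pattern: support} for every ordered page pattern contained
    in at least min_support sessions (gaps between pages are allowed)."""
    results = {}

    def grow(prefix, projected):
        # Count each page at most once per projected session suffix.
        counts = {}
        for suffix in projected:
            for page in set(suffix):
                counts[page] = counts.get(page, 0) + 1
        for page, support in sorted(counts.items()):
            if support < min_support:
                continue  # infrequent extensions are pruned here
            pattern = prefix + (page,)
            results[pattern] = support
            # Project: keep only what follows the first occurrence of page.
            next_projected = [
                suffix[suffix.index(page) + 1:]
                for suffix in projected
                if page in suffix
            ]
            grow(pattern, next_projected)

    grow((), [tuple(s) for s in sessions])
    return results


# Hypothetical sessions, each an ordered list of visited pages.
sessions = [
    ["a", "b", "c", "d"],
    ["a", "c", "d"],
    ["b", "a", "c"],
    ["a", "b", "d"],
]
patterns = frequent_sequences(sessions, min_support=3)
```

    With a minimum support of 3, only the single pages and the two-page patterns (a, c) and (a, d) survive in this toy data; lowering the support threshold makes the result set grow quickly, which is the cost that low-support mining has to control.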

  10. A COMPARATIVE ANALYSIS OF WEB INFORMATION EXTRACTION TECHNIQUES DEEP LEARNING vs. NAÏVE BAYES vs. BACK PROPAGATION NEURAL NETWORKS IN WEB DOCUMENT EXTRACTION

    OpenAIRE

    J. Sharmila; Subramani, A.

    2016-01-01

    Web mining related exploration is getting the chance to be more essential these days in view of the reason that a lot of information is overseen through the web. Web utilization is expanding in an uncontrolled way. A particular framework is required for controlling such extensive measure of information in the web space. Web mining is ordered into three noteworthy divisions: Web content mining, web usage mining and web structure mining. Tak-Lam Wong has proposed a web content mining methodolog...

  11. Web Mining Using PageRank Algorithm

    Directory of Open Access Journals (Sweden)

    Vignesh. V

    2013-11-01

    Full Text Available Web mining is the use of data mining to automatically discover and extract web-based information. It is one of the most universal and dominant applications on the Internet; as the Web grows in size, search tools that combine the results of multiple search engines are becoming more valuable. However, almost none of these studies deals with the genetic relation algorithm (GRA), one of the evolutionary methods with a graph structure. GRA was designed both to increase the effectiveness of search engines and to improve their efficiency. GRA considers the correlation coefficient between stock brands as strength, which indicates the relation between nodes in each individual of GRA. The reduced number of hyperlinks provided by GRA in the final generation consists of only the hyperlinks most similar to the query. However, the end user is still not fully satisfied. To improve user satisfaction, the PageRank algorithm is used to measure the importance of a page and to prioritize the pages returned by GRA, which reduces the user’s searching time. The PageRank algorithm allocates a rank to the filtered links based on the number of keyword occurrences in the content.

  12. An Intelligent Optimal Genetic Model to Investigate the User Usage Behaviour on World Wide Web

    Directory of Open Access Journals (Sweden)

    V.V.R. Maheswara Rao

    2013-04-01

    Full Text Available The unexpectedly widespread use of the WWW and the dynamically increasing nature of the web create new challenges in web mining, since the data in the web is inherently unlabelled, incomplete, non-linear, and heterogeneous. The investigation of user usage behaviour on the WWW is a real-time problem which involves multiple conflicting measures of performance. These measures not only make the problem computationally intensive but also raise the possibility of being unable to find the exact solution. Unfortunately, the conventional methods are limited for optimization problems due to the absence of semantic certainty and the presence of human intervention. To handle such data and overcome the limitations of conventional methodologies, it is necessary to use a soft computing model that can work intelligently to attain an optimal solution. To achieve the optimized solution for investigating web user usage behaviour, the authors in the present paper propose an Intelligent Optimal Genetic Model, IOGM, which is designed as an optimization tool based on the concept of natural genetic systems. Initially, IOGM comprises a set of individual solutions or chromosomes called the initial population. Later, biologically inspired operators create a new and potentially better population. Finally, by the theory of evolution, only optimal individuals from the population survive and then generate the next biological population. This process is terminated when an acceptable optimal set of visited patterns is found or after a fixed time limit. Additionally, IOGM is strengthened by its ability to estimate the optimal stopping time of the process. The proposed soft computing model ensures identifiable features like learning, adaptability, self-maintenance and self-improvement. To validate the proposed system, several experiments were conducted, and the results are reported in this paper.

  13. AN INTELLIGENT OPTIMAL GENETIC MODEL TO INVESTIGATE THE USER USAGE BEHAVIOUR ON WORLD WIDE WEB

    Directory of Open Access Journals (Sweden)

    V.V.R. Maheswara Rao

    2013-03-01

    Full Text Available The unexpectedly widespread use of the WWW and the dynamically increasing nature of the web create new challenges in web mining, since the data in the web is inherently unlabelled, incomplete, non-linear, and heterogeneous. The investigation of user usage behaviour on the WWW is a real-time problem which involves multiple conflicting measures of performance. These measures not only make the problem computationally intensive but also raise the possibility of being unable to find the exact solution. Unfortunately, the conventional methods are limited for optimization problems due to the absence of semantic certainty and the presence of human intervention. To handle such data and overcome the limitations of conventional methodologies, it is necessary to use a soft computing model that can work intelligently to attain an optimal solution. To achieve the optimized solution for investigating web user usage behaviour, the authors in the present paper propose an Intelligent Optimal Genetic Model, IOGM, which is designed as an optimization tool based on the concept of natural genetic systems. Initially, IOGM comprises a set of individual solutions or chromosomes called the initial population. Later, biologically inspired operators create a new and potentially better population. Finally, by the theory of evolution, only optimal individuals from the population survive and then generate the next biological population. This process is terminated when an acceptable optimal set of visited patterns is found or after a fixed time limit. Additionally, IOGM is strengthened by its ability to estimate the optimal stopping time of the process. The proposed soft computing model ensures identifiable features like learning, adaptability, self-maintenance and self-improvement. To validate the proposed system, several experiments were conducted, and the results are reported in this paper.

  14. A Review on Semantic-Based Web Mining and its Applications

    OpenAIRE

    Sivakumar J; Ravichandran K.S

    2013-01-01

    In this paper we survey Semantic-based Web mining, a combination of two fast-developing domains: the Semantic Web and Web mining. These two fields address the current challenges of the World Wide Web (WWW). The idea is to improve the results of Web Mining by making use of the new semantic structure of the Web and to make use of Web Mining for creating the Semantic Web. The Semantic Web can make mining of the Web much easier because of the availability of background knowledge and Web Mining ca...

  15. Privacy for Semantic Web Mining using Advanced DSA – Spatial LBS Case Study

    Directory of Open Access Journals (Sweden)

    Dr.D.Sravan Kumar

    2010-05-01

    Full Text Available The Web Services paradigm promises to enable rich, flexible and dynamic interoperation of highly distributed, heterogeneous network-enabled services. The idea of Web Services Mining is that it makes use of the findings in the field of data mining and applies them to the world of Web Services. The emerging concept of Semantic Web Services aims at more sophisticated Web Services technologies: on the basis of Semantic Description Frameworks, intelligent mechanisms are envisioned for discovery, composition, and contracting of Web Services. The aim of the semantic web is not only to support access to information on the web but also to support its usage. Geospatial Semantic Web is an augmentation to the Semantic Web that adds geospatial abstractions, as well as related reasoning, representation and query mechanisms. Web Service Security represents a key requirement for today’s distributed interconnected digital world and for the new generations, Web 2.0 and Semantic Web. To date, the problem of security has been investigated very much in the context of standardization efforts; personal judgments are usually made based on the sensitivity of the information and the reputation of the party to which the information is to be disclosed. On the privacy front, this means that privacy invasion would net more quality and sensitive personal information. In this paper, we implemented a case study on integrated privacy issues of Spatial Semantic Web Services Mining. Initially we improved the privacy of the Geospatial Semantic Layer. Finally, we implemented a Location Based System and improved its digital signature capability, using advanced Digital Signature standards.

  16. Text mining of web-based medical content

    CERN Document Server

    Neustein, Amy

    2014-01-01

    Text Mining of Web-Based Medical Content examines web mining for extracting useful information that can be used for treating and monitoring the healthcare of patients. This work provides methodological approaches to designing mapping tools that exploit data found in social media postings. Specific linguistic features of medical postings are analyzed vis-a-vis available data extraction tools for culling useful information.

  17. Antecedents of Continued Usage Intentions of Web-Based Learning Management System in Tanzania

    Science.gov (United States)

    Lwoga, Edda Tandi; Komba, Mercy

    2015-01-01

    Purpose: The purpose of this paper is to examine factors that predict students' continued usage intention of web-based learning management systems (LMS) in Tanzania, with a specific focus on the School of Business of Mzumbe University. Specifically, the study investigated major predictors of actual usage and continued usage intentions of…

  19. Mining Interesting Knowledge from Web-Log

    Institute of Scientific and Technical Information of China (English)

    ZHOU Hong-fang; FENG Bo-qin; HEI Xin-hong; LU Lin-tao

    2004-01-01

    Web logs contain a lot of information related to user activities on the Internet. How to mine user browsing interest patterns effectively is an important and challenging research topic. Based on an analysis of the advantages and disadvantages of present algorithms, we propose a new concept: support-interest. Its key insight is that visitors will backtrack if they do not find the information where they expect it, and the point from which they backtrack is the expected location for the page. We present the User Access Matrix and the corresponding algorithm for discovering such expected locations, which can handle page caching by the browser. Since the URL-URL matrix is a sparse matrix that can be represented by a list of 3-tuples, we can mine user-preferred sub-paths by computing on this matrix. All the sub-paths are then merged to form user-preferred paths. Experiments showed that the approach is accurate and scalable. It is suitable for website-based applications, such as optimizing a website's topological structure or designing personalized services.
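
    The sparse-matrix idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' support-interest algorithm: it only assumes that sessions are available as ordered lists of page URLs (a hypothetical input format) and stores page-to-page transition counts as (source, destination, count) 3-tuples. The function names are illustrative.

```python
from collections import Counter


def transition_triples(sessions):
    """Count page-to-page transitions; return sorted (src, dst, count) 3-tuples.

    Only non-zero cells of the URL-URL matrix are stored, which is what makes
    the 3-tuple list a sparse representation.
    """
    counts = Counter()
    for pages in sessions:
        # Pair each page with its successor within the same session.
        for src, dst in zip(pages, pages[1:]):
            counts[(src, dst)] += 1
    return sorted((src, dst, n) for (src, dst), n in counts.items())


def preferred_next(triples, page):
    """Return the most frequent successor of `page`, or None if it has none."""
    candidates = [(n, dst) for src, dst, n in triples if src == page]
    if not candidates:
        return None
    # max() compares counts first; ties fall back to the larger URL string.
    return max(candidates)[1]
```

    Chaining `preferred_next` from a start page gives a crude preferred path; the paper's merging of sub-paths and its handling of browser caching are not reproduced here.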

  20. The Role of Virtual Reference in Library Web Site Design: A Qualitative Source for Usage Data

    Science.gov (United States)

    Powers, Amanda Clay; Shedd, Julie; Hill, Clay

    2011-01-01

    Gathering qualitative information about usage behavior of library Web sites is a time-consuming process requiring the active participation of patron communities. Libraries that collect virtual reference transcripts, however, hold valuable data regarding how the library Web site is used that could benefit Web designers. An analysis of virtual…

  1. Data mining and knowledge discovery resources for astronomy in the web 2.0 age

    Science.gov (United States)

    Cavuoti, S.; Brescia, M.; Longo, G.

    2012-09-01

    The emerging field of AstroInformatics, while on the one hand appears crucial to face the technological challenges, on the other is opening new exciting perspectives for new astronomical discoveries through the implementation of advanced data mining procedures. The complexity of astronomical data and the variety of scientific problems, however, call for innovative algorithms and methods as well as for an extreme usage of ICT technologies. The DAME (DAta Mining and Exploration) Program exposes a series of web-based services to perform scientific investigation on astronomical massive data sets. The engineering design and requirements, driving its development since the beginning of the project, are projected towards a new paradigm of Web based resources, which reflect the final goal to become a prototype of an efficient data mining framework in the data-centric era.

  2. Web Data Mining Technology

    Institute of Scientific and Technical Information of China (English)

    刘先熙

    2009-01-01

    With the rapid spread and development of Internet/Web technology, all kinds of information can be obtained on the network at very low cost. How to find the content users really need within this information has become a focus of attention for experts and scholars in data organization and Web-related fields. Web data mining aims to discover potentially useful knowledge hidden in Web data and to provide decision support, and it has become an emerging research focus in the data mining field. This paper explains the basics of Web data mining from three aspects: Web content mining, Web structure mining, and Web usage mining.

  3. Evaluation Method of Web Site Structure Based on Web Structure Mining

    Institute of Scientific and Technical Information of China (English)

    Li Jun-e; Zhou Dong-ru

    2003-01-01

    The structure of Web sites has become more complex than before. During the design period of a Web site, the lack of models and methods results in improper Web structure, which depends on the designer's experience. From the point of view of software engineering, every period in the software life cycle must be evaluated before starting the next period's work. It is very important and essential to find relevant methods for evaluating Web structure before the site is completed. In this work, after studying related work on Web structure mining and analyzing the major structure mining methods (Page-rank and Hub/Authority), a method based on Page-rank for Web structure evaluation in the design stage is proposed. A Web structure modeling language, WSML, is designed, and the implementation strategies for a system that evaluates Web site structure are given. Web structure mining has previously been used mainly in search engines. This is the first time Web structure mining technology has been employed to evaluate a Web structure in the design period of a Web site. It contributes to the formalization of the design documents for Web sites and to the improvement of software engineering for large-scale Web sites, and the evaluating system is a practical tool for Web site construction.
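
    As a rough illustration of the Page-rank method this abstract builds on, the sketch below runs a standard power-iteration PageRank over a small planned site structure. It is not the paper's WSML-based evaluation system; the dictionary input format, the damping factor of 0.85, and the fixed iteration count are assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping page -> list of linked pages."""
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the same "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        rank = new_rank
    return rank
```

    In a design-stage check, a page the designer considers important but that receives a low rank would suggest that the planned link structure under-exposes it.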

  4. Web Mining: Penning an Era of Information Age

    OpenAIRE

    Anshika Goel; Dinesh Sahu; Manish Kumar

    2014-01-01

    Today's age is rightly pronounced as "Information Age" which stands on the edifice of Information Technology and is operated by the Internet through the concept of web mining and is maintained & evolved through the high-speed technology of cloud computing. In short, if we try to summarize the situation, we would find that web mining concept has fuelled the entire process. This paper is an attempt to put light on the aspect of how web mining has penned the information age by co...

  5. DATA, TEXT, AND WEB MINING FOR BUSINESS INTELLIGENCE: A SURVEY

    Directory of Open Access Journals (Sweden)

    Abdul-Aziz Rashid

    2013-03-01

    Full Text Available The Information and Communication Technologies revolution brought a digital world with huge amounts of data available. Enterprises use mining technologies to search vast amounts of data for vital insight and knowledge. Mining tools such as data mining, text mining, and web mining are used to find hidden knowledge in large databases or the Internet. Mining tools are automated software tools used to achieve business intelligence by finding hidden relations and predicting future events from vast amounts of data. This uncovered knowledge helps in gaining competitive advantages, better customer relationships, and even fraud detection. In this survey, we describe how these techniques work and how they are implemented. Furthermore, we discuss how business intelligence is achieved using these mining tools. We then look into some case studies of success stories using mining tools. Finally, we present some of the main challenges to the mining technologies that limit their potential.

  6. Data, Text and Web Mining for Business Intelligence : A Survey

    Directory of Open Access Journals (Sweden)

    Abdul-Aziz Rashid Al-Azmi

    2013-04-01

    Full Text Available The Information and Communication Technologies revolution brought a digital world with huge amounts of data available. Enterprises use mining technologies to search vast amounts of data for vital insight and knowledge. Mining tools such as data mining, text mining, and web mining are used to find hidden knowledge in large databases or the Internet. Mining tools are automated software tools used to achieve business intelligence by finding hidden relations and predicting future events from vast amounts of data. This uncovered knowledge helps in gaining competitive advantages, better customer relationships, and even fraud detection. In this survey, we describe how these techniques work and how they are implemented. Furthermore, we discuss how business intelligence is achieved using these mining tools. We then look into some case studies of success stories using mining tools. Finally, we present some of the main challenges to the mining technologies that limit their potential.

  7. Web Video Mining: Metadata Predictive Analysis using Classification Techniques

    Directory of Open Access Journals (Sweden)

    Siddu P. Algur

    2016-02-01

    Full Text Available Nowadays, data engineering is an emerging trend for discovering knowledge from web audiovisual data such as YouTube videos, Yahoo Screen, Facebook videos, etc. Different categories of web video are shared on such social websites and are used by billions of users all over the world. Uploaded web videos carry different kinds of metadata as attribute information about the video data. The metadata attributes conceptually define the contents and features/characteristics of the web videos. Hence, accomplishing web video mining by extracting features of web videos in terms of metadata is a challenging task. In this work, attempts are made to classify and predict the metadata features of web videos, such as video length, number of comments, ratings information, and view counts, using data mining algorithms such as the J48 decision tree and naive Bayesian algorithms as a part of web video mining. The results of the J48 decision tree and naive Bayesian classification models are analyzed and compared as a step in the process of knowledge discovery from web videos.

  8. Parallel Web Mining System Based on Cloud Platform

    Institute of Scientific and Technical Information of China (English)

    Shengmei Luo; Qing He; Lixia Liu; Xiang Ao; Ning Li; Fuzhen Zhuang

    2012-01-01

    Traditional machine-learning algorithms are struggling to handle the exceedingly large amount of data being generated by the internet. In real-world applications, there is an urgent need for machine-learning algorithms that can handle large-scale, high-dimensional text data. Cloud computing involves the delivery of computing and storage as a service to a heterogeneous community of recipients. Recently, it has aroused much interest in industry and academia. Most previous work on cloud platforms focuses only on parallel algorithms for structured data. In this paper, we focus on the parallel implementation of web-mining algorithms and develop a parallel web-mining system that includes a parallel web crawler; parallel text extract, transform and load (ETL) and modeling; and parallel text mining and application subsystems. The complete system enables various real-world web-mining applications for mass data.

  9. Usage of Safety Gloves in the Gold Mining Industry

    CSIR Research Space (South Africa)

    Scheepers, JCE

    1978-10-01

    Full Text Available The safety departments of 31 mines were visited, and the data obtained was used to determine to what extent safety gloves were being used in the gold mining industry. The frequency of occurrence of hand injuries amongst black workers of the gold...

  10. Building user interfaces with Google Web Toolkit: usage properties analysis

    OpenAIRE

    Poklukar, Matej

    2012-01-01

    This Bachelor thesis provides an overview of different techniques used for developing user interfaces of web pages or web applications with Google Web Toolkit. During the development process, the programmer and the designer face the question of how the design should be passed from designer to programmer. The designer can easily sketch the web page design on paper or in an appropriate program and send it to the programmer. Or he can create the design in a more advanced form, as a multitude of different files. Prog...

  11. A Novel Approach for Web Page Set Mining

    CERN Document Server

    Geeta, R B; Totad, Shasikumar G; D, Prasad Reddy P V G

    2011-01-01

    One of the most time-consuming steps in association rule mining is computing the frequency of occurrence of itemsets in the database. The hash table index approach converts a transaction database to a hash index tree by scanning the transaction database only once. Whenever a user requests a Uniform Resource Locator (URL), the request entry is stored in the log file of the server. This paper presents the hash index table structure, a general and dense structure that supports web page set extraction from the server's log file. This hash table provides information about the original database. Web page set mining (WPs-Mine) provides a complete representation of the original database. This approach works well for both sparse and dense data distributions. Web page set mining supported by the hash table index shows performance always comparable with, and often better than, algorithms accessing data in flat files. Incremental update is feasible without reaccessing the original transactional databa...
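
    The single-scan indexing idea can be sketched in Python as follows. This is a simplified inverted hash index (page mapped to the set of session ids containing it), not the hash index tree or the WPs-Mine algorithm from the paper, whose details the abstract does not give. Function names and the list-of-sessions input are assumptions.

```python
def add_session(index, session_id, pages):
    """Add one session to the index without rescanning earlier sessions."""
    for page in set(pages):  # repeated visits within a session count once
        index.setdefault(page, set()).add(session_id)


def build_page_index(sessions):
    """Scan the session list once and map each page to the ids of sessions containing it."""
    index = {}
    for session_id, pages in enumerate(sessions):
        add_session(index, session_id, pages)
    return index


def support_count(index, page_set):
    """Number of sessions that contain every page in `page_set`."""
    id_sets = [index.get(page, set()) for page in page_set]
    if not id_sets:
        return 0
    # Intersect the session-id sets instead of rereading the log.
    return len(set.intersection(*id_sets))
```

    Because `add_session` touches only the new session, this sketch also shows in miniature why an index of this kind can be updated incrementally.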

  12. Usage of Data Mining at Financial Decision Making

    Directory of Open Access Journals (Sweden)

    Levent BORAN

    2014-06-01

    Full Text Available The knowledge age requires controlling every kind of information. Recognition of patterns in data may provide previously unknown and useful information that can provide competitive advantages. If related techniques are applied to financial statements, it is possible to acquire valuable information about companies’ financial situations. Data mining is considered a possible alternative to common financial analysis techniques such as vertical analysis, horizontal analysis, trend analysis, and ratio analysis. Compared with existing financial analysis methods, data mining provides some advantages: the ability to manipulate huge amounts of data and the capacity to obtain previously unknown information. There are two major constraints on data mining implementation: the lack of experts in both data mining and the related domains, and the cost of the computer software and hardware used.

  13. Web Crime Mining by Means of Data Mining Techniques

    Directory of Open Access Journals (Sweden)

    Javad Hosseinkhani

    2014-03-01

    Full Text Available The purpose of this study is to provide a review of mining useful information by means of data mining. Data mining is the procedure of extracting knowledge and information from large sets of data by applying artificial intelligence methods to find unseen relationships in the data. Data mining applications have attracted growing research attention, and one crucial field of application is criminology, where data mining is used to identify crime characteristics. Analyzing crime involves detecting and exploring crimes and investigating their relationships with criminals. Criminology is a suitable field for data mining techniques because of the high volume of crime datasets and the complexity of the relationships between them. Therefore, identifying crime characteristics is the first step for further analysis, and the knowledge obtained from data mining approaches is a very useful tool to help and support police forces. This research aims to provide a review of extracting useful information by means of data mining, in order to find crime hot spots and predict crime trends for them using crime data mining techniques.

  14. Discovering Student Web Usage Profiles Using Markov Chains

    Science.gov (United States)

    Marques, Alice; Belo, Orlando

    2011-01-01

    Nowadays, Web based platforms are quite common in any university, supporting a very diversified set of applications and services. Ranging from personal management to student evaluation processes, Web based platforms are doing a great job providing a very flexible way of working, promoting student enrolment, and making access to academic information…

  15. Design and development of a web-enabled data mining system employing JEE technologies

    Indian Academy of Sciences (India)

    Varun Krishna; Jintomon Jose; N N R Ranga Suri

    2014-12-01

    With the advent of cost effective storage systems and high speed network connectivity, the amount of data gathered by various transactional systems has increased manifold and processing the same has become a real challenge. A number of data mining systems have been developed for processing such huge data in a meaningful way. Most of these systems are stand-alone in nature except a few of them that facilitate a subset of their overall functionality for web-based usage. This is mainly due to the challenges involved in designing the architectural framework required for developing a web-enabled data mining system. Such a system is aimed at analysing the transactional data pertaining to various input domains employing some advanced graph mining techniques like subgraph matching, frequent subgraph discovery and graph visualization. In this paper, we present an innovative approach employing various Java Enterprise Edition (JEE) based technologies for developing a data mining system that operates in the web environment. This paper basically presents the architectural principles required to design a suitable framework for enabling the development of such a system and the implementation challenges faced in realising the same. The paper also discusses functional details of the system involving various graph mining techniques. A few general guidelines based on our understanding during the system implementation are also included in this paper for completeness.

  16. Usage of data mining for analyzing customer mindset

    Directory of Open Access Journals (Sweden)

    Priti Sadaria

    2012-09-01

    Full Text Available In this era of information technology, no field remains untouched by computer science. Technology has become an integral part of the business process. By applying different data mining techniques and algorithms to the feedback collected from customers, we can analyze the data. With the help of this analyzed information we get a clear idea of the customer’s mindset and can make meaningful decisions about the production and marketing of a particular product. To study the customer mindset, different models such as classification and association models are used in data mining.

  17. Usage and design evaluation by family caregivers of a stroke intervention web site.

    Science.gov (United States)

    Pierce, Linda L; Steiner, Victoria

    2013-10-01

    Four of five families are affected by stroke. Many caregivers access the Internet and gather healthcare information from Web-based sources. The purpose of this descriptive evaluation was to assess the usage and design of the Caring∼Web site, which provides education/support for family caregivers of persons with stroke residing in home settings. Thirty-six caregivers from two Midwest states accessed this intervention in a 1-year study. The average participant was 54 years old, White, woman, and the spouse of the care recipient. In a telephone interview, four Web site questions were asked twice a month/bimonthly, and a 33-item survey at the conclusion of the study evaluated the Web site usage and design of its components. Descriptive analysis methods were used, and statistics were collected on the number of visits to the Web site. On average, participants logged on to the Web site 1-2 hours per week, although usage declined after several months for some participants. Participants positively rated the Web site's appearance and usability that included finding the training to be adequate. Web site designers can replicate this intervention for other health conditions.

  18. WEB MINING BASED FRAMEWORK FOR ONTOLOGY LEARNING

    OpenAIRE

    Ramesh, C.; K.V.Chalapati Rao; Govardhan, A

    2015-01-01

    Today, the notion of Semantic Web has emerged as a prominent solution to the problem of organizing the immense information provided by World Wide Web, and its focus on supporting a better co-operation between humans and machines is noteworthy. Ontology forms the major component of Semantic Web in its realization. However, manual method of ontology construction is time-consuming, costly, error-prone and inflexible to change and in addition, it requires a complete participation o...

  19. A Survey on Web Text Information Retrieval in Text Mining

    Directory of Open Access Journals (Sweden)

    Tapaswini Nayak

    2015-08-01

    Full Text Available In this study we analyze different techniques for information retrieval in text mining. The aim of the study is to identify web text information retrieval. Text mining is closely akin to text analytics, the process of deriving high-quality information from text. High-quality information is typically derived by devising patterns and trends through means such as statistical pattern learning. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, creation of coarse taxonomies, sentiment analysis, document summarization, and entity relation modeling. Text mining is used to mine hidden information from unstructured or semi-structured data. This capability is necessary because a large amount of Web information is semi-structured due to the nested structure of HTML code, is linked, and is redundant. Web content categorization with a content database is the most important tool for the efficient use of search engines. A customer requesting information on a particular subject or item would otherwise have to search through hundreds of results to find the information most relevant to the query. This step reduces those hundreds of results through text mining, which eliminates the aggravation and improves navigation of information on the Web.

  20. Web-based Media at European Universities: Systems, Usage, and Motivation

    DEFF Research Database (Denmark)

    Godsk, Mikkel

    2009-01-01

    This paper presents the results of two surveys analyzing the usage of and the systems available for web-based media at European universities, and how the teachers can be motivated to increase their usage of such materials in their teaching practice. The surveys were carried out April-May 2009 among...... the EUNIS member universities and include responses from more than 30 different universities in Europe. The surveys show that 93 % of the universities have systems for supporting the use of web-based media in teaching practice, but looking at the diversity in implemented systems, no technical solution seems...... obvious. The surveys also show that many teachers are already using web-based media in their teaching practice and by addressing some of their teaching circumstances it would be possible to increase the usage even further. Based on these results the paper presents five initiatives to motivate the teachers...

  1. Phishing Attack Protection-PAP-Approaches for Fairness in Web Usage

    Directory of Open Access Journals (Sweden)

    Mohiuddin Ahmed

    2011-11-01

    Full Text Available Phishing scams are considered a threat to all web users, yet web users are still not consciously aware of this fact. Much research has been done to increase phishing awareness among users, but it has not been up to the mark to date. We conducted a survey among a diverse group of people who are active internet users, and then analyzed the existing phishing warnings provided by web browsers and protection schemes. In this paper we suggest new approaches, i.e., sending notifications to users, checking URLs, creating user alarms, and providing security knowledge, to ensure fairness in web usage.

  2. Usage of Web Service in Mobile Application for Parents and Students in Binus School Serpong

    Directory of Open Access Journals (Sweden)

    Karto Iskandar

    2016-09-01

    Full Text Available A web service is a service offered by one electronic device to another, communicating via the World Wide Web. The smartphone is an electronic device that almost everyone has, especially students and parents seeking information about the school. In the BINUS School Serpong mobile application, web services are used to get data, such as student and menu data, from the web server. The problem BINUS School Serpong faces today is that updating a native application is time-consuming while application updates are very frequent. To resolve this problem, the BINUS School Serpong mobile application will use web services. This article shows the usage of web services with XML for retrieving student data. The result of this study is that by using web services, smartphones can retrieve data consistently across multiple platforms.

  3. WEB MINING IN E-COMMERCE

    Directory of Open Access Journals (Sweden)

    Istrate Mihai

    2009-05-01

    Full Text Available Recently, the web is becoming an important part of people’s life. The web is a very good place to run successful businesses. Selling products or services online plays an important role in the success of businesses that have a physical presence, like a re

  4. A NOVEL APPROACH FOR WEB PERSONALIZATION THROUGH WEB MINING TECHNIQUES

    Directory of Open Access Journals (Sweden)

    B J Doddegowda

    2015-10-01

    Full Text Available Data on the World Wide Web has been growing exponentially. This raises a severe concern about information overload for users. Retrieving the most relevant information from the web according to user requirements has become hard because of the large collection of heterogeneous documents. It is time-consuming for users to go through a long list of odds and ends to choose the relevant items. One approach to overcoming this is to personalize the information available on the Web according to user requirements. Tailoring the information or services provided by a Web site to the requirements of an individual or a cluster of users, by considering their navigational patterns, is termed Web Personalization. The objective of Web Personalization is to provide users with what they really want or need, without their having to ask or search for it explicitly. This approach effectively improves the performance of Information Retrieval (IR) systems. This paper presents an extensive survey of the various approaches proposed by researchers in Web Personalization and of the challenges, with a focus on future work.

  5. Web Usage Statistics: Measurement Issues and Analytical Techniques.

    Science.gov (United States)

    Bertot, John Carlo; McClure, Charles R.; Moen, William E.; Rubin, Jeffrey

    1997-01-01

    One means of Web use evaluation is through analysis of server-generated log files. Various log file analysis techniques and issues are presented that are related to the interpretation of log file data. Study findings indicate a number of problems; recommendations and areas needing further research are outlined. (AEF)
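
    A basic form of the log file analysis mentioned in this abstract can be sketched in Python. The abstract does not say which log format the study used, so the sketch assumes the NCSA Common Log Format; the regular expression and function names are illustrative only.

```python
import re
from collections import Counter

# Assumed layout (NCSA Common Log Format), for example:
# 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def hits_per_path(lines):
    """Count successful (2xx) requests per requested path, skipping malformed lines."""
    hits = Counter()
    for line in lines:
        record = parse_line(line)
        if record and record["status"].startswith("2"):
            hits[record["path"]] += 1
    return hits
```

    Raw hit counts of this kind illustrate one interpretation problem with log data: browser and proxy caching mean that some page views never reach the server log.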

  6. Web Access Pattern Data-mining

    Institute of Scientific and Technical Information of China (English)

    张娥; 冯秋红; 宣慧玉; 田增瑞

    2001-01-01

    Companies are increasingly interested in how users use their Web sites and what they care about most, since this is fundamental to setting company strategy. Web usage pattern mining is an advanced means of exploiting Web usage data: it analyzes the data in depth to uncover valid, novel, potentially useful, and ultimately understandable knowledge that supports management decisions. This paper surveys the content, current state, and research directions of data mining techniques for Web usage patterns.

  7. A SURVEY ON WEB MULTIMEDIA MINING

    Directory of Open Access Journals (Sweden)

    Pravin M. Kamde

    2011-08-01

    Full Text Available Modern developments in digital media technologies have made transmitting and storing large amounts of multi/rich media data (e.g., text, images, music, video, and their combination) more feasible and affordable than ever before. However, the state-of-the-art techniques to process, mine, and manage such rich media are still in their infancy. Rapid progress in multimedia acquisition and storage technology has led to a fast-growing, incredible amount of data stored in databases. Useful information can be revealed to users if these multimedia files are analyzed. Multimedia mining deals with the extraction of implicit knowledge, multimedia data relationships, or other patterns not explicitly stored in multimedia files. In the retrieval, indexing, and classification of multimedia data, efficient information fusion of the different modalities is also essential for the system's overall performance. The purpose of this paper is to provide a systematic overview of multimedia mining. The article also presents the issues in the application process component for multimedia mining, followed by the multimedia mining models.

  8. A Survey on Web Multimedia Mining

    CERN Document Server

    Kamde, Pravin M

    2011-01-01

    Modern developments in digital media technologies have made transmitting and storing large amounts of multi/rich media data (e.g., text, images, music, video, and their combination) more feasible and affordable than ever before. However, the state-of-the-art techniques to process, mine, and manage such rich media are still in their infancy. Rapid progress in multimedia acquisition and storage technology has led to a fast-growing, incredible amount of data stored in databases. Useful information can be revealed to users if these multimedia files are analyzed. Multimedia mining deals with the extraction of implicit knowledge, multimedia data relationships, or other patterns not explicitly stored in multimedia files. In the retrieval, indexing, and classification of multimedia data, efficient information fusion of the different modalities is also essential for the system's overall performance. The purpose of this paper is to provide a systematic overview of multimedia mining. This articl...

  9. AN INNOVATIVE WEB MINING APPLICATION ON BLOGS - A LAYOUT

    Directory of Open Access Journals (Sweden)

    S. Prakash

    2012-01-01

    Full Text Available Blogs and Web services allow users to express their opinions and interests in the form of short text messages, which give abbreviated and highly personalized remarks in real time. Recognizing emotion is significant for a text-based communication tool such as blogs. User opinions in the form of comments and reviews in blogs have been used by researchers for various purposes; among them, the application of sentiment analysis techniques to these opinions is an interesting one. This paper proposes a software architecture for constructing Web mining applications in the blog world. The design includes blog crawling and data mining algorithms, to offer a full-fledged and flexible solution for constructing general-purpose Web mining applications. The architecture allows some significant customizations, such as the construction of adapters for reading text from different blogs, and the use of different pre-processing methods and data mining procedures. The core of this paper explains the software architecture of the general framework, with thorough information about the data mining sub-framework.

  10. Integrated Web Recommendation Model with Improved Weighted Association Rule Mining

    Directory of Open Access Journals (Sweden)

    S.A.Sahaaya Arul Mary

    2013-04-01

    Full Text Available The World Wide Web plays a significant role in human life, and it requires technological improvement to satisfy user needs. Web log data is essential for improving the performance of the web. It contains large, heterogeneous and diverse data. Analyzing the web log data is a tedious process for Web developers, Web designers, technologists and end users. In this work, a new weighted association mining algorithm is developed to identify the best association rules that are useful for web site restructuring and recommendation, reducing false visits and improving users’ navigation behavior. The algorithm finds the frequent item sets from a large uncertain database. The problem with existing algorithms is that they scan the database repeatedly, which leads to a complex output set and a time-consuming process. The proposed algorithm scans the database only once at the beginning of the process and generates the frequent item sets, which are stored in the database. Evaluation parameters such as support, confidence, lift and number of rules are considered to analyze the performance of the proposed algorithm and a traditional association mining algorithm. The new algorithm produced the best result, which helps developers restructure their website to meet the requirements of the end user within a short time span.
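
    The evaluation parameters named in this abstract (support, confidence, lift) can be illustrated with a minimal sketch. This uses the standard, unweighted definitions over a toy list of sessions; it is not the weighted algorithm of the paper, and the function name `rule_metrics` and the sample page paths are assumptions made for illustration.

```python
def rule_metrics(sessions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    n = len(sessions)
    # Count sessions containing the antecedent, the consequent, and both.
    count_a = sum(1 for s in sessions if a <= set(s))
    count_c = sum(1 for s in sessions if c <= set(s))
    count_ac = sum(1 for s in sessions if (a | c) <= set(s))
    support = count_ac / n
    confidence = count_ac / count_a if count_a else 0.0
    # Lift compares the rule's confidence with the consequent's base rate.
    lift = confidence / (count_c / n) if count_c else 0.0
    return support, confidence, lift


# Hypothetical sessions: each is the list of pages visited by one user.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/about"],
    ["/products", "/cart"],
]
support, confidence, lift = rule_metrics(sessions, ["/products"], ["/cart"])
```

    A lift above 1 suggests the two pages co-occur more often than their individual frequencies would predict, which is the kind of signal a recommendation rule would rely on.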

  11. A SURVEY ON WEB MULTIMEDIA MINING

    Directory of Open Access Journals (Sweden)

    Pravin M. Kamde

    2011-09-01

    Full Text Available Modern developments in digital media technologies have made transmitting and storing large amounts of multi/rich media data (e.g. text, images, music, video and their combination) more feasible and affordable than ever before. However, the state-of-the-art techniques to process, mine and manage those rich media are still in their infancy. Rapid progress in multimedia acquisition and storage technology has led to a fast-growing, enormous amount of data stored in databases. Useful information can be revealed to users if these multimedia files are analyzed. Multimedia mining deals with the extraction of implicit knowledge, multimedia data relationships, or other patterns not explicitly stored in multimedia files. In retrieval, indexing and classification of multimedia data, efficient information fusion of the different modalities is also essential for the system's overall performance. The purpose of this paper is to provide a systematic overview of multimedia mining. This article also presents the issues in the application process component for multimedia mining, followed by the multimedia mining models.

  12. Engineers and the Web: an analysis of real life gaps in information usage

    NARCIS (Netherlands)

    Kraaijenbrink, Jeroen

    2007-01-01

    Engineers face a wide range of gaps when trying to identify, acquire, and utilize information from the Web. To be able to avoid creating such gaps, it is essential to understand them in detail. This paper reports the results of a study of the real life gaps in information usage processes of 17 engin

  13. Examining Web 2.0 Tools Usage of Science Teacher Candidates

    Science.gov (United States)

    Balkan Kiyici, Fatime

    2012-01-01

    Using technology in science teaching is important. Only a person who can use these tools at an expert level can use them in teaching activities. This research aims first to identify science teacher candidates' Web 2.0 tool usage experience level and the factors affecting that level. In this research survey method…

  14. Place Enrichment by Mining the Web

    Science.gov (United States)

    Alves, Ana O.; Pereira, Francisco C.; Biderman, Assaf; Ratti, Carlo

    In this paper, we address the assignment of semantics to places. The approach followed consists of leveraging online web resources that are directly or indirectly related to places, as well as integration with lexical and semantic frameworks such as Wordnet or Semantic Web ontologies. We argue for the wide applicability and validity of this approach to the area of Ubiquitous Computing, particularly for Context Awareness. We present our system, KUSCO, which searches for semantic associations to a given Point Of Interest (POI). Particular focus is given to the experimentation and validation aspects.

  15. Navigation, findability and the usage of cultural heritage on the web

    DEFF Research Database (Denmark)

    Fransson, Jonas

    2014-01-01

    of objects, e.g. a cultural heritage web site. Three webometric levels are used to both combine and distinguish the data types: usage, content, and structure. The interaction between the system and its users’ information search process was divided into query dependent and query independent aspects. The query......-Resource Interaction (URI) model. The research design is a methodological triangulation, in the form of a mixed methods approach in order to obtain measures and indicators of the resources and the usage from different angles. Four methods are used: site structure analysis; log analysis; web survey; and findability...... analysis. The research design is both sequential and parallel: the site structure analysis preceded the log analysis and the findability analysis, and the web survey was employed independent of the other methods. Three Danish resources are studied: Arkiv for Dansk Litteratur (ADL), a collection of literary...

  16. URL Mining Using Agglomerative Clustering Algorithm

    Directory of Open Access Journals (Sweden)

    Chinmay R. Deshmukh

    2015-02-01

    Full Text Available The tremendous growth of the web has prompted the application of data mining techniques to web logs. Data mining on the World Wide Web is an important and active area of research. Web log mining is the analysis of web log files containing sequences of web pages. Web mining is broadly classified as web content mining, web usage mining and web structure mining. Web usage mining is a technique to discover usage patterns from Web data in order to understand and better serve the needs of Web-based applications. URL mining refers to a subclass of Web mining that helps us to investigate the details of a Uniform Resource Locator. URL mining can be advantageous in the fields of security and protection. The paper introduces a technique for mining a collection of user transactions with an Internet search engine to discover clusters of similar queries and similar URLs. The information we exploit is clickthrough data: each record consists of a user's query to a search engine along with the URL which the user selected from among the candidates offered by the search engine. By viewing this dataset as a bipartite graph, with the vertices on one side corresponding to queries and on the other side to URLs, one can apply an agglomerative clustering algorithm to the graph's vertices to identify related queries and URLs.
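
    As a rough illustration of the idea described here, the sketch below clusters queries agglomeratively by the overlap of the URLs clicked for them. It is a simplification under stated assumptions: only the query side of the bipartite graph is merged, similarity is plain Jaccard overlap, and the names `cluster_queries`, `threshold` and the sample clickthrough records are hypothetical rather than taken from the paper.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_queries(clicks, threshold=0.5):
    """Greedily merge the two most similar query clusters until none is similar enough."""
    urls_by_query = {}
    for query, url in clicks:
        urls_by_query.setdefault(query, set()).add(url)
    # Each cluster is (set of queries, set of URLs clicked for those queries).
    clusters = [({q}, set(u)) for q, u in urls_by_query.items()]
    while len(clusters) > 1:
        best, best_pair = 0.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = jaccard(clusters[i][1], clusters[j][1])
                if sim > best:
                    best, best_pair = sim, (i, j)
        if best_pair is None or best < threshold:
            break
        i, j = best_pair
        merged = (clusters[i][0] | clusters[j][0], clusters[i][1] | clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(queries) for queries, _ in clusters]


# Hypothetical clickthrough records: (query, clicked URL).
clicks = [
    ("cheap flights", "air.example/deals"),
    ("cheap flights", "fly.example"),
    ("low cost airfare", "air.example/deals"),
    ("low cost airfare", "fly.example"),
    ("python tutorial", "docs.example/py"),
]
groups = cluster_queries(clicks)
```

    The two flight-related queries share no words but end up in one cluster because users clicked the same URLs for both, which is the benefit of clustering on clickthrough data rather than on query text.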

  17. A Novel Approach for Web Page Set Mining

    Directory of Open Access Journals (Sweden)

    R.B.Geeta

    2011-11-01

    Full Text Available One of the most time-consuming steps in association rule mining is computing the frequency of occurrence of itemsets in the database. The hash table index approach converts a transaction database to a hash index tree by scanning the transaction database only once. Whenever a user requests a Uniform Resource Locator (URL), the request entry is stored in the log file of the server. This paper presents the hash index table structure, a general and dense structure which provides web page set extraction from the server's log file. This hash table provides information about the original database. Web page set mining (WPs-Mine) provides a complete representation of the original database. This approach works well for both sparse and dense data distributions. Web page set mining supported by the hash table index shows performance always comparable with, and often better than, algorithms accessing data in flat files. Incremental update is feasible without re-accessing the original transactional database.

  18. Web Based Genetic Algorithm Using Data Mining

    OpenAIRE

    Ashiqur Rahman; Asaduzzaman Noman; Md. Ashraful Islam; Al-Amin Gaji

    2016-01-01

    This paper presents an approach for classifying students in order to predict their final grade based on features extracted from logged data in an education web-based system. A combination of multiple classifiers leads to a significant improvement in classification performance. Through weighting the feature vectors using a Genetic Algorithm we can optimize the prediction accuracy and get a marked improvement over raw classification. It further shows that when the number of features is few; fea...

  19. Mining topological relations from the web

    OpenAIRE

    Schockaert, Steven; Smart, Philip D.; Abdelmoty, Alia I.; Jones, Christopher B.

    2008-01-01

    Topological relations between geographic regions are of interest in many applications. When the exact boundaries of regions are not available, such relations can be established by analysing natural language information from web documents. In particular we demonstrate how redundancy-based techniques can be used to acquire containment and adjacency relations, and how fuzzy spatial reasoning can be employed to maintain the consistency of the resulting knowledge base.

  20. AN EFFECTIVE RECOMMENDATIONS BY DIFFUSION ALGORITHM FOR WEB GRAPH MINING

    Directory of Open Access Journals (Sweden)

    S. Vasukipriya

    2013-04-01

    Full Text Available The information on the World Wide Web grows at an explosive rate. Societies are relying more on the Web for their miscellaneous information needs. Recommendation systems are active information filtering systems that attempt to present information items such as movies, music, images, book recommendations, tag recommendations, query suggestions, etc., to users. Various kinds of databases are used for recommendations; fundamentally, these databases can be modeled as many types of graphs. This paper aims to provide a general framework based on an effective DR (Recommendations by Diffusion) algorithm for web graph mining. It first introduces a novel graph diffusion model based on heat diffusion. This method can be applied to both undirected graphs and directed graphs. It then shows how to convert different Web data sources into the correct graphs for these models.
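
    To make the heat-diffusion idea concrete, here is a minimal sketch of discrete heat diffusion on a small undirected graph. It is a generic textbook-style update, not the paper's DR model; the function name `diffuse`, the rate `alpha`, the step count and the toy graph are all assumptions.

```python
def diffuse(neighbors, heat, alpha=0.1, steps=10):
    """Spread heat along edges: each step, heat flows from hotter to cooler neighbours."""
    h = dict(heat)
    for _ in range(steps):
        nxt = {}
        for node, nbrs in neighbors.items():
            # Net inflow is the sum of temperature differences with each neighbour.
            flow = sum(h[n] - h[node] for n in nbrs)
            nxt[node] = h[node] + alpha * flow
        h = nxt
    return h


# Hypothetical undirected path graph a - b - c - d; node "a" is the heat source
# (for example, an item the user has already shown interest in).
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
result = diffuse(graph, {"a": 1.0, "b": 0.0, "c": 0.0, "d": 0.0})
```

    After diffusion, nodes closer to the source hold more heat than distant ones, so ranking the non-source nodes by heat gives a simple recommendation order.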

  1. Technical Note: On The Usage and Development of the AWAKE Web Server and Web Applications

    CERN Document Server

    Berger, Dillon Tanner

    2017-01-01

    The purpose of this technical note is to give a brief explanation of the AWAKE Web Server, the current web applications it serves, and how to edit, maintain, and update the source code. The majority of this paper is dedicated to the development of the server and its web applications.

  2. Segmenting The Web 2.0 Market: Behavioural And Usage Patterns Of Social Web Consumers

    NARCIS (Netherlands)

    Lorenzo-Romero, Carlota; Constantinides, Efthymios; Alarcon-del-Amo, Maria-del-Carmen

    2013-01-01

    The evolution of the commercial Internet to the current phase, commonly called Web 2.0 (or Social Web) has firmly positioned the web not only as a commercial but also as a social communication platform: an online environment facilitating peer-to-peer interaction, socialization, co-operation and info

  3. Segmenting The Web 2.0 Market: Behavioural And Usage Patterns Of Social Web Consumers

    NARCIS (Netherlands)

    Lorenzo Romero, Carlota; Constantinides, Efthymios; Alarcon-del-Amo, Maria-del-Carmen

    2010-01-01

    The evolution of the commercial Internet to the current phase, commonly called Web 2.0 (or Social Web) has firmly positioned the web not only as a commercial but also as a social communication platform: an online environment facilitating peer-to-peer interaction, socialization, co-operation and info

  4. Research of Web Data Mining Based on XML

    Institute of Scientific and Technical Information of China (English)

    刘振岩; 王万森

    2003-01-01

    The paper proposes a system framework for Web data mining based on XML. This system framework integrates Information Retrieval with Information Extraction, and utilizes traditional data mining methods to complete Web data mining through XML.

  5. DAMEWARE: A web cyberinfrastructure for astrophysical data mining

    CERN Document Server

    Brescia, Massimo; Longo, Giuseppe; Nocella, Alfonso; Garofalo, Mauro; Manna, Francesco; Esposito, Francesco; Albano, Giovanni; Guglielmo, Marisa; D'Angelo, Giovanni; Di Guido, Alessandro; Djorgovski, George S; Donalek, Ciro; Mahabal, Ashish A; Graham, Matthew J; Fiore, Michelangelo; D'Abrusco, Raffaele

    2014-01-01

    Astronomy is undergoing a methodological revolution triggered by an unprecedented wealth of complex and accurate data. The new panchromatic, synoptic sky surveys require advanced tools for discovering patterns and trends hidden behind data which are both complex and of high dimensionality. We present DAMEWARE (DAta Mining & Exploration Web Application REsource): a general purpose, web-based, distributed data mining environment developed for the exploration of large datasets, and finely tuned for astronomical applications. By means of graphical user interfaces, it allows the user to perform classification, regression or clustering tasks with machine learning methods. Salient features of DAMEWARE include its capability to work on large datasets with minimal human intervention, and to deal with a wide variety of real problems such as the classification of globular clusters in the galaxy NGC1399, the evaluation of photometric redshifts and, finally, the identification of candidate Active Galactic Nucl...

  6. Applied Approaches of Rough Set Theory to Web Mining

    Institute of Scientific and Technical Information of China (English)

    SUN Tie-li; JIAO Wei-wei

    2006-01-01

    Rough set theory is a new soft computing tool that has received much attention from researchers around the world. It can deal with incomplete and uncertain information, and it has been applied successfully in many areas. This paper introduces the basic concepts of rough sets and discusses their applications in Web mining. In particular, some applications of rough set theory to intelligent information processing are emphasized.

  7. Data mining for pulsing the emotion on the web.

    Science.gov (United States)

    Borras-Morell, Jose Enrique

    2015-01-01

    The Internet is becoming an increasingly important part of our lives. Internet users share personal information and opinions on social media websites, easily expressing their feelings, judgments or emotions. Text mining and information retrieval techniques allow us to explore all this information and discover what the authors' opinions, claims, or assertions are. This chapter gives a general overview of current approaches to sentiment analysis and its future challenges, providing basic information on current trends.

  8. Web 2.0 usage among New Zealand learners: Findings on gender difference

    Directory of Open Access Journals (Sweden)

    Ning Wei

    Full Text Available In this paper, gender differences in Web 2.0 usage by postgraduate students in New Zealand are presented. 84 postgraduate students drawn from two different convenience samples were surveyed to discover the extent to which they used and were familiar with Web 2.0 applications. According to Cuadrado-García, Ruiz-Molina and Montoro-Pons (2010, p. 367), "men and women differ in their interaction with technology". In this study, gender differences in the use of different Web 2.0 applications and technologies have been considered. Whilst findings from this study are limited by the way in which the populations were sampled, the sample size and having a majority of international students with English as a second language, it is interesting to note that there were only minor differences between the ways in which male and female postgraduate students use Web 2.0 applications.

  9. A Survey of Bioinformatics Database and Software Usage through Mining the Literature.

    Directory of Open Access Journals (Sweden)

    Geraint Duck

    Full Text Available Computer-based resources are central to much, if not most, biological and medical research. However, while there is an ever expanding choice of bioinformatics resources to use, described within the biomedical literature, little work to date has provided an evaluation of the full range of availability or levels of usage of database and software resources. Here we use text mining to process the PubMed Central full-text corpus, identifying mentions of databases or software within the scientific literature. We provide an audit of the resources contained within the biomedical literature, and a comparison of their relative usage, both over time and between the sub-disciplines of bioinformatics, biology and medicine. We find that trends in resource usage differ between these domains. The bioinformatics literature emphasises novel resource development, while database and software usage within biology and medicine is more stable and conservative. Many resources are only mentioned in the bioinformatics literature, with a relatively small number making it out into general biology, and fewer still into the medical literature. In addition, many resources are seeing a steady decline in their usage (e.g., BLAST, SWISS-PROT), though some are instead seeing rapid growth (e.g., the GO, R). We find a striking imbalance in resource usage, with the top 5% of resource names (133 names) accounting for 47% of total usage, and over 70% of resources extracted being only mentioned once each. While these results highlight the dynamic and creative nature of bioinformatics research, they raise questions about software reuse, choice and the sharing of bioinformatics practice. Is it acceptable that so many resources are apparently never reused? Finally, our work is a step towards automated extraction of scientific method from text. We make the dataset generated by our study available under the CC0 license here: http://dx.doi.org/10.6084/m9.figshare.1281371.

  10. Mining social media and web searches for disease detection

    Directory of Open Access Journals (Sweden)

    Y. Tony Yang

    2013-05-01

    Full Text Available Web-based social media is increasingly being used across different settings in the health care industry. The increased frequency in the use of the Internet via computer or mobile devices provides an opportunity for social media to be the medium through which people can be provided with valuable health information quickly and directly. While traditional methods of detection relied predominately on hierarchical or bureaucratic lines of communication, these often failed to yield timely and accurate epidemiological intelligence. New web-based platforms promise increased opportunities for a more timely and accurate spreading of information and analysis. This article aims to provide an overview and discussion of the availability of timely and accurate information. It is especially useful for the rapid identification of an outbreak of an infectious disease that is necessary to promptly and effectively develop public health responses. These web-based platforms include search queries, data mining of web and social media, process and analysis of blogs containing epidemic key words, text mining, and geographical information system data analyses. These new sources of analysis and information are intended to complement traditional sources of epidemic intelligence. Despite the attractiveness of these new approaches, further study is needed to determine the accuracy of blogger statements, as increases in public participation may not necessarily mean the information provided is more accurate.

  11. Mining social media and web searches for disease detection.

    Science.gov (United States)

    Yang, Y Tony; Horneffer, Michael; DiLisio, Nicole

    2013-04-28

    Web-based social media is increasingly being used across different settings in the health care industry. The increased frequency in the use of the Internet via computer or mobile devices provides an opportunity for social media to be the medium through which people can be provided with valuable health information quickly and directly. While traditional methods of detection relied predominately on hierarchical or bureaucratic lines of communication, these often failed to yield timely and accurate epidemiological intelligence. New web-based platforms promise increased opportunities for a more timely and accurate spreading of information and analysis. This article aims to provide an overview and discussion of the availability of timely and accurate information. It is especially useful for the rapid identification of an outbreak of an infectious disease that is necessary to promptly and effectively develop public health responses. These web-based platforms include search queries, data mining of web and social media, process and analysis of blogs containing epidemic key words, text mining, and geographical information system data analyses. These new sources of analysis and information are intended to complement traditional sources of epidemic intelligence. Despite the attractiveness of these new approaches, further study is needed to determine the accuracy of blogger statements, as increases in public participation may not necessarily mean the information provided is more accurate.

  12. The Influence of Perceived Organizational Injustice towards Workplace Personal Web Usage and Work Productivity in Indonesia

    OpenAIRE

    Nur Fathonah; Yanki Hartijasti

    2014-01-01

    Workplace personal web usage (WPWU) is an employee’s use of the internet for non-work-related tasks during working hours. It is considered a counterproductive behavior when done excessively because it can interrupt the employee’s productivity, but it can increase creativity and eliminate boredom when used in a reasonable amount. The objective of this study was to test whether perceived organizational injustice had an influence on WPWU which affected work productivity. A total of 222 respondents wo...

  13. Web Based Genetic Algorithm Using Data Mining

    Directory of Open Access Journals (Sweden)

    Ashiqur Rahman

    2016-09-01

    Full Text Available This paper presents an approach for classifying students in order to predict their final grade based on features extracted from logged data in an education web-based system. A combination of multiple classifiers leads to a significant improvement in classification performance. Through weighting the feature vectors using a Genetic Algorithm we can optimize the prediction accuracy and get a marked improvement over raw classification. It further shows that when the number of features is small, feature weighting works better than feature selection alone. Many leading educational institutions are working to establish an online teaching and learning presence. Several systems with different capabilities and approaches have been developed to deliver online education in an academic setting. In particular, Michigan State University (MSU) has pioneered some of these systems to provide an infrastructure for online instruction. The research presented here was performed on a part of the latest online educational system developed at MSU, the Learning Online Network with Computer-Assisted Personalized Approach (LON-CAPA).

  14. Web Approach for Ontology-Based Classification, Integration, and Interdisciplinary Usage of Geoscience Metadata

    Directory of Open Access Journals (Sweden)

    B Ritschel

    2012-10-01

    Full Text Available The Semantic Web is a W3C approach that integrates the different sources of semantics within documents and services using ontology-based techniques. The main objective of this approach in the geoscience domain is the improvement of understanding, integration, and usage of Earth and space science related web content in terms of data, information, and knowledge for machines and people. The modeling and representation of semantic attributes and relations within and among documents can be realized by human readable concept maps and machine readable OWL documents. The objectives for the usage of the Semantic Web approach in the GFZ data center ISDC project are the design of an extended classification of metadata documents for product types related to instruments, platforms, and projects as well as the integration of different types of metadata related to data product providers, users, and data centers. Sources of content and semantics for the description of Earth and space science product types and related classes are standardized metadata documents (e.g., DIF documents), publications, grey literature, and Web pages. Other sources are information provided by users, such as tagging data and social navigation information. The integration of controlled vocabularies as well as folksonomies plays an important role in the design of well-formed ontologies.

  15. The path most travelled: Mining road usage patterns from massive call data

    CERN Document Server

    Toole, Jameson L; Alhasoun, Fahad; Evsukoff, Alexandre; Gonzalez, Marta C

    2014-01-01

    Rapid urbanization places increasing stress on already burdened transportation systems, resulting in delays and poor levels of service. Billions of spatiotemporal call detail records (CDRs) collected from mobile devices create new opportunities to quantify and solve these problems. However, there is a need for tools to map new data onto existing transportation infrastructure. In this work, we propose a system that leverages this data to identify patterns in road usage. First, we develop an algorithm to mine billions of calls and learn location transition probabilities of callers. These transition probabilities are then upscaled with demographic data to estimate origin-destination (OD) flows of residents between any two intersections of a city. Next, we implement a distributed incremental traffic assignment algorithm to route these flows on road networks and estimate congestion and level of service for each roadway. From this assignment, we construct a bipartite usage network by connecting census tracts to the...
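
    The step of learning location transition probabilities can be sketched as follows. This is a minimal illustration that counts consecutive location pairs in each caller's trace and normalizes per origin; the function name `transition_probabilities` and the toy traces are assumptions, and the authors' algorithm for billions of call detail records is certainly more involved.

```python
from collections import Counter, defaultdict


def transition_probabilities(traces):
    """Estimate P(next location | current location) from ordered location traces."""
    counts = defaultdict(Counter)
    for trace in traces:
        # Pair each observed location with the one that follows it.
        for origin, dest in zip(trace, trace[1:]):
            counts[origin][dest] += 1
    return {
        origin: {dest: c / sum(dests.values()) for dest, c in dests.items()}
        for origin, dests in counts.items()
    }


# Hypothetical traces: each list is one caller's sequence of observed zones.
traces = [["A", "B", "C"], ["A", "B", "A"], ["A", "C"]]
probs = transition_probabilities(traces)
```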

  16. On-Board Mining in the Sensor Web

    Science.gov (United States)

    Tanner, S.; Conover, H.; Graves, S.; Ramachandran, R.; Rushing, J.

    2004-12-01

    On-board data mining can contribute to many research and engineering applications, including natural hazard detection and prediction, intelligent sensor control, and the generation of customized data products for direct distribution to users. The ability to mine sensor data in real time can also be a critical component of autonomous operations, supporting deep space missions, unmanned aerial and ground-based vehicles (UAVs, UGVs), and a wide range of sensor meshes, webs and grids. On-board processing is expected to play a significant role in the next generation of NASA, Homeland Security, Department of Defense and civilian programs, providing for greater flexibility and versatility in measurements of physical systems. In addition, the use of UAV and UGV systems is increasing in military, emergency response and industrial applications. As research into the autonomy of these vehicles progresses, especially in fleet or web configurations, the applicability of on-board data mining is expected to increase significantly. Data mining in real time on board sensor platforms presents unique challenges. Most notably, the data to be mined is a continuous stream, rather than a fixed store such as a database. This means that the data mining algorithms must be modified to make only a single pass through the data. In addition, the on-board environment requires real time processing with limited computing resources, thus the algorithms must use fixed and relatively small amounts of processing time and memory. The University of Alabama in Huntsville is developing an innovative processing framework for the on-board data and information environment. The Environment for On-Board Processing (EVE) and the Adaptive On-board Data Processing (AODP) projects serve as proofs-of-concept of advanced information systems for remote sensing platforms. The EVE real-time processing infrastructure will upload, schedule and control the execution of processing plans on board remote sensors. 
These plans
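
    The single-pass, fixed-memory constraint described in this abstract can be illustrated with a small sketch. It uses Welford's online algorithm for a running mean and variance, chosen here only as a well-known example of a one-pass method; it is not taken from the EVE or AODP projects, and the class name `RunningStats` and the sample readings are assumptions.

```python
class RunningStats:
    """Running mean and variance over a stream, using constant memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        # Welford's update: each reading is seen once and then discarded.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Population variance of the readings seen so far."""
        return self._m2 / self.n if self.n else 0.0


# Hypothetical sensor readings arriving as a stream.
stats = RunningStats()
for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(reading)
```

    Because only three numbers are stored regardless of stream length, a detector built on such statistics fits the limited processing time and memory of an on-board environment.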

  17. The Research of Web Mining

    Institute of Scientific and Technical Information of China (English)

    沈达阳; 孙茂松

    2000-01-01

    The application of knowledge mining technology is the key to improving the information searching ability of the World Wide Web. Aiming at different types of information, the paper introduces several Web mining technologies, which are applied in ParaSite, DRS and WebKB respectively. After comparing their characteristics, an idea for building a common Web mining system is presented.

  18. WEB-BASED DATA MINING TOOLS : PERFORMING FEEDBACK ANALYSIS AND ASSOCIATION RULE MINING

    Directory of Open Access Journals (Sweden)

    Pratiyush Guleria

    2015-11-01

    Full Text Available This paper aims to explain web-enabled tools for educational data mining. The proposed web-based tools, developed using the Asp.Net framework and PHP, can help universities or institutions provide students with elective courses as well as improve academic activities based on feedback collected from students. In the Asp.Net tool, association rule mining using the Apriori algorithm is used, whereas in the PHP-based Feedback Analytical Tool, feedback related to faculty and institutional infrastructure is collected from students, and based on that feedback the tool shows the performance of the faculty and institution. Using that data, it helps management to improve in-house training skills and gain knowledge about educational trends to be followed by faculty to improve the effectiveness of the course and teaching skills.

  19. DAME: A Web Oriented Infrastructure for Scientific Data Mining & Exploration

    CERN Document Server

    Brescia, Massimo; Djorgovski, George S; Cavuoti, Stefano; D'Abrusco, Raffaele; Donalek, Ciro; Di Guido, Alessandro; Fiore, Michelangelo; Garofalo, Mauro; Laurino, Omar; Mahabal, Ashish; Manna, Francesco; Nocella, Alfonso; d'Angelo, Giovanni; Paolillo, Maurizio

    2010-01-01

    Nowadays, many scientific areas share the same need to deal with massive and distributed datasets and to perform complex knowledge extraction tasks on them. This simple consideration is behind the international efforts to build virtual organizations such as, for instance, the Virtual Observatory (VObs). DAME (DAta Mining & Exploration) is an innovative, general purpose, Web-based, VObs compliant, distributed data mining infrastructure specialized in Massive Data Sets exploration with machine learning methods. Initially fine tuned to deal with astronomical data only, DAME has evolved into a general purpose platform which has also found applications in other domains of human endeavor. We present the products and a short outline of a science case, together with a detailed description of DAME's main features and architecture.

  20. Web mining for topics defined by complex and precise predicates

    Science.gov (United States)

    Lee, Ching-Cheng; Sampathkumar, Sushma

    2004-04-01

    The enormous growth of the World Wide Web has made it important to perform resource discovery efficiently for any given topic. Several new techniques have been proposed in recent years for this kind of topic-specific web mining. A key one among them is focused crawling, which is able to crawl topic-specific portions of the web without having to explore all pages. Most existing research on focused crawling considers a simple topic definition that typically consists of one or more keywords connected by an OR operator. However, this kind of simple topic definition may result in too many irrelevant pages in which the same keyword appears in the wrong context. In this research we explore new strategies for crawling topic-specific portions of the web using complex and precise predicates. A complex predicate allows the user to precisely specify a topic using Boolean operators such as "AND", "OR" and "NOT". Our work concentrates first on defining a format to specify this kind of complex topic definition and second on devising a crawl strategy that crawls the topic-specific portions of the web defined by the complex predicate efficiently and with minimal overhead. Our new crawl strategy will improve the performance of topic-specific web crawling by reducing the number of irrelevant pages crawled. To demonstrate the effectiveness of this approach, we have built a complete focused crawler called "Eureka" with complex predicate support, and a search engine that indexes and supports end-user searches on the crawled pages.
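
    To make the idea of a complex predicate concrete, here is a small Python sketch that evaluates a Boolean topic definition against page text. The nested-tuple format and the function name `matches` are assumptions for illustration only; the abstract does not describe the format Eureka actually uses.

```python
import re


def matches(predicate, text):
    """Evaluate a Boolean topic predicate against the words of a page.

    A predicate is either a keyword string or a tuple whose first element
    is "AND", "OR" or "NOT" followed by sub-predicates (hypothetical format).
    """
    words = set(re.findall(r"[a-z0-9]+", text.lower()))

    def evaluate(p):
        if isinstance(p, str):
            return p.lower() in words
        op, *args = p
        if op == "AND":
            return all(evaluate(a) for a in args)
        if op == "OR":
            return any(evaluate(a) for a in args)
        if op == "NOT":
            return not evaluate(args[0])
        raise ValueError(f"unknown operator: {op!r}")

    return evaluate(predicate)


# Topic: pages about the Jaguar car brand, not the animal.
topic = ("AND", "jaguar", ("OR", "car", "engine"), ("NOT", "cat"))
```

    A crawler could use such a check to decide whether a fetched page is relevant before following its links, which is how a NOT clause filters out pages where a keyword appears in the wrong context.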

  1. Web Structure Mining: Exploring Hyperlinks and Algorithms for Information Retrieval

    Directory of Open Access Journals (Sweden)

    P. R. Kumar

    2010-01-01

    Full Text Available Problem statement: A study was done on hyperlink analysis and the algorithms used for link analysis in Web information retrieval. Approach: This research was initiated because of the dependence on search engines for information retrieval on the web. It sets out to understand web structure mining and determine the importance of hyperlinks in web information retrieval, particularly using the Google search engine. Hyperlink analysis is an important methodology used by the Google search engine to rank pages. Results: The different algorithms used for link analysis, namely the PageRank (PR), Weighted PageRank (WPR) and Hyperlink-Induced Topic Search (HITS) algorithms, are discussed and compared. The PageRank algorithm was implemented in a Java program and the convergence of the PageRank values is shown in chart form. Conclusion: This study was done to explore the link structure algorithms for ranking and to compare those algorithms. Further research in this area will address the problems facing the PageRank algorithm and how to handle them.
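
    The convergence behaviour mentioned above can be illustrated with a short power-iteration sketch. The paper's program is in Java and is not reproduced in the abstract; the Python version below is an independent illustration, and the damping factor 0.85, the tolerance and the handling of dangling pages are common choices assumed here, not details taken from the paper.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Compute PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Rank held by pages without out-links is spread evenly over all pages.
    """
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        new = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            outs = links.get(u, [])
            for v in outs:
                # Each page shares its rank equally among its out-links.
                new[v] += damping * rank[u] / len(outs)
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:  # stop once the values have stopped changing
            break
    return rank
```

    Recording `delta` at each iteration gives the kind of convergence curve that the abstract describes as a chart.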

  2. Navigation, findability and the usage of cultural heritage on the web

    DEFF Research Database (Denmark)

    Fransson, Jonas

    2014-01-01

    of objects, e.g. a cultural heritage web site. Three webometric levels are used to both combine and distinguish the data types: usage, content, and structure. The interaction between the system and its users’ information search process was divided into query dependent and query independent aspects. The query...... dependent aspects contain the information need on the user side and the topic of the content on the system side. The query independent aspects are the structural findability on the system side and the users search skills on the user side. The conceptual framework is summarised in the User...... and navigation patterns are studied. Navigation through a web search engine IV is the most common way to reach the resources, but both direct navigation and link navigation are also used in all three resources. Most users arrive in the middle level in ADL and KID, at information on authors and artists...

  3. Web Fuzzy Clustering and a Case Study

    Institute of Scientific and Technical Information of China (English)

    LIU Mao-fu; HE Jing; HE Yan-xiang; HU Hui-jun

    2004-01-01

    We combine the web usage mining and fuzzy clustering and give the concept of web fuzzy clustering, and then put forward the web fuzzy clustering processing model which is discussed in detail. Web fuzzy clustering can be used in the web users clustering and web pages clustering. In the end, a case study is given and the result has proved the feasibility of using web fuzzy clustering in web pages clustering.

  4. Exploring the underlying structure of personal Web usage in the workplace.

    Science.gov (United States)

    Anandarajan, Murugan; Simmers, Claire A; D'Ovidio, Rob

    2011-10-01

    Personal Web usage (PWU) represents a tension between employers and employees, as employers generally regard employees' PWU as negative and employees see many PWU behaviors as acceptable. Employers attempt to limit PWU primarily through electronic monitoring and Internet usage policies. Employees, however, find PWU not only permissible but also useful, and rationalize these workplace behaviors. While researchers have identified many PWU behaviors, the underlying structure of the phenomenon is not clearly understood. In this article, we offer a comprehensive new definition of PWU, and develop an empirically grounded classification of PWU framed by psychological contract theory and based on two studies. Using multidimensional scaling techniques and cluster analysis, we identified four clusters of PWU behaviors: work/family, hedonic, self-development, and citizenship. The results provide information as to what constitutes the domain of PWU, and how various usages are related to one another via the dimensions of individual and organizational benefits. We offer directions for future work and suggest how our work might be useful to practitioners.

  5. DERIVING USER ACCESS PATTERNS AND MINING WEB COMMUNITY WITH WEB-LOG DATA FOR PREDICTING USER SESSIONS WITH PAJEK

    Directory of Open Access Journals (Sweden)

    S. Balaji

    2012-10-01

    Full Text Available Web logs are a young and dynamic media type. Due to the intrinsic relationships among Web objects and the lack of a uniform schema for web documents, Web community mining has become a significant area for Web data management and analysis. Research on Web communities spans a number of research domains. In this paper an ontological model is presented together with some recent studies on this topic, which cover finding relevant Web pages based on linkage information and discovering user access patterns by analyzing Web log files. A simulation was created with data crawled from an academic website. The simulation was done in a JAVA and ORACLE environment. Results show that prediction of user sessions could give plenty of vital information for Business Intelligence. Search Engine Optimization could also use these results, which are discussed in the paper in detail.

  6. A web server for mining Comparative Genomic Hybridization (CGH) data

    Science.gov (United States)

    Liu, Jun; Ranka, Sanjay; Kahveci, Tamer

    2007-11-01

    Advances in cytogenetics and molecular biology have established that chromosomal alterations are critical in the pathogenesis of human cancer. Recurrent chromosomal alterations provide cytological and molecular markers for the diagnosis and prognosis of disease. They also facilitate the identification of genes that are important in carcinogenesis, which in the future may help in the development of targeted therapy. A large and growing amount of cancer genetic data is now publicly available. There is a need for public domain tools that allow users to analyze their data and visualize the results. This chapter describes a web based software tool that will allow researchers to analyze and visualize Comparative Genomic Hybridization (CGH) datasets. It employs novel data mining methodologies for clustering and classification of CGH datasets as well as algorithms for identifying important markers (small sets of genomic intervals with aberrations) that are potentially cancer signatures. The developed software will help in understanding the relationships between genomic aberrations and cancer types.

  7. Enhancing the usage pattern mining performance with temporal segmentation of QPop Increment in digital libraries

    Institute of Scientific and Technical Information of China (English)

    CAO San-xing; KLEIN R.Rody; LIU Jian-bo

    2005-01-01

    The convergence of next-generation Networks and the emergence of new media systems have made media-rich digital libraries popular in application and research. The discovery of media content objects' usage patterns, where QPop Increment is the characteristic feature under study, is the basis of intelligent data migration scheduling, the very key issue for these systems to manage effectively the massive storage facilities in their backbones. In this paper, a clustering algorithm is established, on the basis of temporal segmentation of QPop Increment, so as to improve the mining performance. We employed the standard C-Means algorithm as the clustering kernel, and carried out the experimental mining process with segmented QPop Increases obtained in actual applications. The results indicated that the improved algorithm is more advantageous than the basic one in important indices such as the clustering cohesion. The experimental study in this paper is based on a Media Assets Library prototype developed for the use of the advertainment movie production project for Olympics 2008, under the support of both the Humanistic Olympics Study Center in Beijing, and China State Administration of Radio, Film and TV.

  8. Psychosocial service needs of pediatric transport accident survivors: Using clinical data-mining to establish demographic and service usage characteristics.

    Science.gov (United States)

    Manguy, Alys-Marie; Joubert, Lynette; Bansemer, Leah

    2016-09-01

    The objectives in this article are the exploration of demographic and service usage data gained through clinical data mining audit and suggesting recommendations for social work service delivery model and future research. The method is clinical data-mining audit of 100 sequentially sampled cases gathering quantitative demographic and service usage data. Descriptive analysis of file audit data raised interesting trends with potential to inform service delivery and usage; the key areas of the results included patient demographics, family involvement and impact, and child safety and risk issues. Transport accidents involving children often include other family members. Care planning must take into account psychosocial issues including patient and family emotional responses, availability of primary carers, and other practical needs that may impact on recovery and discharge planning. This study provides evidence to plan for further research and development of more integrated models of care.

  9. User Cluster Analysis of Web Usage Logs Based on SAS

    Institute of Scientific and Technical Information of China (English)

    欧阳烽

    2013-01-01

    User cluster analysis of Web usage logs based on SAS takes the user transaction table obtained by converting and preprocessing Web usage log data and applies different clustering methods to it with the SAS data mining tools. The aim is to procure and manage digital resources reasonably according to the demands of different categories of users and to provide personalized services for those users.

  10. XML Applications in Web Data Mining

    Institute of Scientific and Technical Information of China (English)

    陆婷

    2011-01-01

    In Web data mining, semi-structured data mining methods based on XML are simple, effective and low-cost. This paper first gives the definition of data mining, then introduces commonly used data mining techniques, discusses Web data mining and XML, and points out the difficulties of Web data mining and the applications of XML in Web data mining. The paper offers a useful reference for engineers and technical personnel who study Web data mining.

  11. Bidirectional Growth Based Mining and Cyclic Behaviour Analysis of Web Sequential Patterns

    Directory of Open Access Journals (Sweden)

    Srikantaiah K C

    2013-04-01

    Full Text Available Web sequential patterns are important for analyzing and understanding users’ behaviour to improve the quality of service offered by the World Wide Web. Web prefetching is one such technique that utilizes prefetching rules derived through Cyclic Model Analysis of the mined Web sequential patterns. The prediction is more accurate and the prefetching results are more satisfying when a highly efficient and scalable mining technique, such as the Bidirectional Growth based Directed Acyclic Graph, is used. In this paper, we propose a novel algorithm called Bidirectional Growth based mining Cyclic behavior Analysis of web sequential Patterns (BGCAP) that effectively combines these strategies to generate prefetching rules in the form of 2-sequence patterns with Periodicity and threshold of Cyclic Behaviour that can be utilized to effectively prefetch Web pages, thus reducing the users’ perceived latency. As BGCAP is based on Bidirectional pattern growth, it performs only (log n+1) levels of recursion for mining n Web sequential patterns. Our experimental results show that generating prefetching rules using BGCAP is 5-10% faster for different data sizes and 10-15% faster for a fixed data size than TD-Mine. In addition, BGCAP generates about 5-15% more prefetching rules than TD-Mine.

  12. The Influence of Perceived Organizational Injustice towards Workplace Personal Web Usage and Work Productivity in Indonesia

    Directory of Open Access Journals (Sweden)

    Nur Fathonah

    2014-10-01

    Full Text Available Workplace personal web usage (WPWU) is an employee’s use of the internet for non-work-related tasks during working hours. It is considered a counterproductive behavior when done excessively because it can interrupt the employee’s productivity, but it can increase creativity and eliminate boredom when used in a rational amount. The objective of this study was to test whether perceived organizational injustice had an influence on WPWU, which in turn affected work productivity. A total of 222 respondents working in various industries were gathered through a web survey. Using multinomial logistic regression analysis, this study found that heavy use of the internet for non-work-related purposes, between 2 and 4 hours a day, was influenced by respondents’ perception of not getting fair treatment and incentives for being good performers, which then led to very low completion of tasks. There were two contrasting views regarding this result: organizations considered it deviant behavior because it reduced employees’ performance, whereas employees regarded it as just short breaks to get rid of stress. Hence, this finding suggests that companies should redesign their internet policies to accommodate a “Work-Life Blend”, blending work and personal lives, as a consequence of the cultural shift in the era of globalization and new technologies.

  13. From Java Web Application to Web Mining

    Institute of Scientific and Technical Information of China (English)

    李淑华

    2016-01-01

    With the development of Web technology, the presentation tier and the business logic tier have been separated from the data tier. Model-View-Controller (MVC) was the first design pattern to separate the presentation tier from the business logic tier, improving the flexibility and reusability of components. Under the Browser-Server pattern, a client needs only a browser to do its work, which effectively lowers running and maintenance costs. This paper focuses on implementing enterprise Intranet and Internet Web applications with J2EE technology and the Spring Framework, and on Web mining.

  14. Usage Analysis of Web 2.0 and Library 2.0 Tools by Librarians in Kwara State Academic Libraries

    Science.gov (United States)

    Tella, Adeyinka; Soluoku, Taofeeqat

    2016-01-01

    This study analysed the usage of Web 2.0 and Library 2.0 tools by librarians in Kwara State academic libraries. A sample of 40 librarians was surveyed through total enumeration sampling technique from four different tertiary education institutions libraries in Kwara State, Nigeria. Questionnaire was used for the collection of data. The collected…

  15. An Introduction to Social Semantic Web Mining & Big Data Analytics for Political Attitudes and Mentalities Research

    National Research Council Canada - National Science Library

    Markus Schatten; Jurica Seva; Bogdan Okresa Ðuric

    2015-01-01

    ... and behavior of people on-line. Some of these techniques include social web mining, conceptual and social network analysis and modeling, tag clouds, topic maps, folksonomies, complex network visualizations, modeling of processes...

  16. The Impact of Media Richness on the Usage of Web 2.0 Services for Knowledge Transfer

    DEFF Research Database (Denmark)

    Gyamfi, Albert

    2016-01-01

    remains a major challenge, especially in the Cocoa Sector. The selection of media for a given task depends on the richness of the media and the characteristics of the task. The four modes of knowledge transfer theorized by Nonaka, require the use of media with varying degrees of richness. The study...... proposed that the usage of web 2.0 applications for the different modes of knowledge transfer can be affected by their media richness. And the use of web 2.0 applications for the knowledge transfer modes can influence knowledge transfer success. The study was conducted using a mixed method approach...... with a survey questionnaire. The results of the data analysis confirmed that the media richness of the selected web 2.0 applications affect their usage for the different modes of knowledge transfer....

  17. A construction scheme of web page comment information extraction system based on frequent subtree mining

    Science.gov (United States)

    Zhang, Xiaowen; Chen, Bingfeng

    2017-08-01

    This paper proposes a construction scheme for a web page comment information extraction system based on the frequent subtree mining algorithm, referred to as the FSM system. It briefly introduces the overall system architecture and the individual modules, then describes the core of the system in detail, and finally presents the system prototype.

  18. Web based parallel/distributed medical data mining using software agents

    Energy Technology Data Exchange (ETDEWEB)

    Kargupta, H.; Stafford, B.; Hamzaoglu, I.

    1997-12-31

    This paper describes an experimental parallel/distributed data mining system PADMA (PArallel Data Mining Agents) that uses software agents for local data accessing and analysis and a web based interface for interactive data visualization. It also presents the results of applying PADMA for detecting patterns in unstructured texts of postmortem reports and laboratory test data for Hepatitis C patients.

  19. Mining

    Directory of Open Access Journals (Sweden)

    Khairullah Khan

    2014-09-01

    Full Text Available Opinion mining is an interesting area of research because of its applications in various fields. Collecting opinions of people about products and about social and political events and problems through the Web is becoming increasingly popular every day. The opinions of users are helpful for the public and for stakeholders when making certain decisions. Opinion mining is a way to retrieve information through search engines, Web blogs and social networks. Because of the huge number of reviews in the form of unstructured text, it is impossible to summarize the information manually. Accordingly, efficient computational methods are needed for mining and summarizing the reviews from corpuses and Web documents. This study presents a systematic literature survey regarding the computational techniques, models and algorithms for mining opinion components from unstructured reviews.

  20. Informal Learning through Expertise Mining in the Social Web

    Science.gov (United States)

    Valencia-Garcia, Rafael; Garcia-Sanchez, Francisco; Casado-Lumbreras, Cristina; Castellanos-Nieves, Dagoberto; Fernandez-Breis, Jesualdo Tomas

    2012-01-01

    The advent of Web 2.0, also called the Social Web, has changed the way people interact with the Web. Assisted by the technologies associated with this new trend, users now play a much more active role as content providers. This Web paradigm shift has also changed how companies operate and interact with their employees, partners and customers. The…

  2. Data Mining for Web-Based Support Systems: A Case Study in e-Custom Systems

    Science.gov (United States)

    Razmerita, Liana; Kirchner, Kathrin

    This chapter provides an example of a Web-based support system (WSS) used to streamline trade procedures, prevent potential security threats, and reduce tax-related fraud in cross-border trade. The architecture is based on a service-oriented architecture that includes smart seals and Web services. We discuss the implications and suggest further enhancements to demonstrate how such systems can move toward a Web-based decision support system with the support of data mining methods. We provide a concrete example of how data mining can help to analyze the vast amount of data collected while monitoring the container movements along its supply chain.

  3. What explains usage of mobile physician-rating apps? Results from a web-based questionnaire.

    Science.gov (United States)

    Bidmon, Sonja; Terlutter, Ralf; Röttl, Johanna

    2014-06-11

    Consumers are increasingly accessing health-related information via mobile devices. Recently, several apps to rate and locate physicians have been released in the United States and Germany. However, knowledge about what kinds of variables explain usage of mobile physician-rating apps is still lacking. This study analyzes factors influencing the adoption of and willingness to pay for mobile physician-rating apps. A structural equation model was developed based on the Technology Acceptance Model and the literature on health-related information searches and usage of mobile apps. Relationships in the model were analyzed for moderating effects of physician-rating website (PRW) usage. A total of 1006 randomly selected German patients who had visited a general practitioner at least once in the 3 months before the beginning of the survey were surveyed. A total of 958 usable questionnaires were analyzed by partial least squares path modeling and moderator analyses. The suggested model yielded a high model fit. We found that perceived ease of use (PEOU) of the Internet to gain health-related information, the sociodemographic variables age and gender, and the psychographic variables digital literacy, feelings about the Internet and other Web-based applications in general, patients' value of health-related knowledgeability, as well as the information-seeking behavior variables regarding the amount of daily private Internet use for health-related information, frequency of using apps for health-related information in the past, and attitude toward PRWs significantly affected the adoption of mobile physician-rating apps. The sociodemographic variable age, but not gender, and the psychographic variables feelings about the Internet and other Web-based applications in general and patients' value of health-related knowledgeability, but not digital literacy, were significant predictors of willingness to pay. 
Frequency of using apps for health-related information

  4. Provenance-Based Approaches to Semantic Web Service Discovery and Usage

    Science.gov (United States)

    Narock, Thomas William

    2012-01-01

    The World Wide Web Consortium defines a Web Service as "a software system designed to support interoperable machine-to-machine interaction over a network." Web Services have become increasingly important both within and across organizational boundaries. With the recent advent of the Semantic Web, web services have evolved into semantic…

  6. TETRA: a web-service and a stand-alone program for the analysis and comparison of tetranucleotide usage patterns in DNA sequences

    Directory of Open Access Journals (Sweden)

    Bauer Margarete

    2004-10-01

    Full Text Available Abstract Background: In the emerging field of environmental genomics, direct cloning and sequencing of genomic fragments from complex microbial communities has proven to be a valuable source of new enzymes, expanding the knowledge of basic biological processes. The central problem of this so-called metagenome approach is that the cloned fragments often lack suitable phylogenetic marker genes, rendering the identification of clones that are likely to originate from the same genome difficult or impossible. In such cases, the analysis of intrinsic DNA signatures like tetranucleotide frequencies can provide valuable hints on fragment affiliation. With this application in mind, the TETRA web-service and the TETRA stand-alone program have been developed, both of which automate the task of comparative tetranucleotide frequency analysis. Availability: http://www.megx.net/tetra Results: TETRA provides a statistical analysis of tetranucleotide usage patterns in genomic fragments, either via a web-service or a stand-alone program. With respect to discriminatory power, such an analysis outperforms the assignment of genomic fragments based on the (G+C)-content, which is a widely used sequence-based measure for assessing fragment relatedness. While the web-service is restricted to the calculation of correlation coefficients between tetranucleotide usage patterns of submitted DNA sequences, the stand-alone program generates a much more detailed output, comprising all raw data and graphical plots. The stand-alone program is controlled via a graphical user interface and can batch-process a multitude of sequences. Furthermore, it comes with pre-computed tetranucleotide usage patterns for 166 prokaryote chromosomes, providing a useful reference dataset and source for data-mining. Conclusions: Up to now, the analysis of skewed oligonucleotide distributions within DNA sequences is not a commonly used tool within metagenomics. With the TETRA web-service and stand
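
    As a rough illustration of comparing tetranucleotide usage patterns by correlation, the Python sketch below counts overlapping 4-mers in two sequences and computes the Pearson correlation of the resulting frequency vectors. This is a deliberately simplified stand-in: it works on raw frequencies, and the abstract does not spell out the statistic TETRA itself computes. All function names are assumptions.

```python
from itertools import product
from math import sqrt

# All 256 possible tetranucleotides in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_freqs(seq):
    """Relative frequency of each tetranucleotide in overlapping windows.

    Windows containing characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(KMERS, 0)
    total = 0
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    return [counts[k] / total if total else 0.0 for k in KMERS]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0
```

    Two fragments from the same genome would be expected to give a higher coefficient than fragments from unrelated genomes, which is the intuition behind using such patterns to assess fragment affiliation.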

  7. SSM-DBSCAN and SSM-OPTICS: Incorporating a new similarity measure for Density based Clustering of Web usage data.

    Directory of Open Access Journals (Sweden)

    Ms K.Santhisree

    2011-08-01

    Full Text Available Clustering web sessions means grouping web sessions by similarity, maximizing the intra-group similarity and minimizing the inter-group similarity. In this paper we developed a new similarity measure named SSM (Sequence Similarity Measure) and enhanced the existing DBSCAN and OPTICS clustering techniques, as SSM-DBSCAN and SSM-OPTICS, for clustering web sessions for web personalization. We then adopted various similarity measures, namely the Euclidean distance, Jaccard, Cosine and Fuzzy similarity measures, to measure the similarity of web sessions using sequence alignment and to determine learning behaviors from web usage data. The new measure gives significant results when comparing similarities between web sessions with other previous measures. We performed a variety of experiments in the context of density based clustering, using the existing DBSCAN and OPTICS and the developed SSM-DBSCAN and SSM-OPTICS, which are based on sequence alignment to measure similarities between web sessions, where sessions are chronologically ordered sequences of page visits. Finally, the time and memory required to perform clustering using SSM are less than with the other similarity measures.

  8. SEMANTIC FOCUSED WEB CRAWLER FOR SERVICE DISCOVERY USING DATA MINING TECHNIQUE

    Directory of Open Access Journals (Sweden)

    Ruchika Patel

    2015-10-01

    Full Text Available Data mining is the process of extracting hidden predictive information from huge databases. It is a new technology with great potential to help companies focus on the most important information in their data warehouses. Web mining is a data mining technique that automatically discovers information from web documents. The amount of data and its dynamicity make it impossible to crawl the World Wide Web (WWW) completely. It is a challenge for crawlers to crawl only the relevant pages from this information explosion. A focused crawler addresses this issue of relevancy by focusing on web pages for a given topic or set of topics. Nowadays, finding meaningful information among the billions of information resources on the World Wide Web is a difficult task due to the growing popularity of the Internet. This paper focuses on a study of the various data mining techniques for finding relevant information on the World Wide Web using a web crawler.

  9. Investigating teachers’ Web 2.0 tools awareness, frequency and purposes of usage in terms of different variables

    Directory of Open Access Journals (Sweden)

    Mehmet Barış Horzum

    2010-03-01

    Full Text Available Web 2.0 technologies are becoming widespread and increasingly used. These tools could be used effectively in education. The aim of this study is to examine teachers’ awareness of Web 2.0 tools, the frequency of their usage and the purposes of usage in terms of different variables. For this purpose, data were collected through a survey, developed by the researcher, from 183 teachers who attended in-service training of the Ministry of Education. The data analysis found that teachers were aware of Facebook, MSN and VSS, while they were not aware of Weblogs and Podcast. Teachers predominantly use Facebook once a week, MSN every day and VSS a few days a week, and they do not use Wikipedia, Weblogs and Podcast. Teachers predominantly use Facebook, MSN and VSS for fun and communication, and Wiki, Podcast and Weblogs for accessing information.

  10. Intelligent Information Retrieval and Web Mining Architecture Using SOA

    Science.gov (United States)

    El-Bathy, Naser Ibrahim

    2010-01-01

    The study of this dissertation provides a solution to a very specific problem instance in the area of data mining, data warehousing, and service-oriented architecture in publishing and newspaper industries. The research question focuses on the integration of data mining and data warehousing. The research problem focuses on the development of…

  11. Soil food web changes during spontaneous succession at post mining sites: a possible ecosystem engineering effect on food web organization?

    Science.gov (United States)

    Frouz, Jan; Thébault, Elisa; Pižl, Václav; Adl, Sina; Cajthaml, Tomáš; Baldrián, Petr; Háněl, Ladislav; Starý, Josef; Tajovský, Karel; Materna, Jan; Nováková, Alena; de Ruiter, Peter C

    2013-01-01

    Parameters characterizing the structure of the decomposer food web, biomass of the soil microflora (bacteria and fungi) and soil micro-, meso- and macrofauna were studied at 14 non-reclaimed 1- to 41-year-old post-mining sites near the town of Sokolov (Czech Republic). These observations on the decomposer food webs were compared with knowledge of vegetation and soil microstructure development from previous studies. The amount of carbon entering the food web increased with succession age in a similar way as the total amount of C in food web biomass and the number of functional groups in the food web. Connectance did not show any significant changes with succession age, however. In early stages of the succession, the bacterial channel dominated the food web. Later on, in shrub-dominated stands, the fungal channel took over. Even later, in the forest stage, the bacterial channel prevailed again. The best predictor of the fungal-to-bacterial ratio is the thickness of the fermentation layer. We argue that these changes correspond with changes in topsoil microstructure driven by a combination of plant organic matter input and engineering effects of earthworms. In early stages, soil is alkaline, and a discontinuous litter layer on the soil surface promotes bacterial biomass growth, so the bacterial food web channel can dominate. Litter accumulation on the soil surface supports the development of the fungal channel. In older stages, earthworms arrive, mix litter into the mineral soil and form an organo-mineral topsoil, which is beneficial for bacteria and enhances the bacterial food web channel.

  12. Web Page Recommendation Models Theory and Algorithms

    CERN Document Server

    Gündüz-Ögüdücü, Sule

    2010-01-01

    One of the application areas of data mining is the World Wide Web (WWW or Web), which serves as a huge, widely distributed, global information service for every kind of information such as news, advertisements, consumer information, financial management, education, government, e-commerce, health services, and many other information services. The Web also contains a rich and dynamic collection of hyperlink information, Web page access and usage information, providing sources for data mining. The amount of information on the Web is growing rapidly, as well as the number of Web sites and Web page

  13. Relation Based Mining Model for Enhancing Web Document Clustering

    Directory of Open Access Journals (Sweden)

    M.Reka

    2014-05-01

    Full Text Available The design of web information management systems is becoming more complex, with higher time complexity. Information retrieval is a difficult task because of the huge volume of web documents. Clustering makes retrieval easier and less time-consuming. This algorithm introduces a web document clustering approach that uses the semantic relation between documents, which reduces the time complexity. It identifies the relations and concepts in a document and also computes the relation score between documents. The algorithm extracts the key concepts from the web documents through preprocessing, stemming, and stop word removal. The identified concepts are used to compute the document relation score and the cluster relation score, with the domain ontology supporting both computations. Based on the document relation score and the cluster relation score, the web document cluster is identified. The algorithm is evaluated on 200,000 web documents, with 60 percent used as the training set and 40 percent as the testing set.

  14. Text and Structural Data Mining of Influenza Mentions in Web and Social Media

    OpenAIRE

    Karan P Singh; Mikler, Armin R.; Cook, Diane J.; Courtney D. Corley

    2010-01-01

    Text and structural data mining of web and social media (WSM) provides a novel disease surveillance resource and can identify online communities for targeted public health communications (PHC) to assure wide dissemination of pertinent information. WSM that mention influenza are harvested over a 24-week period, 5 October 2008 to 21 March 2009. Link analysis reveals communities for targeted PHC. Text mining is shown to identify trends in flu posts that correlate to real-world influenza-like ill...

  15. The spread of scientific information: insights from the web usage statistics in PLoS article-level metrics.

    Science.gov (United States)

    Yan, Koon-Kiu; Gerstein, Mark

    2011-01-01

    The presence of web-based communities is a distinctive signature of Web 2.0. The web-based feature means that information propagation within each community is highly facilitated, promoting complex collective dynamics in view of information exchange. In this work, we focus on a community of scientists and study, in particular, how the awareness of a scientific paper is spread. Our work is based on the web usage statistics obtained from the PLoS Article Level Metrics dataset compiled by PLoS. The cumulative number of HTML views was found to follow a long tail distribution which is reasonably well-fitted by a lognormal one. We modeled the diffusion of information by a random multiplicative process, and thus extracted the rates of information spread at different stages after the publication of a paper. We found that the spread of information displays two distinct decay regimes: a rapid downfall in the first month after publication, and a gradual power law decay afterwards. We identified these two regimes with two distinct driving processes: a short-term behavior driven by the fame of a paper, and a long-term behavior consistent with citation statistics. The patterns of information spread were found to be remarkably similar in data from different journals, but there are intrinsic differences for different types of web usage (HTML views and PDF downloads versus XML). These similarities and differences shed light on the theoretical understanding of different complex systems, as well as a better design of the corresponding web applications that is of high potential marketing impact.
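
    The abstract links a lognormal-like long tail to a random multiplicative process. The minimal simulation below illustrates that connection in general terms: a quantity that is repeatedly multiplied by independent positive random factors has a logarithm that is a sum of independent terms, so its distribution tends toward a lognormal with a long right tail. The uniform factor range, step count and function name are assumptions chosen for illustration and are not taken from the paper.

```python
import math
import random
import statistics


def simulate_views(n_papers, n_steps, seed=0):
    """Simulate a random multiplicative process for n_papers independent items.

    Each item starts at 1.0 and is multiplied by a random positive factor at
    every step. The uniform(0.5, 1.5) factor is an illustrative assumption.
    """
    rng = random.Random(seed)
    finals = []
    for _ in range(n_papers):
        value = 1.0
        for _ in range(n_steps):
            value *= rng.uniform(0.5, 1.5)
        finals.append(value)
    return finals


views = simulate_views(5000, 30)
logs = [math.log(v) for v in views]

# A long right tail shows up as a mean well above the median.
print("mean:", statistics.mean(views), "median:", statistics.median(views))
# The logarithms are roughly bell-shaped around their mean.
print("mean of log:", statistics.mean(logs), "stdev of log:", statistics.stdev(logs))
```

    Plotting a histogram of `logs` should give an approximately normal shape, which is the sense in which the simulated values are close to lognormal.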

  16. The spread of scientific information: insights from the web usage statistics in PLoS article-level metrics.

    Directory of Open Access Journals (Sweden)

    Koon-Kiu Yan

    Full Text Available The presence of web-based communities is a distinctive signature of Web 2.0. The web-based feature means that information propagation within each community is highly facilitated, promoting complex collective dynamics in view of information exchange. In this work, we focus on a community of scientists and study, in particular, how the awareness of a scientific paper is spread. Our work is based on the web usage statistics obtained from the PLoS Article Level Metrics dataset compiled by PLoS. The cumulative number of HTML views was found to follow a long tail distribution which is reasonably well-fitted by a lognormal one. We modeled the diffusion of information by a random multiplicative process, and thus extracted the rates of information spread at different stages after the publication of a paper. We found that the spread of information displays two distinct decay regimes: a rapid downfall in the first month after publication, and a gradual power law decay afterwards. We identified these two regimes with two distinct driving processes: a short-term behavior driven by the fame of a paper, and a long-term behavior consistent with citation statistics. The patterns of information spread were found to be remarkably similar in data from different journals, but there are intrinsic differences for different types of web usage (HTML views and PDF downloads versus XML). These similarities and differences shed light on the theoretical understanding of different complex systems, as well as a better design of the corresponding web applications that is of high potential marketing impact.

  17. CURRENT USAGE OF COMPONENT BASED PRINCIPLES FOR DEVELOPING WEB APPLICATIONS WITH FRAMEWORKS: A LITERATURE REVIEW

    OpenAIRE

    Matija Novak; Ivan Švogor

    2016-01-01

    Component based software development has become a very popular paradigm in many software engineering branches. In the early phase of Web 2.0 appearance, it was also popular for web application development. From the analyzed papers, between this period and today, use of component based techniques for web application development was somewhat slowed down, however, the recent development indicates a comeback. Most of all it is apparent with W3C’s component web working group. In this article we wa...

  18. Text and structural data mining of influenza mentions in Web and social media.

    Science.gov (United States)

    Corley, Courtney D; Cook, Diane J; Mikler, Armin R; Singh, Karan P

    2010-02-01

    Text and structural data mining of web and social media (WSM) provides a novel disease surveillance resource and can identify online communities for targeted public health communications (PHC) to assure wide dissemination of pertinent information. WSM that mention influenza are harvested over a 24-week period, 5 October 2008 to 21 March 2009. Link analysis reveals communities for targeted PHC. Text mining is shown to identify trends in flu posts that correlate to real-world influenza-like illness patient report data. We also bring to bear a graph-based data mining technique to detect anomalies among flu blogs connected by publisher type, links, and user-tags.

  19. Text and Structural Data Mining of Influenza Mentions in Web and Social Media

    Directory of Open Access Journals (Sweden)

    Karan P. Singh

    2010-02-01

    Full Text Available Text and structural data mining of web and social media (WSM) provides a novel disease surveillance resource and can identify online communities for targeted public health communications (PHC) to assure wide dissemination of pertinent information. WSM that mention influenza are harvested over a 24-week period, 5 October 2008 to 21 March 2009. Link analysis reveals communities for targeted PHC. Text mining is shown to identify trends in flu posts that correlate to real-world influenza-like illness patient report data. We also bring to bear a graph-based data mining technique to detect anomalies among flu blogs connected by publisher type, links, and user-tags.

  20. Web Video Object Mining: A Novel Approach for Knowledge Discovery

    Directory of Open Access Journals (Sweden)

    Siddu P. Algur

    2016-04-01

    Full Text Available The impact of social media such as YouTube, Twitter and Facebook on the modern world has led to huge growth in the volume of video data on the cloud and the web. The evolution of smartphones and tablets could be one of the reasons for the increasing rate at which video data accumulates on the web. Because of this rapid growth, it is becoming difficult to identify popular, non-popular and average popular videos without watching their content. Clustering web videos into 'Popular', 'Non-Popular' and 'Average Popular' based on their metadata is one of the complex research questions for social media and computer science researchers. In this work, we propose two effective methods to cluster web videos based on their meta-objects. Large-scale web video meta-objects such as length, view counts, number of comments and rating information are considered for the knowledge discovery process. Two clustering algorithms, Expectation Maximization (EM) and Distribution Based (DB) clustering, are used to form three types of clusters. The resultant clusters are analyzed to find the popular, average popular and non-popular video clusters. The results of the EM and DB clusterings are also compared as a step in the process of knowledge discovery.

  1. Mining the Web: How Useful is the Global Public Library?

    Science.gov (United States)

    Albrecht, Rudolf; Boyce, Peter B.

    The Web has matured into the most universal source of information. At this point in time it still suffers from the fact that finding the pertinent information, even if available, is difficult for a variety of reasons. This paper explores the usefulness of the Web for professional scientists and the interested public. Using examples, we examine the reliability and completeness of the information on subjects that are well known, and on cutting edge science. A recent survey by the AAS, Supported in part by a grant from NASA to the AAS, found through standard Web search engines. Specialized services like the ADS are more

  2. Mining biological pathways using WikiPathways web services.

    Directory of Open Access Journals (Sweden)

    Thomas Kelder

    Full Text Available WikiPathways is a platform for creating, updating, and sharing biological pathways [1]. Pathways can be edited and downloaded using the wiki-style website. Here we present a SOAP web service that provides programmatic access to WikiPathways that is complementary to the website. We describe the functionality that this web service offers and discuss several use cases in detail. Exposing WikiPathways through a web service opens up new ways of utilizing pathway information and assisting the community curation process.

  3. Mining biological pathways using WikiPathways web services.

    Science.gov (United States)

    Kelder, Thomas; Pico, Alexander R; Hanspers, Kristina; van Iersel, Martijn P; Evelo, Chris; Conklin, Bruce R

    2009-07-30

    WikiPathways is a platform for creating, updating, and sharing biological pathways [1]. Pathways can be edited and downloaded using the wiki-style website. Here we present a SOAP web service that provides programmatic access to WikiPathways that is complementary to the website. We describe the functionality that this web service offers and discuss several use cases in detail. Exposing WikiPathways through a web service opens up new ways of utilizing pathway information and assisting the community curation process.

  4. Web mining based on chaotic social evolutionary programming algorithm

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    Because the K-means clustering algorithm usually ends in a local optimum and rarely reaches the global optimum, a new web clustering method is presented based on the chaotic social evolutionary programming (CSEP) algorithm. In this method, a cognitive agent inherits a paradigm in clustering, which enables the cognitive agent to acquire a chaotic mutation operator in the betrayal. As shown in the experiment, this method can not only effectively increase web clustering efficiency, but it can also practically improve the precision of web clustering.

  5. Secure Association Rule Mining for Distributed Level Hierarchy in Web

    Directory of Open Access Journals (Sweden)

    Gulshan Shrivastava

    2011-06-01

    Full Text Available Data mining technology can analyze massive data and plays a very important role in many domains, but if it is used improperly it can also cause new information security problems. Several privacy-preserving techniques for association rule mining have therefore been proposed in the past few years. Various algorithms have been developed for centralized data, while others address the distributed data scenario. Distributed data scenarios can be classified as heterogeneous distributed data and homogeneous distributed data, and we identify that distributed data could be partitioned as horizontal partition (a.k.a. homogeneous distribution) and vertical partition (a.k.a. heterogeneous distribution). In this paper, we propose an algorithm for secure association rule mining for vertical partition.

  6. Uncoolness factor of collaborative Web Mining Tools (WMT)

    Directory of Open Access Journals (Sweden)

    Juan Luis Chulilla

    2009-12-01

    Full Text Available The recent development of social mining is a useful and direct analogy for talking about the less visible part of the adoption of successive waves of social software. The striking fact that visibility decreases as each type of social software matures should be taken into account in any comprehensive analysis of the relation between collectives and Internet technologies. One of the main results of this relation is the social data mining of the Internet, which both gives sense to virtual communities and produces content via feedback. We are just at the beginning of the adoption of new ways of social data mining, which will be significant when they mature and become invisible.

  7. Web-based Data Warehousing and Data Mining Technologies

    Institute of Scientific and Technical Information of China (English)

    刘云; 刘东苏

    2001-01-01

    The paper discusses the Web-based data warehousing and data mining technologies, analyzes the similarities and differences between the traditional data warehousing and data mining and the Web-based data warehousing and data mining. The architecture of the Web-based data warehousing is presented. The problems that should be solved are pointed out and some solutions are given.

  8. Combining Data Warehouse and Data Mining Techniques for Web Log Analysis

    DEFF Research Database (Denmark)

    Pedersen, Torben Bach; Jespersen, Søren; Thorhauge, Jesper

    2008-01-01

    Enormous amounts of information about Web site user behavior are collected in Web server logs. However, this information is only useful if it can be queried and analyzed to provide high-level knowledge about user navigation patterns, a task that requires powerful techniques. This chapter presents a n...... a number of approaches that combine data warehousing and data mining techniques in order to analyze Web logs. After introducing the well-known click and session data warehouse (DW) schemas, the chapter presents the subsession schema, which allows fast queries on sequences...

  9. Análisis de sesiones de la web del Cindoc: una aproximación a la minería de uso web

    OpenAIRE

    Ortega-Priego, José-Luis

    2005-01-01

    This paper presents a usability and navigability study of the Cindoc web site based on the web log files of the main server for October 2003. To this end, web mining is used, specifically web usage mining techniques for detecting sessions, with the aim of determining navigation patterns and design faults. Several design problems are detected in the navigation menu, in the layout of the contents and in the web structure. The different navigation patterns identified are discussed and many advices are ...

  10. Combining Data Warehouse and Data Mining Techniques for Web Log Analysis

    DEFF Research Database (Denmark)

    Pedersen, Torben Bach; Jespersen, Søren; Thorhauge, Jesper

    2008-01-01

    a number of approaches that combine data warehousing and data mining techniques in order to analyze Web logs. After introducing the well-known click and session data warehouse (DW) schemas, the chapter presents the subsession schema, which allows fast queries on sequences...

  11. HOW DO THEY BEHAVE ON THE WEB? AN EXPLORATORY STUDY OF MINING THE WEB FOR ANALYTICAL CUSTOMER

    Directory of Open Access Journals (Sweden)

    Myriam Ertz

    2015-06-01

    Full Text Available Web Mining (WM) remains a relatively unknown technology. However, if used appropriately, it can be of great value to the identification of existing customers’ behaviours online. The recent technical advances in the field of WM tremendously enhance analytical Customer Relationship Management (aCRM), still usually related to a simple transactional function. This study follows an exploratory approach to assess whether WM fulfills, alone, all three objectives of the second theme of Xu and Walton’s adapted aCRM framework for customer knowledge acquisition, namely the identification of existing web customers’ behaviour. It also investigates to what extent WM should be used in conjunction with traditional marketing research to optimize CRM, and hence marketing, in a web context. In-depth, semi-structured interviews reveal that WM is very well suited to understand existing web customers’ transactional web behaviour(s) (i.e. navigation patterns; amount of purchases by week, month, and region; and cross-selling and up-selling opportunities). Nevertheless, WM does not do well in understanding less obvious, underlying dimensions of customer behaviour, including how existing customers develop satisfaction, loyalty, defection and attachment on the web. WM still needs to be complemented with traditional marketing research in order to reach these more difficult but essential aCRM objectives.

  12. Collaborative Framework with User Personalization for Efficient web Search : A D3 Mining approach

    Directory of Open Access Journals (Sweden)

    V.Vijayadeepa

    2011-04-01

    Full Text Available User personalization is becoming a more important task for web search engines. We develop a unified model to provide user personalization for efficient web search. We collect implicit feedback from users by tracking their actions on the web page. We track actions such as save, copy, bookmark, time spent and logging into the database, which are used to build the unified model. Our model serves as a collaborative framework with which related users can mine information collaboratively in little time. Based on the feedback from the users, we categorize the users and the search queries. We build the unified model from this categorized information and use it to provide personalized results to the user during web search. Our methodology minimizes the search time and provides more relevant information.

  13. Using an improved association rules mining optimization algorithm in web-based mobile-learning system

    Science.gov (United States)

    Huang, Yin; Chen, Jianhua; Xiong, Shaojun

    2009-07-01

    Mobile-Learning (M-learning) gives many learners the advantages of both traditional learning and E-learning. Currently, Web-based Mobile-Learning Systems have created many new ways of learning and defined new relationships between educators and learners. Association rule mining is one of the most important fields in data mining and knowledge discovery in databases. Rule explosion is a serious problem that causes great concern, as conventional mining algorithms often produce too many rules for decision makers to digest. Since a Web-based Mobile-Learning System collects vast amounts of student profile data, data mining and knowledge discovery techniques can be applied to find interesting relationships between attributes of learners, assessments, the solution strategies adopted by learners and so on. Therefore, this paper focuses on a new data-mining algorithm that combines the advantages of the genetic algorithm and the simulated annealing algorithm, called ARGSA (Association rules based on an improved Genetic Simulated Annealing Algorithm), to mine the association rules. This paper first takes advantage of the Parallel Genetic Algorithm and Simulated Algorithm designed specifically for discovering association rules. Moreover, analysis and experiments are carried out to show that the proposed method is superior to the Apriori algorithm in this Mobile-Learning system.

  14. A WebGIS Decision Support System for Management of Abandoned Mines

    Directory of Open Access Journals (Sweden)

    Ranka Stanković

    2016-07-01

    Full Text Available This paper presents the development of a WebGIS application aimed at providing safe and reliable data needed for reclamation of abandoned mines in national parks and other protected areas in Vojvodina in compliance with existing legal regulations. The geodatabase model for this application has been developed using UML and the CASE tool Microsoft Visio featuring an interface with ArcGIS. The WebGIS application was developed using GeoServer, an open source tool in the Java programming language, with integrated PostgreSQL DB and the possibility of generating and publishing WMS, WFS and KML services. The WebGIS application is publicly available, based on an appropriate central database, which for the first time encompasses all available data on abandoned mines in Vojvodina, and as such may serve as a model for similar databases on the territory of the Republic of Serbia.

  15. Environment: General; Grammar & Usage; Money Management; Music History; Web Page Creation & Design.

    Science.gov (United States)

    Web Feet, 2001

    2001-01-01

    Describes Web site resources for elementary and secondary education in the topics of: environment, grammar, money management, music history, and Web page creation and design. Each entry includes an illustration of a sample page on the site and an indication of the grade levels for which it is appropriate. (AEF)

  16. Environment: General; Grammar & Usage; Money Management; Music History; Web Page Creation & Design.

    Science.gov (United States)

    Web Feet, 2001

    2001-01-01

    Describes Web site resources for elementary and secondary education in the topics of: environment, grammar, money management, music history, and Web page creation and design. Each entry includes an illustration of a sample page on the site and an indication of the grade levels for which it is appropriate. (AEF)

  17. What Are the Usage Conditions of Web 2.0 Tools Faculty of Education Students?

    Science.gov (United States)

    Agir, Ahmet

    2014-01-01

    As a result of advances in technology and the subsequent spread of the Internet into every step of life, the web, which provides access to documents such as pictures, audio, animation and text on the Internet, came into use. At first, the web consisted only of visual and text pages that did not allow user interaction. However, it is seen that not…

  18. Error Checking for Chinese Query by Mining Web Log

    Directory of Open Access Journals (Sweden)

    Jianyong Duan

    2015-01-01

    Full Text Available For the search engine, error-input query is a common phenomenon. This paper uses web log as the training set for the query error checking. Through the n-gram language model that is trained by web log, the queries are analyzed and checked. Some features including query words and their number are introduced into the model. At the same time data smoothing algorithm is used to solve data sparseness problem. It will improve the overall accuracy of the n-gram model. The experimental results show that it is effective.
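
    The abstract describes an n-gram language model trained on a query log, with smoothing to handle data sparseness. The abstract does not say which smoothing algorithm was used, so the sketch below substitutes add-one (Laplace) smoothing on a bigram model. The toy log uses English queries split on spaces instead of segmented Chinese text, and all names are hypothetical.

```python
import math
from collections import Counter


def train_bigram(queries):
    """Count bigrams and their left contexts over a list of logged queries."""
    contexts, bigrams, vocab = Counter(), Counter(), {"</s>"}
    for q in queries:
        words = q.split()
        vocab.update(words)
        tokens = ["<s>"] + words + ["</s>"]
        contexts.update(tokens[:-1])            # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, len(vocab)


def log_score(query, model):
    """Add-one smoothed bigram log score; lower means less like the log.

    Unseen words get a small nonzero score, so this is a ranking score and
    not a properly normalized probability for out-of-vocabulary input.
    """
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + query.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size))
    return total


# Hypothetical query log used as the training set.
log = ["cheap flight tickets", "cheap flight deals",
       "flight tickets online", "cheap hotel deals"]
model = train_bigram(log)

print(log_score("cheap flight tickets", model))
print(log_score("cheap fright tickets", model))  # contains a likely typo
```

    Comparing the score of an input query with the scores of close candidate corrections of the same length gives a simple signal that the input may contain an error.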

  19. Business Intelligence: A Rapidly Growing Option through Web Mining

    OpenAIRE

    Rahi, Priyanka

    2012-01-01

    The World Wide Web is a popular and interactive medium to distribute information in this scenario. The web is huge, diverse, ever changing, widely disseminated global information service center. We are familiar with terms like e-commerce, e-governance, e-market, e-finance, e-learning, e-banking etc. for an organization it is new challenge to maintain direct contact with customers because of the rapid growth in e-commerce, e-publishing and electronic service delivery. To deal with this there i...

  20. Soil food web changes during spontaneous succession at post mining sites: a possible ecosystem engineering effect on food web organization?

    Directory of Open Access Journals (Sweden)

    Jan Frouz

    Full Text Available Parameters characterizing the structure of the decomposer food web, biomass of the soil microflora (bacteria and fungi) and soil micro-, meso- and macrofauna were studied at 14 non-reclaimed 1- to 41-year-old post-mining sites near the town of Sokolov (Czech Republic). These observations on the decomposer food webs were compared with knowledge of vegetation and soil microstructure development from previous studies. The amount of carbon entering the food web increased with succession age in a similar way as the total amount of C in food web biomass and the number of functional groups in the food web. Connectance did not show any significant changes with succession age, however. In early stages of the succession, the bacterial channel dominated the food web. Later on, in shrub-dominated stands, the fungal channel took over. Even later, in the forest stage, the bacterial channel prevailed again. The best predictor of the fungal-to-bacterial ratio is the thickness of the fermentation layer. We argue that these changes correspond with changes in topsoil microstructure driven by a combination of plant organic matter input and engineering effects of earthworms. In early stages, soil is alkaline, and a discontinuous litter layer on the soil surface promotes bacterial biomass growth, so the bacterial food web channel can dominate. Litter accumulation on the soil surface supports the development of the fungal channel. In older stages, earthworms arrive, mix litter into the mineral soil and form an organo-mineral topsoil, which is beneficial for bacteria and enhances the bacterial food web channel.

  1. Web Log Mining using Improved Version of Proposed Algorithm

    Directory of Open Access Journals (Sweden)

    Dr. Manish Shrivastava

    2011-12-01

    Full Text Available Association rule mining is one of the most important and popular data mining techniques. It extracts interesting correlations, frequent patterns and associations among sets of items in transaction databases or other data repositories. Most of the existing algorithms require multiple passes over the database to discover frequent patterns, resulting in a large number of disk reads and placing a huge burden on the input/output subsystem. In order to reduce repetitive disk reads, a novel top-down approach is proposed in this paper. The improved version of the Apriori algorithm greatly reduces database scans and avoids the generation of unnecessary patterns, which reduces time and space consumption.
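
    For context, the sketch below shows the classic level-wise Apriori procedure that the abstract sets out to improve, not the paper's top-down variant. It makes the cost visible: every level of candidate itemsets triggers another scan over all transactions. The session data and function name are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support.

    Classic bottom-up Apriori: each level rescans the transactions, which is
    the repeated-scan cost the abstract refers to.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    level = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while level:
        # Keep only the candidates of size k that are frequent.
        kept = {c: s for c in level if (s := support(c)) >= min_support}
        frequent.update(kept)
        k += 1
        # Join step: merge frequent itemsets into candidates of size k.
        candidates = {a | b for a in kept for b in kept if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        level = {c for c in candidates
                 if all(frozenset(s) in kept for s in combinations(c, k - 1))}
    return frequent


# Hypothetical web sessions treated as transactions of visited pages.
sessions = [
    {"home", "products", "cart"},
    {"home", "products"},
    {"home", "cart"},
    {"products", "cart"},
    {"home", "products", "cart"},
]
for itemset, s in sorted(apriori(sessions, 0.5).items(),
                         key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

    With a minimum support of 0.5, all three single pages and all three pairs are frequent, while the three-page itemset appears in only two of five sessions and is dropped.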

  2. Keynote Talk: Mining the Web 2.0 for Improved Image Search

    Science.gov (United States)

    Baeza-Yates, Ricardo

    There are several semantic sources that can be found in the Web that are either explicit, e.g. Wikipedia, or implicit, e.g. derived from Web usage data. Most of them are related to user generated content (UGC) or what is called today the Web 2.0. In this talk we show how to use these sources of evidence in Flickr, such as tags, visual annotations or clicks, which represent the wisdom of crowds behind UGC, to improve image search. These results are the work of the multimedia retrieval team at Yahoo! Research Barcelona and they are already being used in Yahoo! image search. This work is part of a larger effort to produce a virtuous data feedback circuit based on the right combination of many different technologies to leverage the Web itself.

  3. Comparison of Turkish and US Pre-Service Teachers' Web 2.0 Tools Usage Characteristics

    Science.gov (United States)

    Kiyici, Mubin; Akyeampong, Albert; Balkan Kiyici, Fatime

    2013-01-01

    As the Internet and computers develop, the world is changing dramatically. In addition to ICT, the use of technological tools in daily life is increasing day by day. All these technological tools shape individual behavior, lifestyle and learning style as well as individual lives. Today's children use different tools and different ways to…

  4. Mining Device-Specific Apps Usage Patterns from Large-Scale Android Users

    OpenAIRE

    Li, Huoran; Lu, Xuan

    2017-01-01

    Now that smartphones, applications (a.k.a. apps), and app stores have been widely adopted by the billions, an interesting debate emerges: whether and to what extent do device models influence the behaviors of their users? The answer to this question is critical to almost every stakeholder in the smartphone app ecosystem, including app store operators, developers, end-users, and network providers. To approach this question, we collect a longitudinal data set of app usage through a leading Android ...

  5. Mining Web-based Educational Systems to Predict Student Learning Achievements

    Directory of Open Access Journals (Sweden)

    José del Campo-Ávila

    2015-03-01

    Educational Data Mining (EDM) is gaining great importance as a new interdisciplinary research field related to several other areas. It is directly connected with Web-based Educational Systems (WBES) and Data Mining (DM), a fundamental part of Knowledge Discovery in Databases. The former defines the context: WBES store and manage huge amounts of data. Such data are growing steadily and contain hidden knowledge that could be very useful to the users (both teachers and students). It is desirable to identify such knowledge in the form of models, patterns or any other representation schema that allows a better exploitation of the system. The latter reveals itself as the tool to achieve such discovery. Data mining must cope with very complex and varied situations to reach quality solutions. Therefore, data mining is a research field where many advances are being made to accommodate and solve emerging problems. For this purpose, many techniques are usually considered. In this paper we study how data mining can be used to induce student models from the data acquired by a specific Web-based tool for adaptive testing, called SIETTE. Specifically, we have used top-down induction of decision trees algorithms to extract the patterns because these models, decision trees, are easily understandable. In addition, the validation processes conducted have assured high-quality models.
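
    As a rough illustration of the core step in top-down induction of decision trees, the sketch below picks the attribute to split on by information gain, as in ID3-style learners. The abstract does not say which split criterion or which SIETTE attributes were used, so the criterion, the attribute names (`first_item_correct`, `used_hint`) and the records are all assumptions made for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on one attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g)
                    for g in groups.values())
    return entropy(labels) - remainder

def best_split(rows, labels, attributes):
    """Attribute a top-down learner would choose at this node."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))

# Hypothetical student records (not taken from SIETTE).
rows = [
    {"first_item_correct": True,  "used_hint": False},
    {"first_item_correct": True,  "used_hint": True},
    {"first_item_correct": False, "used_hint": False},
    {"first_item_correct": False, "used_hint": True},
]
labels = ["pass", "pass", "fail", "fail"]

chosen = best_split(rows, labels, ["first_item_correct", "used_hint"])
```

    A full learner would apply `best_split` recursively to each resulting subset until the labels in a node are pure or no attributes remain.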

  6. An Efficient Hybrid Algorithm for Mining Web Frequent Access Patterns

    Institute of Scientific and Technical Information of China (English)

    ZHAN Li-qiang; LIU Da-xin

    2004-01-01

    We propose an efficient hybrid algorithm, WDHP, in this paper for mining frequent access patterns. WDHP adopts the techniques of DHP to optimize its performance, namely using a hash table to filter the candidate set and trimming the database. Whenever the database is trimmed to a size less than a specified threshold, the algorithm puts the database into main memory by constructing a tree, and finds frequent patterns on the tree. The experiment shows that WDHP outperforms the DHP algorithm and the main-memory-based algorithm WAP in execution efficiency.
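
    The hash-table filtering that WDHP borrows from DHP can be sketched as below. This covers only the candidate-filtering idea for 2-itemsets; database trimming and the in-memory tree of WDHP are not shown. The function name, the toy hash function, the bucket count and the log data are assumptions for illustration.

```python
from itertools import combinations

def dhp_candidate_pairs(transactions, min_support, num_buckets=7):
    """Return (all_pairs, filtered_pairs) of candidate 2-itemsets.

    During the first pass, every 2-itemset of each transaction is hashed
    into a bucket. A candidate pair can only be frequent if its bucket
    count reaches min_support, so low-count buckets filter candidates.
    """
    item_counts = {}
    buckets = [0] * num_buckets

    def bucket_of(pair):
        # Deterministic toy hash over the item names (sketch assumption).
        return sum(ord(ch) for name in pair for ch in name) % num_buckets

    for t in transactions:
        items = sorted(set(t))
        for item in items:
            item_counts[item] = item_counts.get(item, 0) + 1
        for pair in combinations(items, 2):
            buckets[bucket_of(pair)] += 1

    frequent_items = sorted(i for i, c in item_counts.items()
                            if c >= min_support)
    # Plain Apriori would count every pair of frequent items...
    all_pairs = list(combinations(frequent_items, 2))
    # ...while DHP keeps only pairs whose bucket is heavy enough.
    filtered = [p for p in all_pairs if buckets[bucket_of(p)] >= min_support]
    return all_pairs, filtered

# Hypothetical access logs: pages visited per session.
logs = [["a", "b", "c"], ["a", "b"], ["a", "d"], ["b", "c"], ["d", "e"]]
all_pairs, filtered = dhp_candidate_pairs(logs, min_support=2)
```

    Because a bucket count is always at least the true count of any pair hashed into it, the filter can let infrequent pairs through but never discards a frequent one.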

  7. Study and Implementation of Web Mining Classification Algorithm Based on Building Tree of Detection Class Threshold

    Institute of Scientific and Technical Information of China (English)

    CHEN Jun-jie; SONG Han-tao; LU Yu-chang

    2005-01-01

    A new classification algorithm for web mining is proposed on the basis of the general classification algorithm for data mining in order to implement personalized information services. The tree-building method of detecting the class threshold is used to construct a decision tree according to the concept of user expectation so as to find classification rules in different layers. Compared with the traditional C4.5 algorithm, the disadvantage of excessive adaptation in C4.5 has been improved so that the classification results have not only much higher accuracy but also statistical meaning.

  8. Wireless sensing of gas in mining with web service in real time

    Directory of Open Access Journals (Sweden)

    Juan Mauricio Salamanca

    2014-12-01

    hierarchically in order to transmit the data to the entrance of the mine. Finally, the network is configured so that the system enters sleep (idle) mode when it is not receiving information; in this way power consumption decreases, increasing the autonomy of the batteries. This paper describes the design, implementation and operation of a gas monitoring system in mining with a real-time web service based on a network of Zigbee sensors.

  9. Social big data mining

    CERN Document Server

    Ishikawa, Hiroshi

    2015-01-01

    Social Media. Big Data and Social Data. Hypotheses in the Era of Big Data. Social Big Data Applications. Basic Concepts in Data Mining. Association Rule Mining. Clustering. Classification. Prediction. Web Structure Mining. Web Content Mining. Web Access Log Mining, Information Extraction and Deep Web Mining. Media Mining. Scalability and Outlier Detection.

  10. What Is Different about E-Books? A MINES for Libraries® Analysis of Academic and Health Sciences Research Libraries' E-Book Usage

    Science.gov (United States)

    Plum, Terry; Franklin, Brinley

    2015-01-01

    Building on the theoretical proposals of Kevin Guthrie and others concerning the transition from print books to e-books in academic and health sciences libraries, this paper presents data collected using the MINES for Libraries® e-resource survey methodology. Approximately 6,000 e-book uses were analyzed from a sample of e-resource usage at…

  12. Analyzing PACS Usage Patterns by Means of Process Mining: Steps Toward a More Detailed Workflow Analysis in Radiology.

    Science.gov (United States)

    Forsberg, Daniel; Rosipko, Beverly; Sunshine, Jeffrey L

    2016-02-01

    In this paper, statistical analysis and techniques from process mining are employed to analyze interaction patterns originating from radiologists reading medical images in a picture archiving and communication system (PACS). Event logs from 1 week of data, corresponding to 567 cases of single-view chest radiographs read by 14 radiologists, were analyzed. Statistical analysis showed that the numbers of commands and command types used by the radiologists per case only have a slightly positive correlation with the time to read a case (0.31 and 0.55, respectively). Further, one way ANOVA showed that the factors time of day, radiologist and specialty were significant for the number of commands per case, whereas radiologist was also significant for the number of command types, but with no significance of any of the factors on time to read. Applying process mining to the event logs of all users showed that a seemingly "simple" examination (single-view chest radiographs) can be associated with a highly complex interaction process. However, repeating the process discovery on each individual radiologist revealed that the initially discovered complex interaction process consists of one group of radiologists with individually well-structured interaction processes and a second smaller group of users with progressively more complex usage patterns. Future research will focus on metrics to describe derived interaction processes in order to investigate if one set of interaction patterns can be considered as more efficient than another set when reading radiological images in a PACS.

  13. Usage of Web Mapping Systems and Services for Information Support of Regional Management

    Directory of Open Access Journals (Sweden)

    Shaparev Nicolay

    2016-01-01

    The work considers information and computing technologies that support regional decision making and are based on geoinformation web systems and mapping web services. The use of such systems for the information support of regional management is now becoming common. Long-term strategic forecasting and planning of territorial development, and the solution of various institutional and sectoral problems, are today often based on the use of an integrated information and computing environment and complex information systems of regional management, which are built on geospatial (mapping) data. This paper discusses technologies and web services used in the creation and implementation of regional geoinformation web systems. Such systems provide access to huge arrays of geospatial information and services distributed on the Internet, and remote data processing with high-performance multi-user computers. Problems of choosing basic software such as the geoinformation platform, and the advantages and disadvantages of existing solutions, are discussed. The software structure and basic web-GIS components are analyzed. Examples of completed projects are given.

  14. Health care public reporting utilization - user clusters, web trails, and usage barriers on Germany's public reporting portal Weisse-Liste.de.

    Science.gov (United States)

    Pross, Christoph; Averdunk, Lars-Henrik; Stjepanovic, Josip; Busse, Reinhard; Geissler, Alexander

    2017-04-21

    Quality of care public reporting provides structural, process and outcome information to facilitate hospital choice and strengthen quality competition. Yet, evidence indicates that patients rarely use this information in their decision-making, due to limited awareness of the data and complex and conflicting information. While there is enthusiasm among policy makers for public reporting, clinicians and researchers doubt its overall impact. Almost no study has analyzed how users behave on public reporting portals, which information they seek out and when they abort their search. This study employs web-usage mining techniques on server log data of 17 million user actions from Germany's premier provider transparency portal Weisse-Liste.de (WL.de) between 2012 and 2015. Postal code and ICD search requests facilitate identification of geographical and treatment area usage patterns. User clustering helps to identify user types based on parameters like session length, referrer and page topic visited. First-order Markov chains illustrate common click paths and premature exits. In 2015, the WL.de Hospital Search portal had 2,750 daily users, with 25% mobile traffic, a bounce rate of 38% and 48% of users examining hospital quality information. From 2013 to 2015, user traffic grew at 38% annually. On average users spent 7 min on the portal, with 7.4 clicks and 54 s between clicks. Users request information for many oncologic and orthopedic conditions, for which no process or outcome quality indicators are available. Ten distinct user types, with particular usage patterns and interests, are identified. In particular, the different types of professional and non-professional users need to be addressed differently to avoid high premature exit rates at several key steps in the information search and view process. Of all users, 37% enter hospital information correctly upon entry, while 47% require support in their hospital search. Several onsite and offsite improvement options are
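
    The Markov-chain step mentioned in the abstract can be illustrated with a short sketch: each session is a sequence of page types, and the transition probabilities (including a transition to an exit state) are estimated from counts of consecutive page pairs. The page names, the `EXIT` label and the sessions below are invented for the example and are not taken from the WL.de logs.

```python
from collections import defaultdict

def transition_probabilities(sessions, exit_state="EXIT"):
    """Estimate first-order Markov transition probabilities from click paths.

    Each session is a list of page labels; an explicit exit state is
    appended so that leaving the site shows up as a transition.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for pages in sessions:
        path = list(pages) + [exit_state]
        for current, nxt in zip(path, path[1:]):
            counts[current][nxt] += 1

    probs = {}
    for current, nexts in counts.items():
        total = sum(nexts.values())
        probs[current] = {nxt: n / total for nxt, n in nexts.items()}
    return probs

# Hypothetical click paths on a hospital search portal.
sessions = [
    ["search", "results", "hospital_profile"],
    ["search", "results"],
    ["search"],
    ["search", "results", "hospital_profile", "quality_details"],
]
P = transition_probabilities(sessions)
```

    In this toy data, `P["search"]["EXIT"]` is the share of visits that end directly on the search page, which is the kind of premature-exit figure the abstract reports.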

  15. Investigation of Web Mining Optimization Using Microbial Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    Dipali Tungar

    2014-02-01

    In today's internet era, people need searching the web and finding relevant information to be efficient and fast. Traditional search engines such as Google, although supposed to be more intelligent, still use traditional crawling algorithms to find data relevant to the search query. Much of the time they also return irrelevant data, which confuses the user. With XML data, the user enters the search query as a keyword or a question, and the answer should be more precise and more relevant, so using traditional crawling algorithms over XML data would lead to irrelevant results. Genetic algorithms are modern algorithms that replicate the Darwinian theory of natural evolution. They are well suited to the traditional search problem, as they tend to return high-quality solutions for data from any domain. It is therefore worth investigating how suitable genetic algorithms are for search over XML data of different domains. This system implements a steady-state, tournament-selection Microbial Genetic Algorithm over XML data of different domains, and investigates how accurately the genetic algorithm returns results over such data.
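
    For readers unfamiliar with the Microbial Genetic Algorithm, a minimal sketch of its steady-state tournament loop follows: two individuals are picked at random, the fitter one is left untouched, and the loser is partly overwritten with the winner's genes and then mutated. The abstract does not describe how XML search results are encoded or scored, so the bit-string genome, the toy fitness function (counting ones) and all parameter values are assumptions.

```python
import random

def microbial_ga(fitness, genome_length, pop_size=20, tournaments=500,
                 crossover_rate=0.5, mutation_rate=0.05, seed=0):
    """Steady-state Microbial GA on bit-string genomes.

    Returns the best genome and the best fitness after each tournament.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(pop_size)]
    history = [max(fitness(g) for g in population)]

    for _ in range(tournaments):
        # Tournament between two distinct, randomly chosen individuals.
        a, b = rng.sample(range(pop_size), 2)
        if fitness(population[a]) >= fitness(population[b]):
            winner, loser = a, b
        else:
            winner, loser = b, a

        # "Infection": the loser copies some genes from the winner,
        # then each of its genes may flip; the winner is never modified.
        for i in range(genome_length):
            if rng.random() < crossover_rate:
                population[loser][i] = population[winner][i]
            if rng.random() < mutation_rate:
                population[loser][i] ^= 1

        history.append(max(fitness(g) for g in population))

    best = max(population, key=fitness)
    return best, history

# Toy fitness (assumption): number of 1-bits in the genome.
best, history = microbial_ga(fitness=sum, genome_length=16)
```

    Because the winner of each tournament is never altered, the best fitness in the population cannot decrease from one tournament to the next, which gives the algorithm a built-in form of elitism.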

  16. An Introduction to Social Semantic Web Mining & Big Data Analytics for Political Attitudes and Mentalities Research

    Directory of Open Access Journals (Sweden)

    Markus Schatten

    2015-01-01

    The social web has become a major repository of social and behavioral data that is of exceptional interest to the social science and humanities research community. Computer science has only recently developed various technologies and techniques that allow for harvesting, organizing and analyzing such data and provide knowledge and insights into the structure and behavior of people online. Some of these techniques include social web mining, conceptual and social network analysis and modeling, tag clouds, topic maps, folksonomies, complex network visualizations, modeling of processes on networks, agent-based models of social network emergence, speech recognition, computer vision, natural language processing, opinion mining and sentiment analysis, recommender systems, user profiling and semantic wikis. All of these techniques are briefly introduced, example studies are given, and ideas as well as possible directions in the field of political attitudes and mentalities are presented. Finally, challenges for future studies are discussed.

  17. Competence and Usage of Web 2.0 Technologies by Higher Education Faculty

    Science.gov (United States)

    Soomro, Kamal Ahmed; Zai, Sajid Yousuf; Jafri, Iftikhar Hussain

    2015-01-01

    Literature on Web 2.0 experiences of higher education faculty in developing countries such as Pakistan is very limited. An insight on awareness and practices of higher education faculty with these tools can be helpful to map strategies and plan of action for adopting latest technologies to support teaching-learning processes in higher education of…

  18. Usage, Barriers, and Training of Web 2.0 Technology Applications

    Science.gov (United States)

    Pritchett, Christopher G.; Pritchett, Christal C.; Wohleb, Elisha C.

    2013-01-01

    This research study was designed to determine the degree of use of Web 2.0 technology applications by certified education professionals and examine differences among various groups as well as reasons for these differences. A quantitative survey instrument was developed to gather demographic information and data. Participants reported they would be…

  19. Usage and applications of Semantic Web techniques and technologies to support chemistry research.

    Science.gov (United States)

    Borkum, Mark I; Frey, Jeremy G

    2014-01-01

    The drug discovery process is now highly dependent on the management, curation and integration of large amounts of potentially useful data. Semantics are necessary in order to interpret the information and derive knowledge. Advances in recent years have mitigated concerns that the lack of robust, usable tools has inhibited the adoption of methodologies based on semantics. This paper presents three examples of how Semantic Web techniques and technologies can be used in order to support chemistry research: a controlled vocabulary for quantities, units and symbols in physical chemistry; a controlled vocabulary for the classification and labelling of chemical substances and mixtures; and, a database of chemical identifiers. This paper also presents a Web-based service that uses the datasets in order to assist with the completion of risk assessment forms, along with a discussion of the legal implications and value-proposition for the use of such a service. We have introduced the Semantic Web concepts, technologies, and methodologies that can be used to support chemistry research, and have demonstrated the application of those techniques in three areas very relevant to modern chemistry research, generating three new datasets that we offer as exemplars of an extensible portfolio of advanced data integration facilities. We have thereby established the importance of Semantic Web techniques and technologies for meeting Wild's fourth "grand challenge".

  20. Usage, attitudes and workload implications for a Web-based learning environment

    Directory of Open Access Journals (Sweden)

    Betty Collis

    2001-12-01

    The tools and features of Web-based course-management systems vary (see http://www.ctt.bc.ca/landonline/ for an analysis and comparison of several commercially available systems) but typically include tools to support the organization of the course, tools to support communication, tools to support student activities such as submission of assignments and collaborative work, and 'back-office' tools to handle user registration, maintenance of user data, and, in some systems, tools to tailor the view of a course site made available to a registered user (Robson, 1999). Although many tools and features are available in such Web-based learning environments, it is not the case that instructors necessarily make use of all these tools and features. Rankin (2000), for example, notes that 'most instructors have failed to take full advantage of the growing resources available to them online'. Rankin suggests that the creation and incorporation of templates into course Web sites could be a strategy to provide instructors with a simple and effective way of developing their Web-accessible materials. Such templates are the basic building blocks of the TeleTOP learning environment used at the University of Twente.

  1. Mining Data from ISI Web of Science® Reports

    Directory of Open Access Journals (Sweden)

    Alfred Kraemer

    2008-09-01

    Journal citation data is valuable as a selection tool for adding new journals as well as for discontinuing subscriptions that are no longer cost-effective. This article presents and discusses an example of data extraction from a typical ISI Web of Science report. The strategy was developed following a review of the data relationships and embedded data output format. While Perl was used in the example, the method described can be implemented with most programming/scripting languages. The example also demonstrates that citation-based studies and reports can be based on large sets of extracted data rather than the typical, small samples. The value of the data is discussed using an actual decision-making scenario.

  2. A Novel Approach for Social Network Analysis & Web Mining for Counter Terrorism

    Directory of Open Access Journals (Sweden)

    Prof. G. A. Patil

    2012-11-01

    Terrorists and extremists are increasingly utilizing Internet technology as an effective means to enhance their ability to influence the outside world. The lack of multilingual and multimedia terrorist/extremist collections and of advanced analytical methodologies limits our experiential understanding of their Internet usage. To address this research gap, we explore an integrated approach for identifying and collecting terrorist/extremist Web content and for discovering hidden relationships among communities. It has been shown in the literature that content analysis gives more insight into technical sophistication and content richness, whereas link analysis focuses on web interactivity. A dark web attribute system has made a sincere effort to identify and compare terrorist websites with genuine websites using content and link analysis; still, there is scope for further work in the same area, as proposed in [1]. The proposed work focuses on identifying and analyzing new web page attributes. It aims to compare different terrorist/extremist sites with genuine sites and accordingly prepare metrics which can be further used to identify other sites of terrorist/extremist groups. The proposed work also focuses on visualizing and analyzing hidden domestic terrorism communities and intercommunity relationships among all websites in our collection.

  3. Effective Filtering of Query Results on Updated User Behavioral Profiles in Web Mining.

    Science.gov (United States)

    Sadesh, S; Suganthe, R C

    2015-01-01

    The Web, with its tremendous volume of information, retrieves results for user queries. With the rapid growth of web page recommendation, results retrieved using data mining techniques have not offered a high filtering rate because the relationships between user profiles and queries were not analyzed extensively. At the same time, existing user-profile-based prediction in web data mining is not exhaustive in producing personalized results. To improve the query result rate given the dynamics of user behavior over time, the Hamilton Filtered Regime Switching User Query Probability (HFRS-UQP) framework is proposed. The HFRS-UQP framework is split into two processes, filtering and switching. The data-mining-based filtering in our research work uses the Hamilton Filtering framework to filter user results based on personalized information from automatically updated profiles obtained through the search engine. The maximized result is fetched, that is, filtered with respect to user behavior profiles. The switching step performs accurate filtering of updated profiles using regime switching. The profile updates (i.e., regime switches) in the HFRS-UQP framework identify the second- and higher-order associations of query results on the updated profiles. Experiments are conducted on factors such as personalized information search retrieval rate, filtering efficiency, and precision ratio.

  4. Beyond accuracy: creating interoperable and scalable text-mining web services.

    Science.gov (United States)

    Wei, Chih-Hsuan; Leaman, Robert; Lu, Zhiyong

    2016-06-15

    The biomedical literature is a knowledge-rich resource and an important foundation for future research. With over 24 million articles in PubMed and an increasing growth rate, research in automated text processing is becoming increasingly important. We report here our recently developed web-based text mining services for biomedical concept recognition and normalization. Unlike most text-mining software tools, our web services integrate several state-of-the-art entity tagging systems (DNorm, GNormPlus, SR4GN, tmChem and tmVar) and offer a batch-processing mode able to process arbitrary text input (e.g. scholarly publications, patents and medical records) in multiple formats (e.g. BioC). We support multiple standards to make our service interoperable and allow simpler integration with other text-processing pipelines. To maximize scalability, we have preprocessed all PubMed articles, and use a computer cluster for processing large requests of arbitrary text. Our text-mining web service is freely available at http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/#curl. Contact: Zhiyong.Lu@nih.gov. Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the US.

  5. An Empirical Study of the Applications of Web Mining Techniques in Health Care

    Directory of Open Access Journals (Sweden)

    Dr. Varun Kumar

    2011-10-01

    A few years ago, the information flow in the health care field was relatively simple and the application of technology was limited. However, as we progress into a more integrated world where technology has become an integral part of business processes, the transfer of information has become more complicated. There is already a long-standing tradition of computer-based decision support dealing with complex problems in medicine, such as diagnosing disease, making managerial decisions and assisting in the prescription of appropriate treatment. Today, one of the biggest challenges that health care systems face is the explosive growth of data and the use of this data to improve the quality of managerial decisions. Web mining and data mining techniques are analytical tools that can be used to extract meaningful knowledge from large data sets. This paper addresses the applications of web mining and data mining in health care management systems to extract useful information from huge data sets and to provide analytical tools to view and use this information in decision-making processes, using real-life examples. Further, we propose the IDSS model for health care so that exact and accurate decisions can be taken for the removal of a particular disease.

  6. The ALMA Data Mining Toolkit I: Archive Setup and User Usage

    Science.gov (United States)

    Teuben, P.; Pound, M.; Mundy, L.; Looney, L.; Friedel, D. N.

    2014-05-01

    We report on an ALMA development study and project where we employ a novel approach to add data and data descriptors to ALMA archive data and allowing further flexible data mining on retrieved data. We call our toolkit ADMIT (the ALMA Data Mining Toolkit) that works within the Python based CASA environment. What is described here is a design study, with some exiting toy code to prove the concept. After ingestion of science ready datacubes, ADMIT will compute a number of basic and advanced data products, and their descriptors. Examples of such data products are cube statistics, line identification tables, line cubes, moment maps, an integrated spectrum, overlap integrals and feature extraction tables. Together with a descriptive XML file, a small number of visual aids are added to a ZIP file that is deposited into the archive. Large datasets (such as line cubes) will have to be rederived by the user once they have also downloaded the actual ALMA Data Products, or via VO services if available. ADMIT enables the user to rederive all its products with different methods and parameters, and compare archive product with their own.

  7. Study on Web Usage Mining

    Institute of Scientific and Technical Information of China (English)

    谭营军; 李翠霞

    2005-01-01

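    The development of the Internet has raised many new research topics for the traditional field of data mining. Web mining technology is the combination of traditional data mining technology and computer network technology. Web usage mining extracts useful information from log files; this information can help site designers design sites and services, and helps commercial websites carry out targeted e-commerce activities. This paper introduces the concept and classification of Web mining, explains the process and significance of Web usage mining, and points out research trends in Web usage mining.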

  8. Web Usage Mining Analysis of Federated Search Tools for Egyptian Scholars

    Science.gov (United States)

    Mohamed, Khaled A.; Hassan, Ahmed

    2008-01-01

    Purpose: This paper aims to examine the behaviour of the Egyptian scholars while accessing electronic resources through two federated search tools. The main purpose of this article is to provide guidance for federated search tool technicians and support teams about user issues, including the need for training. Design/methodology/approach: Log…

  10. A REAL-TIME C-V CLUSTERING ALGORITHM FOR WEB-MINING

    Institute of Scientific and Technical Information of China (English)

    Li Haiying; Zhuang Zhenquan; Li Bin; Wan Ke

    2002-01-01

    In this letter, a real-time C-V (Characteristic-Vector) clustering algorithm is put forth to handle the vast action data that are dynamically collected from a web site. The algorithm uses the concept of the C-V to denote characteristics; at the same time, it adopts two-valued [0,1] input and a self-defined vigilance parameter to design the clustering architecture. The Vector Degree of Matching (VDM) plays a key role in the clustering algorithm, as it determines the magnitude of a typical characteristic. Using stability analysis, the classifications are confirmed to have a reliable hierarchical structure when the vigilance parameter shifts from 0.1 to 0.99. This non-linear relation between the vigilance parameter and the classification upper limit helps mine representative classifications of net users according to the actual web resources; the administering system can then map them to the web resource space to implement intelligent configuration effectively and rapidly.

  11. Studies on ICT Usage in the Academic Campus Using Educational Data Mining

    Directory of Open Access Journals (Sweden)

    Ajay Auddy

    2014-06-01

    In the era of competition, change and complexity, innovation in teaching and learning practices in the higher education sector has become an unavoidable requirement. One of the biggest challenges that the higher education system faces today is to assess the services provided through the Information and Communication Technology (ICT) facilities installed on campus. This paper studies the responses collected through a survey on ICT among the students and research scholars on the campus of the University of Burdwan, with the help of an effective data mining methodology, the Variable Consistency Dominance-based Rough Set Approach (VC-DRSA) model, to extract meaningful knowledge for improving the quality of managerial decisions in this sphere. VC-DRSA is an extended version of the Dominance-based Rough Set Approach (DRSA) and is applied here to generate a set of recommendations that can help the university improve the existing services and extend the boundaries of ICT in future development.

  12. Clustering Web Documents based on Efficient Multi-Tire Hashing Algorithm for Mining Frequent Termsets

    Directory of Open Access Journals (Sweden)

    Noha Negm

    2013-06-01

    Document clustering is one of the main themes in text mining. It refers to the process of grouping documents with similar contents or topics into clusters to improve both the availability and reliability of text mining applications. Some recent algorithms address the problem of the high dimensionality of text by using frequent termsets for clustering. Despite its drawbacks, the Apriori algorithm is still the basic algorithm for mining frequent termsets. This paper presents an approach for Clustering Web Documents based on a Hashing algorithm for mining Frequent Termsets (CWDHFT). It introduces an efficient Multi-Tire Hashing algorithm for mining Frequent Termsets (MTHFT) in place of the Apriori algorithm. The algorithm uses a new methodology for generating frequent termsets by building the multi-tire hash table while scanning the documents only once. To avoid hash collisions, the Multi-Tire technique is used in the proposed hashing algorithm. Based on the generated frequent termsets, the documents are partitioned, and clustering is performed by grouping the partitions through their descriptive keywords. Using the MTHFT algorithm reduces the scanning and computational costs; moreover, performance increases considerably and the clustering process is sped up. The CWDHFT approach improved accuracy, scalability and efficiency when compared with existing clustering algorithms such as Bisecting K-means and FIHC.

  13. The Effects of Web 2.0 Technologies Usage in Programming Languages Lesson on the Academic Success, Interrogative Learning Skills and Attitudes of Students towards Programming Languages

    Science.gov (United States)

    Gençtürk, Abdullah Tarik; Korucu, Agah Tugrul

    2017-01-01

    It is observed that teacher candidates receiving education in the department of Computer and Instructional Technologies Education are not able to gain enough experience and knowledge in "Programming Languages" lesson. The goal of this study is to analyse the effects of web 2.0 technologies usage in programming languages lesson on the…

  14. A Study on Information Search and Commitment Strategies on Web Environment and Internet Usage Self-Efficacy Beliefs of University Students'

    Science.gov (United States)

    Geçer, Aynur Kolburan

    2014-01-01

    This study addresses university students' information search and commitment strategies on web environment and internet usage self-efficacy beliefs in terms of such variables as gender, department, grade level and frequency of internet use; and whether there is a significant relation between these beliefs. Descriptive method was used in the study.…

  15. From Cookies to Cooks: Insights on Dietary Patterns via Analysis of Web Usage Logs

    CERN Document Server

    West, Robert; Horvitz, Eric

    2013-01-01

    Nutrition is a key factor in people's overall health. Hence, understanding the nature and dynamics of population-wide dietary preferences over time and space can be valuable in public health. To date, studies have leveraged small samples of participants via food intake logs or treatment data. We propose a complementary source of population data on nutrition obtained via Web logs. Our main contribution is a spatiotemporal analysis of population-wide dietary preferences through the lens of logs gathered by a widely distributed Web-browser add-on, using the access volume of recipes that users seek via search as a proxy for actual food consumption. We discover that variation in dietary preferences as expressed via recipe access has two main periodic components, one yearly and the other weekly, and that there exist characteristic regional differences in terms of diet within the United States. In a second study, we identify users who show evidence of having made an acute decision to lose weight. We characterize the...

  16. Analysis of Usage Patterns in Large Multimedia Websites

    Science.gov (United States)

    Singh, Rahul; Bhattarai, Bibek

    User behavior in a website is a critical indicator of the web site's usability and success. Therefore an understanding of usage patterns is essential to website design optimization. In this context, large multimedia websites pose a significant challenge for comprehension of the complex and diverse user behaviors they sustain. This is due to the complexity of analyzing and understanding user-data interactions in media-rich contexts. In this chapter we present a novel multi-perspective approach for usability analysis of large media-rich websites. Our research combines multimedia web content analysis with elements of web-log analysis and visualization/visual mining of web usage metadata. Multimedia content analysis allows direct estimation of the information cues presented to a user by the web content. Analysis of web logs and usage metadata, such as location, type, and frequency of interactions, provides a complementary perspective on the site's usage. The entire set of information is leveraged through powerful visualization and interactive querying techniques to provide analysis of usage patterns, a measure of design quality, and the ability to rapidly identify problems in the web-site design. Experiments on media-rich sites, including the SkyServer, a large multimedia web-based astronomy information repository, demonstrate the efficacy and promise of the proposed approach.

  17. SOME APPROACHES TO TEXT MINING AND THEIR POTENTIAL FOR SEMANTIC WEB APPLICATIONS

    Directory of Open Access Journals (Sweden)

    Jan Paralič

    2007-06-01

    Full Text Available In this paper we describe some approaches to text mining, which are supported by an original software system developed in Java for support of information retrieval and text mining (JBowl), as well as its possible use in a distributed environment. The JBowl system is being developed as open source software with the intention of providing an easily extensible, modular framework for pre-processing, indexing and further exploration of large text collections. The overall architecture of the system is described, followed by some typical use case scenarios, which have been used in previous projects. Then, basic principles and technologies used for service-oriented computing, web services and semantic web services are presented. We further discuss how the JBowl system can be adapted to a distributed environment using technologies that are already available, and what benefits such an adaptation can bring. This is particularly important in the context of a new integrated EU-funded project, KP-Lab (Knowledge Practices Laboratory), which is briefly presented together with the role of the proposed text mining services that are currently being designed and developed there.

  18. XGraphticsCLUS: Web Mining Hyperlinks and Content of Terrorism websites for Homeland Security

    Directory of Open Access Journals (Sweden)

    Dr.S.K.Jayanthi

    2011-05-01

    Full Text Available The World Wide Web has become one of the best and fastest communication media, and information can be distributed to the world within a few seconds. The evolution of social networking media further increases the speed at which information reaches ordinary people. Terrorist organizations use these facets of the web very efficiently for their destructive plans. Understanding web data is a decisive task for gaining a better understanding of a website. This paper focuses on mining the content and link structure of a suspicious website with XGraphticsCLUS. This is done by viewing the web as a graph and retrieving the various contents of the website. This can help in better understanding the motive and the various other web connections of the site under suspicion. The navigational links offered by a particular website can provide informative evidence. This paper takes a step towards national security and gives the user a better understanding of such sites.

  19. Mining in Ontology with Multi Agent System in Semantic Web : A Novel Approach

    Directory of Open Access Journals (Sweden)

    Vishal Jain

    2014-10-01

    Full Text Available A large amount of data is present on the web. It contains a huge number of web pages, and finding suitable information in them is a very cumbersome task. Data need to be organized in a formal manner so that users can easily access and use them. There are many Information Retrieval (IR) techniques for retrieving information from documents. Current IR techniques are not advanced enough to exploit the semantic knowledge within documents and give precise results. IR technology is a major factor responsible for handling annotations in Semantic Web (SW) languages. With the growth of the web and the huge amount of information available on it, in unstructured, semi-structured or structured form, it has become increasingly difficult to identify the relevant pieces of information on the internet. Knowledge representation languages are used for retrieving information. So, there is a need to build an ontology using a well-defined methodology; the process of developing an ontology is called Ontology Development. Secondly, cloud computing and data mining have become prominent phenomena in the current application of information technology. With changing trends and the emergence of new concepts in the information technology sector, data mining and knowledge discovery have proved to be of significant importance. Data mining can be defined as the process of extracting data or information from a database which is not explicitly defined by the database and can be used to come up with generalized conclusions based on the trends obtained from the data. A database may be described as a collection of formally structured data. Multi-agent data mining may be defined as the use of various agents that cooperatively interact with the environment to achieve a specified objective. Multi agents will always act on behalf of users and will coordinate, cooperate

  20. Web Usage Mining of Enterprises Web Site

    Institute of Scientific and Technical Information of China (English)

    张春明

    2008-01-01

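    Starting from the significance of Web usage mining for enterprise web sites, this paper analyzes the data sources of Web usage mining and discusses its commonly used techniques, its process, and the functions that Web usage mining of an enterprise web site should provide.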

  1. Deploying mutation impact text-mining software with the SADI Semantic Web Services framework.

    Science.gov (United States)

    Riazanov, Alexandre; Laurila, Jonas Bergman; Baker, Christopher J O

    2011-01-01

    Mutation impact extraction is an important task designed to harvest relevant annotations from scientific documents for reuse in multiple contexts. Our previous work on text mining for mutation impacts resulted in (i) the development of a GATE-based pipeline that mines texts for information about impacts of mutations on proteins, (ii) the population of this information into our OWL DL mutation impact ontology, and (iii) establishing an experimental semantic database for storing the results of text mining. This article explores the possibility of using the SADI framework as a medium for publishing our mutation impact software and data. SADI is a set of conventions for creating web services with semantic descriptions that facilitate automatic discovery and orchestration. We describe a case study exploring and demonstrating the utility of the SADI approach in our context. We describe several SADI services we created based on our text mining API and data, and demonstrate how they can be used in a number of biologically meaningful scenarios through a SPARQL interface (SHARE) to SADI services. In all cases we pay special attention to the integration of mutation impact services with external SADI services providing information about related biological entities, such as proteins, pathways, and drugs. We have identified that SADI provides an effective way of exposing our mutation impact data such that it can be leveraged by a variety of stakeholders in multiple use cases. The solutions we provide for our use cases can serve as examples to potential SADI adopters trying to solve similar integration problems.

  2. Cluo: Web-Scale Text Mining System For Open Source Intelligence Purposes

    Directory of Open Access Journals (Sweden)

    Przemyslaw Maciolek

    2013-01-01

    Full Text Available The amount of textual information published on the Internet is considered to be in billions of web pages, blog posts, comments, social media updates and others. Analyzing such quantities of data requires a high level of distribution of both data and computing. This is especially true in the case of complex algorithms, often used in text mining tasks. The paper presents a prototype implementation of CLUO, an Open Source Intelligence (OSINT) system, which extracts and analyzes significant quantities of openly available information.

  3. Web services-based text-mining demonstrates broad impacts for interoperability and process simplification.

    Science.gov (United States)

    Wiegers, Thomas C; Davis, Allan Peter; Mattingly, Carolyn J

    2014-01-01

    The Critical Assessment of Information Extraction systems in Biology (BioCreAtIvE) challenge evaluation tasks collectively represent a community-wide effort to evaluate a variety of text-mining and information extraction systems applied to the biological domain. The BioCreative IV Workshop included five independent subject areas, including Track 3, which focused on named-entity recognition (NER) for the Comparative Toxicogenomics Database (CTD; http://ctdbase.org). Previously, CTD had organized document ranking and NER-related tasks for the BioCreative Workshop 2012; a key finding of that effort was that interoperability and integration complexity were major impediments to the direct application of the systems to CTD's text-mining pipeline. This underscored a prevailing problem with software integration efforts. Major interoperability-related issues included lack of process modularity, operating system incompatibility, tool configuration complexity and lack of standardization of high-level inter-process communications. One approach to potentially mitigate interoperability and general integration issues is the use of Web services to abstract implementation details; rather than integrating NER tools directly, HTTP-based calls from CTD's asynchronous, batch-oriented text-mining pipeline could be made to remote NER Web services for recognition of specific biological terms using BioC (an emerging family of XML formats) for inter-process communications. To test this concept, participating groups developed Representational State Transfer /BioC-compliant Web services tailored to CTD's NER requirements. Participants were provided with a comprehensive set of training materials. CTD evaluated results obtained from the remote Web service-based URLs against a test data set of 510 manually curated scientific articles. Twelve groups participated in the challenge. Recall, precision, balanced F-scores and response times were calculated. 
Top balanced F-scores for gene, chemical and
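
    The evaluation measures named in this abstract (recall, precision and balanced F-score) follow standard definitions and can be sketched as below. The entity names and the set-based exact matching are assumptions made for illustration; the challenge's actual scoring rules are not given in this record.

```python
def precision_recall_f1(predicted, gold):
    """Score a set of predicted entities against a manually curated gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    # Balanced F-score: harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Invented example: entities returned by a hypothetical NER web service
# versus the curated answer for one article.
returned = {"BRCA1", "TP53", "EGFR", "aspirin"}
curated = {"BRCA1", "TP53", "KRAS"}
precision, recall, f1 = precision_recall_f1(returned, curated)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```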

  4. Socio-contextual Network Mining for User Assistance in Web-based Knowledge Gathering Tasks

    Science.gov (United States)

    Rajendran, Balaji; Kombiah, Iyakutti

    Web-based Knowledge Gathering (WKG) is a specialized and complex information seeking task carried out by many users on the web for their various learning and decision-making requirements. We construct a contextual semantic structure by observing the actions of the users involved in a WKG task, in order to gain an understanding of their task and requirements. We also build a knowledge warehouse in the form of a master Semantic Link Network (SLX) that accommodates and assimilates all the contextual semantic structures. This master SLX, which is a socio-contextual network, is then mined to provide contextual inputs to the current users through their agents. We validated our approach through experiments and analyzed the benefits to the users in terms of resource explorations and the time saved. The results are positive enough to motivate us to implement the approach on a larger scale.

  5. Web Mining: An Indispensable Resource for the Information Professional (Minería Web: un recurso insoslayable para el profesional de la información)

    OpenAIRE

    Fuentes Reyes, Sady C.; Ruíz Lobaina, Marina

    2007-01-01

    The main concepts related to Web mining are studied, and emphasis is made on the Web usage mining. The results obtained with the application of the Sawmill V.7.0 tool, which is used for processing Log files, are made known.

  6. KnoE: A Web Mining Tool to Validate Previously Discovered Semantic Correspondences

    Institute of Scientific and Technical Information of China (English)

    Jorge Martinez-Gil; José F.Aldana-Montes

    2012-01-01

    The problem of matching schemas or ontologies consists of providing corresponding entities in two or more knowledge models that belong to the same domain but have been developed separately. Nowadays there are many techniques and tools for addressing this problem; however, the complex nature of the matching problem means that existing solutions are not fully satisfactory for real situations. The Google Similarity Distance has appeared recently. Its purpose is to mine knowledge from the Web using the Google search engine in order to semantically compare text expressions. Our work consists of developing a software application for validating results discovered by schema and ontology matching tools using the philosophy behind this distance. Moreover, we are interested in using not only Google, but other popular search engines with this similarity distance. The results reveal three main facts. Firstly, some web search engines can help us to validate semantic correspondences satisfactorily. Secondly, there are significant differences among the web search engines. Thirdly, the best results are obtained when using combinations of the web search engines that we have studied.
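
    The Google Similarity Distance mentioned in this abstract is commonly written as the Normalized Google Distance, computed from search hit counts. The sketch below uses that standard formula; the hit counts and index size are invented numbers, and how KnoE queries the search engines or thresholds the result is not described in this record.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """Normalized Google Distance from hit counts.

    fx, fy: hits for each term alone; fxy: hits for both terms together;
    n: assumed total number of pages indexed by the search engine.
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return float("inf")  # the terms never co-occur
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


# Invented hit counts for two candidate correspondences.
same_pages = normalized_google_distance(1000, 1000, 1000, 10**6)
rarely_together = normalized_google_distance(1000, 1000, 10, 10**6)
print(same_pages, rarely_together)
```

    A value near 0 means the two expressions almost always appear on the same pages; larger values indicate weaker association, so a validation tool could accept a proposed correspondence only when the distance falls below a chosen threshold.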

  7. Web API Recommendation Based on User Usage History and Reputation Evaluation

    Institute of Scientific and Technical Information of China (English)

    曹步清; 刘建勋; 唐明董; 谢芬方

    2015-01-01

    With the release of more and more Web API services on the Internet, it has become a challenging research problem to recommend Web APIs that developers are interested in and that have a high reputation, in order to construct high-quality and trustworthy software service systems. This paper presents a Web API service recommendation approach based on user usage history and reputation evaluation (WASR). It computes the similarity between user history records and Web API services to obtain the user interest degree. The service reputation degree is computed by considering the user score of the Web API, the score contributions of the Mashup services calling the Web API, and the traffic flow of the Web API based on statistical data from Alexa. It ranks and recommends Web API services according to the user interest degree and service reputation degree of the Web APIs. Experimental results show that this approach can recommend Web API services with a higher DCG of user interest degree than the SR-based approach, and a higher DCG of service reputation degree than the UI-based approach.
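
    The abstract reports its results as DCG (discounted cumulative gain) values. A minimal sketch of one common DCG definition follows; the exact variant and the relevance scores used in the paper are not stated in this record, so the numbers below are invented.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: relevance at rank i is divided by log2(i + 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


# Invented interest scores of the recommended Web APIs, in ranked order.
good_ranking = [3, 2, 1]   # most interesting API listed first
poor_ranking = [1, 2, 3]   # same APIs, least interesting listed first
print(dcg(good_ranking), dcg(poor_ranking))
```

    Because lower ranks are discounted, a list that places the most relevant APIs first scores higher than the same items in a worse order, which is what makes DCG usable for comparing recommenders.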

  8. A Web-Based GIS for Reporting Water Usage in the High Plains Underground Water Conservation District

    Science.gov (United States)

    Jia, M.; Deeds, N.; Winckler, M.

    2012-12-01

    The High Plains Underground Water Conservation District (HPWD) is the largest and oldest of the Texas water conservation districts, and oversees approximately 1.7 million irrigated acres. Recent rule changes have motivated HPWD to develop a more automated system to allow owners and operators to report well locations, meter locations, meter readings, the association between meters and wells, and contiguous acres. INTERA, Inc. has developed a web-based interactive system for HPWD water users to report water usage and for the district to better manage its water resources. The HPWD web management system utilizes state-of-the-art GIS techniques, including cloud-based Amazon EC2 virtual machine, ArcGIS Server, ArcSDE and ArcGIS Viewer for Flex, to support web-based water use management. The system enables users to navigate to their area of interest using a well-established base-map and perform a variety of operations and inquiries against their spatial features. The application currently has six components: user privilege management, property management, water meter registration, area registration, meter-well association and water use report. The system is composed of two main databases: spatial database and non-spatial database. With the help of Adobe Flex application at the front end and ArcGIS Server as the middle-ware, the spatial feature geometry and attributes update will be reflected immediately in the back end. As a result, property owners, along with the HPWD staff, collaborate together to weave the fabric of the spatial database. Interactions between the spatial and non-spatial databases are established by Windows Communication Foundation (WCF) services to record water-use report, user-property associations, owner-area associations, as well as meter-well associations. Mobile capabilities will be enabled in the near future for field workers to collect data and synchronize them to the spatial database. The entire solution is built on a highly scalable cloud

  9. The Application of Web Mining Based on Web Crawler

    Institute of Scientific and Technical Information of China (English)

    胡晟

    2012-01-01

    This article first analyzes the practical necessity of Web mining and introduces the key technologies and operating principles of the data mining architecture. After reviewing the functions implemented by ordinary web crawlers, it presents a web crawler design and elaborates on the crawler's principle, implementation, performance and advantages. Finally, experiments show that the designed crawler can efficiently acquire a wide range of information resources on the Internet.

  10. SalanderMaps: A rapid overview about felt earthquakes through data mining of web-accesses

    Science.gov (United States)

    Kradolfer, Urs

    2013-04-01

    While seismological observatories detect and locate earthquakes based on measurements of the ground motion, they neither know a priori whether an earthquake has been felt by the public nor where it has been felt. Such information is usually gathered by evaluating feedback reported by the public through on-line forms on the web. However, after a felt earthquake in Switzerland, many people visit the webpages of the Swiss Seismological Service (SED) at the ETH Zurich, and each such visit leaves traces in the logfiles on our web-servers. Data mining techniques, applied to these logfiles and to publicly available databases on the internet, open possibilities to obtain previously unknown information about our virtual visitors. In order to provide precise information to authorities and the media, it would be desirable to rapidly know from which locations these web-accesses originate. The method 'Salander' (Seismic Activity Linked to Area codes - Nimble Detection of Earthquake Rumbles) will be introduced, and it will be explained how the IP-addresses (each computer or router directly connected to the internet has a unique IP-address; an example would be 129.132.53.5) of a sufficient number of our virtual visitors were linked to their geographical area. This allows us to know with unprecedented speed whether and where an earthquake was felt in Switzerland. It will also be explained why the method Salander is superior to commercial so-called geolocation products. The corresponding products of the Salander method, animated SalanderMaps, which are routinely generated after each earthquake with a magnitude of M>2 in Switzerland (http://www.seismo.ethz.ch/prod/salandermaps/, available after March 2013), demonstrate how the wavefield of earthquakes propagates through Switzerland and where it was felt. Often, such information is available less than 60 seconds after origin time, and we always get a clear picture within five minutes after origin time
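
    The core step described here, linking visitor IP addresses to geographical areas and counting accesses per area, can be sketched as follows. This is not the Salander method: the network-to-area table is entirely made up for illustration (the real linkage is the subject of the abstract and is not disclosed in this record), and the sketch ignores timestamps.

```python
import ipaddress
from collections import Counter

# Hypothetical lookup table; the networks and area labels are invented.
AREA_BY_NETWORK = {
    ipaddress.ip_network("129.132.0.0/16"): "Area A",
    ipaddress.ip_network("192.0.2.0/24"): "Area B",
}


def area_of(ip):
    """Return the area label for an IP address, or None if it is unknown."""
    address = ipaddress.ip_address(ip)
    for network, area in AREA_BY_NETWORK.items():
        if address in network:
            return area
    return None


def hits_per_area(ips):
    """Count web accesses per area, skipping addresses that cannot be placed."""
    return Counter(area for area in map(area_of, ips) if area is not None)


# Invented visitor addresses taken from a web-server log after an event.
visitors = ["129.132.53.5", "129.132.1.9", "192.0.2.44", "198.51.100.7"]
counts = hits_per_area(visitors)
print(counts)
```

    Comparing such counts in the minutes after an earthquake with the usual background level for each area would indicate where the event was noticed; that comparison is left out of this sketch.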

  11. Geovisualization of Local and Regional Migration Using Web-mined Demographics

    Science.gov (United States)

    Schuermann, R. T.; Chow, T. E.

    2014-11-01

    The intent of this research was to augment and facilitate analyses that gauge the feasibility of web-mined demographics for studying the spatio-temporal dynamics of migration. As a case study, we explored the spatio-temporal dynamics of Vietnamese Americans (VA) in Texas through geovisualization of mined demographic microdata from the World Wide Web. Based on string matching across all demographic attributes, including full name, address, date of birth, age and phone number, multiple records of the same entity (i.e. person) over time were resolved and reconciled into a database. Migration trajectories were geovisualized through animated sprites by connecting the different addresses associated with the same person and segmenting the trajectory into small fragments. Intra-metropolitan migration patterns appeared at the local scale within many metropolitan areas. At the scale of the metropolitan area, varying degrees of immigration and emigration manifest different types of migration clusters. This paper presents a methodology incorporating GIS methods and cartographic design to produce geovisualization animation, enabling the cognitive identification of migration patterns at multiple scales. Identification of spatio-temporal patterns often stimulates further research to better understand the phenomenon and enhance subsequent modeling.

  12. Mining web-based data to assess public response to environmental events.

    Science.gov (United States)

    Cha, YoonKyung; Stow, Craig A

    2015-03-01

    We explore how the analysis of web-based data, such as Twitter and Google Trends, can be used to assess the social relevance of an environmental accident. The concept and methods are applied to the shutdown of the drinking water supply in the city of Toledo, Ohio, USA. Toledo's notice, which persisted from August 1 to 4, 2014, is a high-profile event that directly influenced approximately half a million people and received wide recognition. The notice was given when excessive levels of microcystin, a byproduct of cyanobacteria blooms, were discovered at the drinking water treatment plant on Lake Erie. Twitter mining results illustrated an instant response to the Toledo incident, the associated collective knowledge, and public perception. The results from Google Trends, on the other hand, revealed how the Toledo event raised public attention on the associated environmental issue, harmful algal blooms, in a long-term context. Thus, when jointly applied, Twitter and Google Trends analysis results offer complementary perspectives. Web content aggregated through mining approaches provides a social standpoint, such as public perception and interest, and offers context for establishing and evaluating environmental management policies. Copyright © 2014 Elsevier Ltd. All rights reserved.

  13. Rare disease diagnosis: A review of web search, social media and large-scale data-mining approaches

    DEFF Research Database (Denmark)

    Svenstrup, Dan Tito; Jørgensen, Henrik L; Winther, Ole

    2015-01-01

    on the use of web search, social media and data mining in data repositories for medical diagnosis. We compare the retrieval accuracy on 56 rare disease cases with known diagnosis for the web search tools google.com, pubmed.gov, omim.org and our own search tool findzebra.com. We give a detailed description...... in technology and access to high quality data have opened new possibilities for aiding the diagnostic process. Specialized search engines, data mining tools and social media are some of the areas that hold promise....

  14. WEB-Based Mining in the Personalized Recommendation System

    Institute of Scientific and Technical Information of China (English)

    汪彦红; 杨波; 胡玉鹏

    2011-01-01

    The rapid development and wide application of the Internet have led to an information explosion on the Web. How to design effective recommendation algorithms and systems based on Web usage mining technology has become a current research hotspot. In this paper, we develop a recommendation system called WRS (Web Recommendation System), which is based on Web usage. In WRS, we propose a novel algorithm that uses a graph partitioning technique to cluster user access patterns and adopts the parallel longest common subsequence algorithm to recognize users' current behavior. Theoretical analysis and experimental results show that the system is more effective and that recommendation quality improves considerably with the new method.
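
    The longest common subsequence (LCS) step named in this abstract can be sketched with the classic dynamic programme. This is a plain sequential version, not the parallel variant the authors mention, and the page names and the idea of scoring a session against a stored access pattern are assumptions made for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two page sequences."""
    previous = [0] * (len(b) + 1)
    for x in a:
        current = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


# Invented example: the user's current session and one stored access pattern.
session = ["home", "search", "item", "cart"]
pattern = ["home", "item", "cart", "checkout"]
shared = lcs_length(session, pattern)
print(shared)
```

    Under these assumptions, a recommender could compare the current session with each stored pattern, pick the pattern with the longest shared subsequence, and suggest its remaining pages (here, `checkout`).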

  15. Mining Tasks from the Web Anchor Text Graph: MSR Notebook Paper for the TREC 2015 Tasks Track

    Science.gov (United States)

    2015-11-20

    investigated the effectiveness of mining session co-occurrence data. For a search engine log, session boundaries can be defined in the typical way but to...matching seed candidates (link text from the web graph or queries over search logs) and expand to related candidate key phrases via this session as...Given a query, we find matching seed candidates (link text from the web graph or queries over search logs) using a soft matching. These seed

  16. Using a Web Crawler to Collect Tweets with a Text Mining Pre-Processing Method (Penggunaan Web Crawler Untuk Menghimpun Tweets dengan Metode Pre-Processing Text Mining)

    Directory of Open Access Journals (Sweden)

    Bayu Rima Aditya

    2015-11-01

    Full Text Available The volume of data on social media is now very large, yet little of it has been used or processed into something of value; tweets on the social media platform Twitter are one example. This paper describes the results of using a web crawler engine together with a text mining pre-processing method. The web crawler engine collects tweets through the Twitter API as unstructured text data, which are then presented again in web form. The pre-processing method filters the tweets in three stages: cleansing, case folding, and parsing. The application designed in this study follows the waterfall software development model and is implemented in the PHP programming language. It is tested with black box testing to check whether the design works as expected. The result of this study is an application that converts the collected tweets into data ready for further processing according to user needs, based on search keywords and dates. This was done because related studies show that social media data, Twitter in particular, has become a target for companies and agencies seeking to understand public opinion.
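
    The three pre-processing stages named in this abstract (cleansing, case folding, parsing) can be sketched as below. The paper's application is written in PHP; this Python version is only an illustration, and what exactly each stage removes (URLs, @mentions, punctuation, non-ASCII characters) is an assumption, as is the sample tweet.

```python
import re


def preprocess(tweet):
    """Apply cleansing, case folding and parsing to one tweet."""
    # Cleansing: remove URLs and @mentions, then replace every remaining
    # character that is not an ASCII letter, digit or whitespace.
    text = re.sub(r"https?://\S+", " ", tweet)
    text = re.sub(r"@\w+", " ", text)
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)
    # Case folding: convert everything to lower case.
    text = text.lower()
    # Parsing: split the cleaned text into word tokens.
    return text.split()


# Invented sample tweet.
tokens = preprocess("Macet parah di @Bandung hari ini! #lalulintas http://t.co/abc123")
print(tokens)
```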

  17. Mining the human phenome using semantic web technologies: a case study for Type 2 Diabetes.

    Science.gov (United States)

    Pathak, Jyotishman; Kiefer, Richard C; Bielinski, Suzette J; Chute, Christopher G

    2012-01-01

    The ability to conduct genome-wide association studies (GWAS) has enabled new exploration of how genetic variations contribute to health and disease etiology. However, historically GWAS have been limited by inadequate sample size due to associated costs for genotyping and phenotyping of study subjects. This has prompted several academic medical centers to form "biobanks" where biospecimens linked to personal health information, typically in electronic health records (EHRs), are collected and stored for large numbers of subjects. This provides tremendous opportunities to discover novel genotype-phenotype associations and foster hypothesis generation. In this work, we study how emerging Semantic Web technologies can be applied in conjunction with clinical and genotype data stored at the Mayo Clinic Biobank to mine the phenotype data for genetic associations. In particular, we demonstrate the role of the Resource Description Framework (RDF) in representing EHR diagnoses and procedure data, and enable federated querying via standardized Web protocols to identify subjects genotyped with Type 2 Diabetes for discovering gene-disease associations. Our study highlights the potential of Web-scale data federation techniques to execute complex queries.

  18. E-Journal Metrics for Collection Management: Exploring Disciplinary Usage Differences in Scopus and Web of Science

    Directory of Open Access Journals (Sweden)

    Katherine Chew

    2016-04-01

    Full Text Available Objective – The purpose was to determine whether a relationship exists between journal downloads and either faculty authoring venue or citations to these faculty, or whether a relationship exists between journal rankings and local authoring venues or citations. A related purpose was to determine if any such relationship varied between or within disciplines. A final purpose was to determine if specific tools for ranking journals or indexing authorship and citation were demonstrably better than alternatives. Methods – Multiple years of journal usage, ranking, and citation data for twelve disciplines were combined in Excel, and the strength of the relationships was determined using rank correlation coefficients. Results – The results illustrated marked disciplinary variation as to the degree that faculty decisions to download a journal article can be used as a proxy to predict which journals they will publish in or which journals will cite faculty’s work. While journal access requests show moderate to strong relationships with the journals in which faculty publish, as well as journals whose articles cite local faculty, the data suggest that Scopus may be the better resource to find such information for these journals in the health sciences and Web of Science may be the better resource for all other disciplines analyzed. The same can be said for the ability of external ranking mechanisms to predict faculty publishing behaviours. Eigenfactor is more predictive for both authoring and citing-by-others across most of the representative disciplines in the social sciences as well as the physical and natural sciences. With the health sciences, no clear pattern emerges. Conclusion – Collecting and correlating authorship and citation data allows patterns of use to emerge, resulting in a more accurate picture of use activity than the commonly used cost-per-use method. To find the best information on authoring activity by local faculty for subscribed
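
    The Methods statement mentions rank correlation coefficients without naming one. As an illustration only, the sketch below computes Spearman's rho, one common rank correlation, in plain Python; the study itself worked in Excel, and the helper names and any journal figures passed to them are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the average ranks."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(x), average_ranks(y))
```

    With per-journal download counts as x and per-journal faculty publication counts as y, spearman(x, y) close to 1 would indicate that heavily downloaded journals are also the ones faculty publish in. Working on ranks rather than raw counts keeps a few very large journals from dominating the result.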

  19. Preliminary Exploration of the Evaluation of Web Mining Systems (网络信息挖掘系统评价初探)

    Institute of Scientific and Technical Information of China (English)

    贾丰; 张燕

    2003-01-01

    The paper proposes an evaluation scheme for Web mining system capabilities based on an investigation of several existing survey projects on data mining systems. The scheme covers four aspects, i.e., commercial capability, algorithm capability, application capability and Web mining process capability. The preliminary evaluation scheme is then used to investigate 19 systems, which are either commercial products or research prototypes. Finally, the results of the survey are described.

  20. A Generic Framework for Extraction of Knowledge from Social Web Sources (Social Networking Websites) for an Online Recommendation System

    Science.gov (United States)

    Sathick, Javubar; Venkat, Jaya

    2015-01-01

    Mining social web data is a challenging task and finding user interest for personalized and non-personalized recommendation systems is another important task. Knowledge sharing among web users has become crucial in determining usage of web data and personalizing content in various social websites as per the user's wish. This paper aims to design a…

  1. The Voice of Chinese Health Consumers: A Text Mining Approach to Web-Based Physician Reviews.

    Science.gov (United States)

    Hao, Haijing; Zhang, Kunpeng

    2016-05-10

    Many Web-based health care platforms allow patients to evaluate physicians by posting open-ended textual reviews based on their experiences. These reviews are helpful resources for other patients to choose high-quality doctors, especially in countries like China where no doctor referral systems exist. Analyzing such a large amount of user-generated content to understand the voice of health consumers has attracted much attention from health care providers and health care researchers. The aim of this paper is to automatically extract hidden topics from Web-based physician reviews using text-mining techniques to examine what Chinese patients have said about their doctors and whether these topics differ across various specialties. This knowledge will help health care consumers, providers, and researchers better understand this information. We conducted two-fold analyses on the data collected from the "Good Doctor Online" platform, the largest online health community in China. First, we explored all reviews from 2006-2014 using descriptive statistics. Second, we applied the well-known topic extraction algorithm Latent Dirichlet Allocation to more than 500,000 textual reviews from over 75,000 Chinese doctors across four major specialty areas to understand what Chinese health consumers said online about their doctor visits. On the "Good Doctor Online" platform, 112,873 out of 314,624 doctors had been reviewed at least once by April 11, 2014. Among the 772,979 textual reviews, we chose to focus on four major specialty areas that received the most reviews: Internal Medicine, Surgery, Obstetrics/Gynecology and Pediatrics, and Chinese Traditional Medicine. Among the doctors who received reviews from those four medical specialties, two-thirds of them received more than two reviews and in a few extreme cases, some doctors received more than 500 reviews. Across the four major areas, the most popular topics reviewers found were the experience of finding doctors, doctors' technical

  2. Verification of the fulfilment of the purposes of Basel II, Pillar 3 through application of the web log mining methods

    Directory of Open Access Journals (Sweden)

    M. Munk

    2012-01-01

    Full Text Available The objective of the paper is to verify whether the purposes of Basel II, Pillar 3 (market discipline) were fulfilled during the recent financial crisis. The paper describes the current state of a project that analyses market participants’ interest in the mandatory disclosure of financial information by a commercial bank, using advanced methods of web log mining. The outputs of the project will be the verification of the assumptions related to the purposes of Basel III by means of web mining methods, recommendations for a possible reduction of mandatory disclosure of information under Basel II and III, a proposed methodology for data preparation for web log mining in this application domain, and a generalised procedure for time-dependent modelling of users’ behaviour. The project schedule is divided into three phases. The paper deals with the first phase, which focuses on data pre-processing, analysis and evaluation of the information required under Basel II, Pillar 3 since 2008 and its disclosure on the web site of a commercial bank. The authors introduce methodologies for data preparation and known heuristic methods for path completion in web log files with respect to the particularities of the investigated application domain. They propose scientific methods for modelling, with respect to time, users’ behaviour on the web pages related to Pillar 3.
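
    Path completion, mentioned in this abstract, restores page views that never reached the server log because the browser served them from its cache when the user pressed "back". The sketch below shows one widely described referrer-based heuristic, not necessarily the variant the authors used; the function name and the (page, referrer) input format are assumptions for illustration.

```python
from typing import List, Optional, Tuple


def complete_path(requests: List[Tuple[str, Optional[str]]]) -> List[str]:
    """Rebuild a session path from logged (page, referrer) pairs.

    If a request's referrer is an earlier page of the session rather than
    the page visited last, the user is assumed to have stepped back through
    cached pages; those unlogged back steps are re-inserted into the path.
    """
    path: List[str] = []   # completed sequence of page views
    stack: List[str] = []  # current navigation history
    for page, referrer in requests:
        if referrer is not None and referrer in stack:
            # Walk back until the referrer is on top of the history.
            while stack[-1] != referrer:
                stack.pop()
                path.append(stack[-1])  # page revisited via the back button
        path.append(page)
        stack.append(page)
    return path
```

    For a logged session A, B (from A), C (from B), D (from A), the function returns A, B, C, B, A, D: the two back steps to B and A are inferred because D could only have been reached from A. Requests whose referrer lies outside the session are appended unchanged.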

  3. Building and mining web-based questionnaires and surveys with SySQ.

    Science.gov (United States)

    Sarica, Alessia; Guzzi, Pietro Hiram; Cannataro, Mario

    2013-09-01

    A questionnaire is a method for collecting data that can come from many sources such as observations, telephone interviews or documentary sources. Whatever the source of data is, the questionnaire provides a framework of questions that facilitates the researcher's work. A manual approach to collecting data using questionnaires has some limitations and introduces several sources of error. A second issue concerns the statistical analysis and data mining of the data, which is often conducted using tools other than the questionnaire system and may therefore introduce errors in the analysis pipeline. For instance, common methods applied to data sets include normality tests, association and correlation discovery, linear regression, classification and clustering. Usually this analysis is performed using external tools, often not free, such as SPSS, SAS, STATA, Weka, or Clementine. We present a web-based software system to automate the analysis pipeline and to support researchers involved in the collection of questionnaire data, such as in epidemiology, aiming to reduce the errors listed above and including some basic functions to conduct statistical analysis on collected data. Our system allows researchers to create questionnaires, adding sections and structured questions. It provides a preview of the questionnaire and the export of saved data into formats compatible with statistical software, or it allows the data to be analyzed directly by applying statistical methods and common data mining techniques from the main interface.

  4. Web search and data mining of natural products and their bioactivities in PubChem.

    Science.gov (United States)

    Ming, Hao; Tiejun, Cheng; Yanli, Wang; Stephen, Bryant H

    2013-10-01

    Natural products, historically major resources for drug discovery, have recently been gaining more attention due to advances in genomic sequencing and other technologies, which make them attractive and amenable to drug candidate screening. Collecting and mining the bioactivity information of natural products is extremely important for accelerating the drug development process by reducing cost. Lately, a number of publicly accessible databases have been established to facilitate access to the chemical biology data for small molecules including natural products. Thus, it is imperative for scientists in related fields to exploit these resources in order to expedite their research on natural products as drug leads/candidates for disease treatment. PubChem, as a public database, contains large amounts of natural products associated with bioactivity data. In this review, we introduce the information system provided at PubChem, and systematically describe the applications of a set of PubChem web services for rapid data retrieval, analysis, and downloading of natural products. We hope this work can serve as a starting point for researchers to perform data mining on natural products using PubChem.

  5. SQUAT: A web tool to mine human, murine and avian SAGE data

    Directory of Open Access Journals (Sweden)

    Besson Jérémy

    2008-09-01

    Full Text Available Abstract Background: There is an increasing need in transcriptome research for gene expression data and pattern warehouses. It is important to integrate into these warehouses both raw transcriptomic data and some properties encoded in these data, such as local patterns. Description: We have developed an application called SQUAT (SAGE Querying and Analysis Tools), which is available at http://bsmc.insa-lyon.fr/squat/. This database gives access to both raw SAGE data and patterns mined from these data for three species (human, mouse and chicken). It supports simple queries such as "In which biological situations is my favorite gene expressed?" as well as much more complex queries such as "What are the genes that are frequently co-over-expressed with my gene of interest in given biological situations?". Connections with external web databases enrich biological interpretations and enable sophisticated queries. To illustrate the power of SQUAT, we show and analyze the results of three different queries, one of which led to a biological hypothesis that was experimentally validated. Conclusion: SQUAT is a user-friendly information retrieval platform, which aims at bringing some of the state-of-the-art mining tools to biologists.

  6. WebM as an alternative to H.264? : Investigation of the usage of open source software and open standards

    NARCIS (Netherlands)

    Staalduinen, M. van; Prins, M.J.

    2011-01-01

    WebM is a new multimedia format often positioned as an open and free-to-use alternative to the industry standard H.264. H.264 is currently the most popular web-video format, used for broadcast video, VoD and also Catch-Up TV services. Comparisons between WebM's VP8 video format and the H.264 format

  7. Tools and Databases of the KOMICS Web Portal for Preprocessing, Mining, and Dissemination of Metabolomics Data

    Directory of Open Access Journals (Sweden)

    Nozomu Sakurai

    2014-01-01

    Full Text Available A metabolome (the collection of comprehensive quantitative data on metabolites in an organism) has been increasingly utilized for applications such as data-intensive systems biology, disease diagnostics, biomarker discovery, and assessment of food quality. A considerable number of tools and databases have been developed to date for the analysis of data generated by various combinations of chromatography and mass spectrometry. We report here a web portal named KOMICS (The Kazusa Metabolomics Portal), where the tools and databases that we developed are available for free to academic users. KOMICS includes the tools and databases for preprocessing, mining, visualization, and publication of metabolomics data. Improvements in the annotation of unknown metabolites and dissemination of comprehensive metabolomic data are the primary aims behind the development of this portal. For this purpose, PowerGet and FragmentAlign include a manual curation function for the results of metabolite feature alignments. A metadata-specific wiki-based database, Metabolonote, functions as a hub of web resources related to the submitters' work. This feature is expected to increase citation of the submitters' work, thereby promoting data publication. As an example of the practical use of KOMICS, a workflow for a study on Jatropha curcas is presented. The tools and databases available at KOMICS should contribute to enhanced production, interpretation, and utilization of metabolomic Big Data.

  8. Tools and databases of the KOMICS web portal for preprocessing, mining, and dissemination of metabolomics data.

    Science.gov (United States)

    Sakurai, Nozomu; Ara, Takeshi; Enomoto, Mitsuo; Motegi, Takeshi; Morishita, Yoshihiko; Kurabayashi, Atsushi; Iijima, Yoko; Ogata, Yoshiyuki; Nakajima, Daisuke; Suzuki, Hideyuki; Shibata, Daisuke

    2014-01-01

    A metabolome--the collection of comprehensive quantitative data on metabolites in an organism--has been increasingly utilized for applications such as data-intensive systems biology, disease diagnostics, biomarker discovery, and assessment of food quality. A considerable number of tools and databases have been developed to date for the analysis of data generated by various combinations of chromatography and mass spectrometry. We report here a web portal named KOMICS (The Kazusa Metabolomics Portal), where the tools and databases that we developed are available for free to academic users. KOMICS includes the tools and databases for preprocessing, mining, visualization, and publication of metabolomics data. Improvements in the annotation of unknown metabolites and dissemination of comprehensive metabolomic data are the primary aims behind the development of this portal. For this purpose, PowerGet and FragmentAlign include a manual curation function for the results of metabolite feature alignments. A metadata-specific wiki-based database, Metabolonote, functions as a hub of web resources related to the submitters' work. This feature is expected to increase citation of the submitters' work, thereby promoting data publication. As an example of the practical use of KOMICS, a workflow for a study on Jatropha curcas is presented. The tools and databases available at KOMICS should contribute to enhanced production, interpretation, and utilization of metabolomic Big Data.

  9. Mining severe drug-drug interaction adverse events using Semantic Web technologies: a case study.

    Science.gov (United States)

    Jiang, Guoqian; Liu, Hongfang; Solbrig, Harold R; Chute, Christopher G

    2015-01-01

    Drug-drug interactions (DDIs) are a major contributing factor for unexpected adverse drug events (ADEs). However, few knowledge resources cover the severity information of ADEs that is critical for prioritizing the medical need. The objective of the study is to develop and evaluate a Semantic Web-based approach for mining severe DDI-induced ADEs. We utilized a normalized FDA Adverse Event Reporting System (AERS) dataset and performed a case study of three frequently prescribed cardiovascular drugs: Warfarin, Clopidogrel and Simvastatin. We extracted putative DDI-ADE pairs and their associated outcome codes. We developed a pipeline to filter the associations using ADE datasets from SIDER and PharmGKB. We also performed a signal enrichment using electronic medical records (EMR) data. We leveraged the Common Terminology Criteria for Adverse Events (CTCAE) grading system and classified the DDI-induced ADEs into the CTCAE in the Web Ontology Language (OWL). We identified 601 DDI-ADE pairs for the three drugs using the filtering pipeline, of which 61 pairs are in Grade 5, 56 pairs in Grade 4 and 484 pairs in Grade 3. Among the 601 pairs, the signals of 59 DDI-ADE pairs were identified from the EMR data. The approach developed could be generalized to detect the signals of putative severe ADEs induced by DDIs in other drug domains and would be useful for supporting translational and pharmacovigilance studies of severe ADEs.

  10. Internet accessibility and usage among urban adolescents in Southern California: implications for web-based health research.

    Science.gov (United States)

    Sun, Ping; Unger, Jennifer B; Palmer, Paula H; Gallaher, Peggy; Chou, Chih-Ping; Baezconde-Garbanati, Lourdes; Sussman, Steve; Johnson, C Anderson

    2005-10-01

    The World Wide Web (WWW) has a distinct capability to offer interventions tailored to the individual's characteristics. To fine-tune the tailoring process, studies are needed to explore how Internet accessibility and usage are related to demographic, psychosocial, behavioral, and other health-related characteristics. This study was based on a cross-sectional survey conducted on 2373 7th grade students of various ethnic groups in Southern California. Measures of Internet use included Internet use at school or at home, email use, chat-room use, and Internet favoring. Logistic regressions were conducted to assess the associations of Internet use with selected demographic, psychosocial and behavioral variables and self-reported health statuses. The proportion of students who could access the Internet at school or home was 90% and 40%, respectively. Nearly all (99%) of the respondents could access the Internet either at school or at home. Higher SES and Asian ethnicity were associated with higher Internet use. Among those who could access the Internet and after adjusting for the selected demographic and psychosocial variables, depression was positively related with chat-room use and using the Internet longer than 1 hour per day at home, and hostility was positively related with Internet favoring (All ORs = 1.2 for +1 STD, p [...] Internet use (ORs for +1 STD ranged from 1.2 to 2.0, all p [...] Internet use. Substance use was positively related to email use, chat-room use, and at home Internet use (OR for "used" vs. "not used" ranged from 1.2 to 4.0, p [...] Internet use at home but lower levels of Internet use at school. More physical activity was related to more email use (OR = 1.3 for +1 STD), chat room use (OR = 1.2 for +1 STD), and at school ever Internet use (OR = 1.2 for +1 STD, all p [...] Internet use-related measures. In this ethnically diverse sample of Southern California 7th grade students, 99% could access the Internet at school and/or at home. This suggests that the Internet

  11. The utility of web mining for epidemiological research: studying the association between parity and cancer risk.

    Science.gov (United States)

    Tourassi, Georgia; Yoon, Hong-Jun; Xu, Songhua; Han, Xuesong

    2016-05-01

    The World Wide Web has emerged as a powerful data source for epidemiological studies related to infectious disease surveillance. However, its potential for cancer-related epidemiological discoveries is largely unexplored. Using advanced web crawling and tailored information extraction procedures, the authors automatically collected and analyzed the text content of 79 394 online obituary articles published between 1998 and 2014. The collected data included 51 911 cancer (27 330 breast; 9470 lung; 6496 pancreatic; 6342 ovarian; 2273 colon) and 27 483 non-cancer cases. With the derived information, the authors replicated a case-control study design to investigate the association between parity (i.e., childbearing) and cancer risk. Age-adjusted odds ratios (ORs) with 95% confidence intervals (CIs) were calculated for each cancer type and compared to those reported in large-scale epidemiological studies. Parity was found to be associated with a significantly reduced risk of breast cancer (OR = 0.78, 95% CI, 0.75-0.82), pancreatic cancer (OR = 0.78, 95% CI, 0.72-0.83), colon cancer (OR = 0.67, 95% CI, 0.60-0.74), and ovarian cancer (OR = 0.58, 95% CI, 0.54-0.62). Marginal association was found for lung cancer risk (OR = 0.87, 95% CI, 0.81-0.92). The linear trend between increased parity and reduced cancer risk was dramatically more pronounced for breast and ovarian cancer than the other cancers included in the analysis. This large web-mining study on parity and cancer risk produced findings very similar to those reported with traditional observational studies. It may be used as a promising strategy to generate study hypotheses for guiding and prioritizing future epidemiological studies. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
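
    The odds ratios with 95% confidence intervals reported above follow the usual case-control computation on a 2x2 table. The sketch below shows the unadjusted form with a Wald (log-scale) interval; the study reports age-adjusted ORs, which would need a regression model instead, and the function name and any counts passed to it are hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_ci(exposed_cases: int, unexposed_cases: int,
                  exposed_controls: int, unexposed_controls: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio with a Wald confidence interval.

    OR = (a * d) / (b * c) for the 2x2 table
        a = exposed cases,   b = unexposed cases,
        c = exposed controls, d = unexposed controls.
    The interval is exp(ln(OR) +/- z * SE), with
    SE = sqrt(1/a + 1/b + 1/c + 1/d); z = 1.96 gives roughly 95% coverage.
    """
    a, b, c, d = exposed_cases, unexposed_cases, exposed_controls, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

    Here "exposed" would mean parous and "cases" would mean obituaries mentioning a given cancer. An interval that lies entirely below 1, as for the breast and ovarian cancer estimates in the abstract, indicates a significantly reduced risk at the chosen level.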

  12. Potential influence of Web 2.0 usage and security practices of online users on information management

    Directory of Open Access Journals (Sweden)

    R.J. Rudman

    2009-02-01

    Full Text Available The proliferation of Web 2.0 applications was the impetus for this survey-based research into practices that online users currently employ when using Web 2.0 sites. As part of the study, the popularity of Web 2.0 technologies and sites among online users at a university was investigated to determine the extent of the potential threat to corporate security, arising from Web 2.0 use and access. The results of this study indicate that the use of Web 2.0 sites is very popular among students, as a proxy for the potential future business users, and that users are not necessarily aware of the risks associated with these sites. The respondents indicated that they regularly visit Web 2.0 sites, and that they post personal information on these sites. This is of concern in protecting arguably the most valuable asset of a business.

  13. Integration and publication of heterogeneous text-mined relationships on the Semantic Web.

    Science.gov (United States)

    Coulet, Adrien; Garten, Yael; Dumontier, Michel; Altman, Russ B; Musen, Mark A; Shah, Nigam H

    2011-05-17

    Advances in Natural Language Processing (NLP) techniques enable the extraction of fine-grained relationships mentioned in biomedical text. The variability and the complexity of natural language in expressing similar relationships causes the extracted relationships to be highly heterogeneous, which makes the construction of knowledge bases difficult and poses a challenge in using these for data mining or question answering. We report on the semi-automatic construction of the PHARE relationship ontology (the PHArmacogenomic RElationships Ontology) consisting of 200 curated relations from over 40,000 heterogeneous relationships extracted via text-mining. These heterogeneous relations are then mapped to the PHARE ontology using synonyms, entity descriptions and hierarchies of entities and roles. Once mapped, relationships can be normalized and compared using the structure of the ontology to identify relationships that have similar semantics but different syntax. We compare and contrast the manual procedure with a fully automated approach using WordNet to quantify the degree of integration enabled by iterative curation and refinement of the PHARE ontology. The result of such integration is a repository of normalized biomedical relationships, named PHARE-KB, which can be queried using Semantic Web technologies such as SPARQL and can be visualized in the form of a biological network. The PHARE ontology serves as a common semantic framework to integrate more than 40,000 relationships pertinent to pharmacogenomics. The PHARE ontology forms the foundation of a knowledge base named PHARE-KB. Once populated with relationships, PHARE-KB (i) can be visualized in the form of a biological network to guide human tasks such as database curation and (ii) can be queried programmatically to guide bioinformatics applications such as the prediction of molecular interactions. PHARE is available at http://purl.bioontology.org/ontology/PHARE.

  14. Mining the Social Web Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites

    CERN Document Server

    Russell, Matthew

    2011-01-01

    Want to tap the tremendous amount of valuable social data in Facebook, Twitter, LinkedIn, and Google+? This refreshed edition helps you discover who's making connections with social media, what they're talking about, and where they're located. You'll learn how to combine social web data, analysis techniques, and visualization to find what you've been looking for in the social haystack, as well as useful information you didn't know existed. Each standalone chapter introduces techniques for mining data in different areas of the social Web, including blogs and email. All you need to get started

  15. HC StratoMineR: A Web-Based Tool for the Rapid Analysis of High-Content Datasets.

    Science.gov (United States)

    Omta, Wienand A; van Heesbeen, Roy G; Pagliero, Romina J; van der Velden, Lieke M; Lelieveld, Daphne; Nellen, Mehdi; Kramer, Maik; Yeong, Marley; Saeidi, Amir M; Medema, Rene H; Spruit, Marco; Brinkkemper, Sjaak; Klumperman, Judith; Egan, David A

    2016-10-01

    High-content screening (HCS) can generate large multidimensional datasets and when aligned with the appropriate data mining tools, it can yield valuable insights into the mechanism of action of bioactive molecules. However, easy-to-use data mining tools are not widely available, with the result that these datasets are frequently underutilized. Here, we present HC StratoMineR, a web-based tool for high-content data analysis. It is a decision-supportive platform that guides even non-expert users through a high-content data analysis workflow. HC StratoMineR is built by using My Structured Query Language for storage and querying, PHP: Hypertext Preprocessor as the main programming language, and jQuery for additional user interface functionality. R is used for statistical calculations, logic and data visualizations. Furthermore, C++ and graphical processor unit power is diffusely embedded in R by using the rcpp and rpud libraries for operations that are computationally highly intensive. We show that we can use HC StratoMineR for the analysis of multivariate data from a high-content siRNA knock-down screen and a small-molecule screen. It can be used to rapidly filter out undesirable data; to select relevant data; and to perform quality control, data reduction, data exploration, morphological hit picking, and data clustering. Our results demonstrate that HC StratoMineR can be used to functionally categorize HCS hits and, thus, provide valuable information for hit prioritization.

  16. Nuclear expert web mining system: monitoring and analysis of nuclear acceptance by information retrieval and opinion extraction on the Internet

    Energy Technology Data Exchange (ETDEWEB)

    Reis, Thiago; Barroso, Antonio C.O.; Imakuma, Kengo, E-mail: thiagoreis@usp.b, E-mail: barroso@ipen.b, E-mail: kimakuma@ipen.b [Instituto de Pesquisas Energeticas e Nucleares (IPEN/CNEN-SP), Sao Paulo, SP (Brazil)

    2011-07-01

    This paper presents a research initiative that aims to collect nuclear-related information and to analyze opinionated texts by mining the hypertextual data environment and social network web sites on the Internet. Unlike previous approaches that employed traditional statistical techniques, a novel Web Mining approach is proposed, built using the concept of Expert Systems, for massive and autonomous data collection and analysis. The initial step has been accomplished, resulting in a framework design that is able to gradually encompass a set of evolving techniques, methods, and theories, so that this work will build a platform upon which new research can be performed more easily by just substituting modules or plugging in new ones. Upon completion it is expected that this research will contribute to the understanding of the population's views on nuclear technology and its acceptance. (author)

  17. Recovering the Past. Eastern European Web Mining Platforms for Reconstructing Political Attitudes

    Directory of Open Access Journals (Sweden)

    Camelia Florela Voinea

    2015-01-01

    Full Text Available During the past half century, the political attitude of the Eastern European people toward the state, government and society changed dramatically. So did their value systems. Inglehart's materialist vs. post-materialist comparative analysis gives a measure of this value change, but not enough to fully characterize the phenomena underlying the differences in political culture before and after the fall of the Berlin Wall. Little is left from the communist regimes to show how this change actually occurred and where we are as compared to the stable democratic regimes. With rare exceptions, no public survey was conducted in the Eastern European countries between 1950 and 1990 that could mirror people's true beliefs and values. In order to understand the current value systems and political attitudes of the people in Eastern Europe, we have to recover the past. One way to do that is to identify key concepts in the texts, discourses, audio and video recordings of past times. The present paper provides the rationale of this approach and describes a system which dynamically collects content-based items from library and web references and resources. The system currently works on concepts described by single words or compound expressions, and could be extended to work on multimedia items such as words, images, and sounds (voices, music, audio signals, etc.). Our approach aims at constructing a dynamic system and an open access repository of content-based collections of the past and offers a research instrument to students of the political attitudes toward democracy and freedom of the people in Eastern Europe. We approach the problem of recovering the historical process of political change in the Eastern European societies, known as the fall of the Berlin Wall, in terms of political attitude change modeling and simulation. Modeling makes intensive use of web and data mining technologies for identifying political attitude structural

  18. Assessing student usage, perception, and the utility of a Web-based simulation in a third-year medical school clerkship.

    Science.gov (United States)

    Wise, Eric M; McIvor, William R; Mangione, Michael P

    2016-09-01

    The goals of this study were to assess students' usage data of Web-based simulation (WBS), to determine if it can fill gaps in clinical experience-based medical education, and to determine students' perceived value of this kind of simulation during a clinical clerkship. Observational/prospective cohort. Medical school affiliated with a large academic hospital. A total of 138 medical students. Web-based simulation. Medical students in an anesthesiology clerkship were assigned a WBS focusing on the clinical use of pulmonary artery catheters (PACs). Usage data, including day of week and time of day that the simulation was used and total usage time, were collected for 99 students. Eighty voluntary survey responses, which gauged student perception of the simulation and clinical exposure to PACs, were also collected. Seventy-two percent of attempts were made during nonclinical hours of 5 pm to 7 am. Seventy-seven percent of students spent less than 30 minutes in total using the simulation. Students preferred the simulation (rated 4.1/5) over textbook (3.59) learning to a statistically significant degree (P [...] simulation. Sixty-seven percent of students had never encountered a patient with a PAC before performing the simulation, and 41% did not discuss this learning objective during their clerkship. Students' self-rated understanding of PACs significantly increased from a presimulation score of 1.8 of 5, to 2.56 (mean difference, 0.760; P [...] simulation. WBS in medical school clerkships is accepted by students and can fill gaps in clinical medical school education, without negatively affecting students' workloads or clerkship experiences. Copyright © 2016 Elsevier Inc. All rights reserved.

  19. Mining and Utilizing Dataset Relevancy from Oceanographic Dataset (MUDROD) Metadata, Usage Metrics, and User Feedback to Improve Data Discovery and Access

    Science.gov (United States)

    Jiang, Y.

    2015-12-01

    Oceanographic resource discovery is a critical step for developing ocean science applications. With the increasing number of resources available online, many Spatial Data Infrastructure (SDI) components (e.g. catalogues and portals) have been developed to help manage and discover oceanographic resources. However, efficient and accurate resource discovery is still a big challenge because of the lack of data relevancy information. In this article, we propose a search engine framework for mining and utilizing dataset relevancy from oceanographic dataset metadata, usage metrics, and user feedback. The objective is to improve discovery accuracy of oceanographic data and reduce the time scientists need to discover, download and reformat data for their projects. Experiments and a search example show that the proposed engine helps both scientists and general users search for more accurate results with enhanced performance and user experience through a user-friendly interface.

  20. User Behavior Analysis from Web Log using Log Analyzer Tool

    Directory of Open Access Journals (Sweden)

    Brijesh Bakariya

    2013-11-01

    Full Text Available Nowadays the internet acts as a huge database in which many websites, information sources, and search engines are available. However, because web pages contain unstructured and semi-structured data, extracting relevant information has become a challenging task. The main reason is that traditional knowledge-based techniques cannot utilize this knowledge efficiently, because it consists of many discovered patterns and contains a lot of noise and uncertainty. In this paper, web usage mining is analyzed with the help of web log data, using the web log analyzer tool “Deep Log Analyzer” to extract summary information from a particular server and to study user behavior. The authors also developed an ontology that captures the relations among the components of web usage mining.

  1. Web semantic mining based product feature usability analysis

    Institute of Scientific and Technical Information of China (English)

    刘攀; 王丽亚

    2011-01-01

    Sample size and target specificity are two bottlenecks of the questionnaire method in product feature usability analysis. To overcome these problems, this paper proposes a novel product feature usability analysis approach based on Web semantic mining. First, a related-word list is built using a manually corrected HowNet method, and a product usage information system is developed on top of it. Then a quantitative feature usability model is formulated to analyze product features. Finally, a cell phone case study verifies the approach and provides a basis for product feature decisions.

  2. Secrets from the Deep: Mining Government Information from the Invisible Web (CD-ROM)

    Science.gov (United States)

    MB. SYSTEMS DETAIL NOTE: IBM-clone PC-compatible. ABSTRACT: Presentation and tutorial on search techniques for the ‘invisible web.’ Includes...explanation of the Invisible Web and its content; relevance of searching the invisible web; and when to use the invisible web.

  3. Application of Web Services in Coal Mine Enterprise System Integration

    Institute of Scientific and Technical Information of China (English)

    李文光

    2013-01-01

    Taking coal enterprise system integration services as the study object, the paper describes the problem that the data of the enterprise's existing systems are independent of one another, and puts forward three system integration schemes: an intermediate database, open deep database access, and a Web Services interface. A comparison of the three schemes verifies the effectiveness and security of the Web Services approach. The research provides a reliable theoretical basis for the informatization of coal mine enterprises.

  4. Working with Data: Discovering Knowledge through Mining and Analysis; Systematic Knowledge Management and Knowledge Discovery; Text Mining; Methodological Approach in Discovering User Search Patterns through Web Log Analysis; Knowledge Discovery in Databases Using Formal Concept Analysis; Knowledge Discovery with a Little Perspective.

    Science.gov (United States)

    Qin, Jian; Jurisica, Igor; Liddy, Elizabeth D.; Jansen, Bernard J; Spink, Amanda; Priss, Uta; Norton, Melanie J.

    2000-01-01

    These six articles discuss knowledge discovery in databases (KDD). Topics include data mining; knowledge management systems; applications of knowledge discovery; text and Web mining; text mining and information retrieval; user search patterns through Web log analysis; concept analysis; data collection; and data structure inconsistency. (LRW)

  6. Research on Fuzzy Clustering Validity in Web Text Mining

    Institute of Scientific and Technical Information of China (English)

    罗琪

    2012-01-01

    This paper studies Web document mining based on fuzzy clustering and a fuzzy clustering validity evaluation function, and applies the validity function to evaluating fuzzy clustering in Web text mining. The experiments show that FKCM can effectively improve the precision of Web text clustering, and the simulation results indicate that the method is feasible and reasonably accurate.

  7. Les Chansons de la Francophonie Web Site and Its Two Web-Usage-Tracking Systems in an Advanced Listening Comprehension Course

    Science.gov (United States)

    Weinberg, Alysse

    2005-01-01

    The "Les Chansons de la francophonie" web site is based on French songs and was developed using HTML and JavaScript for the advanced French Comprehension Course at the Second Language Institute of the University of Ottawa. These interactive listening activities include true-false and multiple-choice questions, fill in the blanks,…

  8. Rare disease diagnosis: A review of web search, social media and large-scale data-mining approaches.

    Science.gov (United States)

    Svenstrup, Dan; Jørgensen, Henrik L; Winther, Ole

    2015-01-01

    Physicians and the general public are increasingly using web-based tools to find answers to medical questions. The field of rare diseases is especially challenging and important as shown by the long delay and many mistakes associated with diagnoses. In this paper we review recent initiatives on the use of web search, social media and data mining in data repositories for medical diagnosis. We compare the retrieval accuracy on 56 rare disease cases with known diagnosis for the web search tools google.com, pubmed.gov, omim.org and our own search tool findzebra.com. We give a detailed description of IBM's Watson system and make a rough comparison between findzebra.com and Watson on subsets of the Doctor's dilemma dataset. The recall@10 and recall@20 (fraction of cases where the correct result appears in top 10 and top 20) for the 56 cases are found to be 29%, 16%, 27% and 59% and 32%, 18%, 34% and 64%, respectively. Thus, FindZebra has a significantly (p data have opened new possibilities for aiding the diagnostic process. Specialized search engines, data mining tools and social media are some of the areas that hold promise.
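
    The recall@k measure defined in this abstract (the fraction of cases whose correct result appears in the top k) can be illustrated with a minimal sketch. The function name `recall_at_k` and the toy rankings below are hypothetical and are not taken from the paper or its data.

```python
from typing import Sequence


def recall_at_k(ranked_results: Sequence[Sequence[str]],
                correct: Sequence[str], k: int) -> float:
    """Fraction of cases whose correct answer appears among the top-k results."""
    if len(ranked_results) != len(correct):
        raise ValueError("one ranked list is needed per case")
    if not correct:
        return 0.0
    hits = sum(1 for results, truth in zip(ranked_results, correct)
               if truth in results[:k])
    return hits / len(correct)


# Hypothetical toy data: three cases, each with a ranked list of candidates.
rankings = [["a", "b", "c"], ["x", "y", "z"], ["p", "q", "r"]]
truths = ["b", "z", "missing"]

recall_2 = recall_at_k(rankings, truths, 2)  # only the first case is a hit
recall_3 = recall_at_k(rankings, truths, 3)  # the first two cases are hits
```

    Because a larger k can only add hits, recall@20 is never lower than recall@10 for the same tool, which matches the pairs of figures reported above.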

  9. Web Log Clustering Approaches – A Survey

    Directory of Open Access Journals (Sweden)

    G. Sudhamathy,

    2011-07-01

    Full Text Available As more organizations rely on the Internet and the World Wide Web to conduct business, the proposed strategies and techniques for market analysis need to be revisited in this context. We therefore present a survey of the most recent work in the field of Web usage mining, focusing on three different approaches to web log clustering. Clustering analysis is a widely used data mining technique that partitions a set of data objects into a number of clusters, where each data object is highly similar to the other objects within the same cluster but quite dissimilar to objects in other clusters. In this work we discuss three different approaches to web log clustering and analyze their benefits and drawbacks. We finally conclude on the most efficient algorithm based on the results of experiments conducted with various web log files.

  10. Web Usage Mining Research Based on XML

    Institute of Scientific and Technical Information of China (English)

    潘有能

    2006-01-01

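    The paper designs an XML-based architecture for Web log mining, briefly introduces XGMML and LOGML, and on that basis discusses how LOGML documents are generated and how the Apriori algorithm is used to mine frequent sets, frequent sequences, and frequent subgraphs from log documents.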
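
    As an illustration of the Apriori step named in this record, the following is a minimal sketch of frequent-set mining over user sessions. It is not the paper's implementation: the session data and page names are hypothetical, the LOGML input format is not modeled, and only frequent sets are covered (not frequent sequences or subgraphs).

```python
from itertools import combinations


def apriori(sessions, min_support):
    """Return {frozenset(pages): support_count} for itemsets meeting min_support."""
    sessions = [frozenset(s) for s in sessions]
    # Level 1: count how many sessions contain each single page.
    counts = {}
    for s in sessions:
        for page in s:
            key = frozenset([page])
            counts[key] = counts.get(key, 0) + 1
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unions of frequent (k-1)-itemsets that have exactly k pages.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in prev
                             for sub in combinations(c, k - 1))}
        counts = {c: sum(1 for s in sessions if c <= s) for c in candidates}
        frequent = {i: c for i, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical sessions: each is the set of pages one visitor requested.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/cart"],
    ["/products", "/cart"],
]
frequent_sets = apriori(sessions, min_support=2)
```

    With a minimum support of two sessions, every single page and every pair of pages is frequent in this toy data, while the three-page set occurs in only one session and is dropped.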

  11. The Research of the Method of the Data Mining Based on Web

    Institute of Scientific and Technical Information of China (English)

    孙杰

    2012-01-01

    Web mining is the application of traditional data mining technology in the Web environment. It attempts to discover implicit, previously unknown, potentially valuable, non-trivial patterns from large collections of Web documents and from the data generated when users browse the Web. This paper introduces the principles of Web mining, the current state of research, and the mining tools now popular on the Web, such as crawler tools and search engine technology.

  12. Web Data Mining and Social Media Analysis for better Communication in Food Safety Crises

    Directory of Open Access Journals (Sweden)

    Christian H. Meyer

    2015-07-01

    Full Text Available Although much effort is made to prevent risks arising from food, food-borne diseases are an ever-present threat to the consumers’ health. The consumption of fresh food that is contaminated with pathogens like fungi, viruses or bacteria can cause food poisoning that leads to severe health damages or even death. The outbreak of Shiga Toxin-producing enterohemorrhagic E. coli (EHEC) in Germany and neighbouring countries in 2011 has shown this dramatically. Nearly 4,000 people were reported to be affected and more than 50 people died during the so-called EHEC-crisis. As a result the consumers’ trust in the safety of fruits and vegetables decreased sharply. In situations like that quick decisions and reaction from public authorities as well as from privately owned companies are important: Food crisis managers have to identify and track back contaminated products and they have to withdraw them from the market. At the same time they have to inform the stakeholders about potential threats and recent developments. This is a particularly challenging task, because when an outbreak is just detected, information about the actual scope is sparse and the demand for information is high. Thus, ineffective communication among crisis managers and towards the public can result in inefficient crisis management, health damages and a major loss of trust in the food system. This is why crisis communication is a crucial part of successful crisis management, whereas the quality of crisis communication largely depends on the availability of and the access to relevant information. In order to improve the availability of information, we have explored how information from publicly accessible internet sources like Twitter or Wikipedia can be harnessed for food crisis communication. In this paper we are going to report on some initial insight from a web mining and social media analysis approach to monitor health and food related issues that can develop into a potential

  13. Dark Web

    CERN Document Server

    Chen, Hsinchun

    2012-01-01

    The University of Arizona Artificial Intelligence Lab (AI Lab) Dark Web project is a long-term scientific research program that aims to study and understand the international terrorism (Jihadist) phenomena via a computational, data-centric approach. We aim to collect "ALL" web content generated by international terrorist groups, including web sites, forums, chat rooms, blogs, social networking sites, videos, virtual world, etc. We have developed various multilingual data mining, text mining, and web mining techniques to perform link analysis, content analysis, web metrics (technical

  14. "Our teacher speaks English at all times!" The mining of professor's usage of language at foreign language lesson

    Directory of Open Access Journals (Sweden)

    Urška Sešek

    2009-12-01

    Full Text Available Different approaches to foreign language teaching can entail very different approaches to the use of the target language in the classroom. The currently prevailing opinion is that the teacher should not primarily use the learners' mother tongue but the target language, as far as that is possible and meaningful. This is important even though today's learners of mainstream-taught foreign languages in Slovenia are much more exposed to their target language outside of school than they were even 10 years ago. The teacher's use of the target language namely represents not only a source of input and a model of its active usage but is also a means of establishing authority and a tool for execution of classroom activities. In order to successfully carry out all of her/his increasingly demanding professional tasks, the teacher should maintain and develop their target language competences in terms of accuracy, appropriateness and modification strategies to adapt to learner needs. It is also very useful to look at the teacher's target language use from a functional perspective to become aware of how different types of utterances / speech acts / language forms can contribute to achieving different educational goals.

  15. Construction of web-based nutrition education contents and searching engine for usage of healthy menu of children.

    Science.gov (United States)

    Hong, Soon-Myung; Lee, Tae-Kyong; Chung, Hea-Jung; Park, Hye-Kyung; Lee, Eun-Ju; Nam, Hye-Seon; Jung, Soon-Im; Cho, Jee-Ye; Lee, Jin-Hee; Kim, Gon; Kim, Min-Chan

    2008-01-01

    A dietary habit developed in childhood lasts for a lifetime. In this sense, nutrition education and early exposure to healthy menus in childhood are important. Children these days have easy access to the internet. Thus, a web-based nutrition education program for children is an effective tool for nutrition education of children. This site provides nutrition education material for children with characters that are personified nutrients. The 151 menus are stored in the site together with video script of the cooking process. The menus are classified by criteria based on age, menu type and the ethnic origin of the menu. The site provides a search function. There are three kinds of search conditions: key words, menu type and "between" expressions of nutrients such as calories and other nutrients. The site is developed with the operating system Windows 2003 Server, the web server ZEUS 5, the development language JSP, and the database management system Oracle 10g.

  16. Usage of data-encoded web maps with client side color rendering for combined data access, visualization, and modeling purposes

    Science.gov (United States)

    Pliutau, Denis; Prasad, Narasimha S.

    2013-05-01

    Current approaches to satellite observation data storage and distribution implement separate visualization and data access methodologies, which often lead to the need for time-consuming data ordering and coding for applications requiring both visual representation as well as data handling and modeling capabilities. We describe an approach we implemented for a data-encoded web map service based on storing numerical data within server map tiles and subsequent client side data manipulation and map color rendering. The approach relies on storing data using the lossless compression Portable Network Graphics (PNG) image data format which is natively supported by web-browsers allowing on-the-fly browser rendering and modification of the map tiles. The method is easy to implement using existing software libraries and has the advantage of easy client side map color modifications, as well as spatial subsetting with physical parameter range filtering. This method is demonstrated for the ASTER-GDEM elevation model and selected MODIS data products and represents an alternative to the currently used storage and data access methods. One additional benefit includes providing multiple levels of averaging due to the need for generating map tiles at varying resolutions for various map magnification levels. We suggest that such a merged data and mapping approach may be a viable alternative to existing static storage and data access methods for a wide array of combined simulation, data access and visualization purposes.
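
    The idea of storing numerical data inside image tiles can be sketched as follows. This is an assumption-laden illustration, not the encoding used by the authors: it quantizes a value to 24 bits and splits it across the red, green, and blue bytes of one pixel, and the function names, the value range, and the sample elevation are all hypothetical. Writing the bytes into an actual PNG file is left out.

```python
def encode_value(value, vmin, vmax):
    """Quantize a value in [vmin, vmax] to 24 bits and split it into (R, G, B)."""
    if not vmin <= value <= vmax:
        raise ValueError("value outside the encodable range")
    q = round((value - vmin) / (vmax - vmin) * 0xFFFFFF)
    return (q >> 16) & 0xFF, (q >> 8) & 0xFF, q & 0xFF


def decode_value(rgb, vmin, vmax):
    """Recover the approximate physical value from one pixel's (R, G, B) bytes."""
    r, g, b = rgb
    q = (r << 16) | (g << 8) | b
    return vmin + q / 0xFFFFFF * (vmax - vmin)


def in_range(rgb, vmin, vmax, low, high):
    """Client-side range filter: keep a pixel only if its value lies in [low, high]."""
    return low <= decode_value(rgb, vmin, vmax) <= high


# Hypothetical elevation range in metres and one sample elevation.
VMIN, VMAX = -500.0, 9000.0
pixel = encode_value(1234.5, VMIN, VMAX)
restored = decode_value(pixel, VMIN, VMAX)
```

    Since PNG compression is lossless, the bytes a client reads back are the bytes the server wrote, so the only error in this sketch is the quantization step of (VMAX - VMIN) / 16777215.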

  17. The Research and Application of Web Data Mining

    Institute of Scientific and Technical Information of China (English)

    苏燕; 梁武

    2014-01-01

    With the rapid development of the Internet and the global popularity of the Web, data mining technology has shown its important role in many fields and industries. By analyzing the characteristics of the Web and of data mining, this paper lists specific applications of data mining on the Web.

  18. COLLABORATIVE WEB RECOMMENDATION SYSTEMS BASED ON AN EFFECTIVE FUZZY ASSOCIATION RULE MINING ALGORITHM (FARM)

    OpenAIRE

    Dr. P. THAMBIDURAI; A.KUMAR,

    2010-01-01

    With the increasing popularity of web-based systems applied in many different areas, such systems tend to deliver customized information for their users by means of recommendation methods. Recommendation systems are mainly classified into two groups: content-based recommendation and collaborative recommendation. Content-based recommendation tries to recommend web sites similar to those web sites the user has liked, whereas collaborative recommendation tries to find som...

  19. RiceGeneThresher: a web-based application for mining genes underlying QTL in rice genome.

    Science.gov (United States)

    Thongjuea, Supat; Ruanjaichon, Vinitchan; Bruskiewich, Richard; Vanavichit, Apichart

    2009-01-01

    RiceGeneThresher is a public online resource for mining genes underlying genome regions of interest or quantitative trait loci (QTL) in rice genome. It is a compendium of rice genomic resources consisting of genetic markers, genome annotation, expressed sequence tags (ESTs), protein domains, gene ontology, plant stress-responsive genes, metabolic pathways and prediction of protein-protein interactions. RiceGeneThresher system integrates these diverse data sources and provides powerful web-based applications, and flexible tools for delivering customized set of biological data on rice. Its system supports whole-genome gene mining for QTL by querying using DNA marker intervals or genomic loci. RiceGeneThresher provides biologically supported evidences that are essential for targeting groups or networks of genes involved in controlling traits underlying QTL. Users can use it to discover and to assign the most promising candidate genes in preparation for the further gene function validation analysis. The web-based application is freely available at http://rice.kps.ku.ac.th.

  20. A citation analysis of the research reports of the Central Mining Institute. Mining and Environment using the Web of Science, Scopus, BazTech, and Google Scholar: A case study

    OpenAIRE

    2015-01-01

    This paper presents the analysis of a Polish mining sciences journal (Prace Naukowe GIG. Górnictwo i Środowisko; title in English: Research Reports of the Central Mining Institute. Mining and Environment; acronym in English [RRCMIME]). The analysis is based on data from the following sources: the Web of Science (WoS), Scopus, BazTech (a bibliographic database containing citations from Polish Technical Journals), and Google Scholar (GS). The data from the WoS and Scopus were collected manually...

  1. Research on Web Text Mining Based on Ontology

    Institute of Scientific and Technical Information of China (English)

    阮光册

    2011-01-01

    To remedy the lack of text semantic understanding in traditional Web text mining methods, this paper combines ontology with Web text mining and discusses a Web text mining method based on domain ontology. The ontology structure of the Web text is created first; a domain-ontology "concept-concept" similarity matrix is then introduced and the identification of relations among concepts is described; finally, an implementation of Web text mining is given that uncovers the underlying meaning of Web text information. In the experiment, online media reports are used as a case study, and relevant conclusions are drawn through text mining.

  2. Application of Web Data Mining in Distance Education

    Institute of Scientific and Technical Information of China (English)

    白伟

    2009-01-01

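    Web data mining is used to analyze distance education. Based on individual differences among learners, the paper proposes a framework for a personalized distance learning system and the concept of personalized services. Relevant information is mined to build a distance education system that integrates intelligence and personalization, thereby improving the current state of distance education services.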

  3. Data Mining of Web-Based Documents on Social Networking Sites That Included Suicide-Related Words Among Korean Adolescents.

    Science.gov (United States)

    Song, Juyoung; Song, Tae Min; Seo, Dong-Chul; Jin, Jae Hyun

    2016-12-01

    To investigate online search activity of suicide-related words in South Korean adolescents through data mining of social media Web sites as the suicide rate in South Korea is one of the highest in the world. Out of more than 2.35 billion posts for 2 years from January 1, 2011 to December 31, 2012 on 163 social media Web sites in South Korea, 99,693 suicide-related documents were retrieved by Crawler and analyzed using text mining and opinion mining. These data were further combined with monthly employment rate, monthly rental prices index, monthly youth suicide rate, and monthly number of reported bully victims to fit multilevel models as well as structural equation models. The link from grade pressure to suicide risk showed the largest standardized path coefficient (beta = .357, p pressure, low body image, victims of bullying, and concerns about disease. The largest total effect was observed in the grade pressure to depression to suicide risk. The multilevel models indicate about 27% of the variance in the daily suicide-related word search activity is explained by month-to-month variations. A lower employment rate, a higher rental prices index, and more bullying were associated with an increased suicide-related word search activity. Academic pressure appears to be the biggest contributor to Korean adolescents' suicide risk. Real-time suicide-related word search activity monitoring and response system needs to be developed. Copyright © 2016 Society for Adolescent Health and Medicine. Published by Elsevier Inc. All rights reserved.

  4. Visualization of Mined Pattern and Its Human Aspects

    CERN Document Server

    Jain, Ratnesh Kumar; Kasana, Dr R S

    2009-01-01

    Researchers have succeeded in mining Web usage data effectively and efficiently. However, the mined patterns are often not represented in a form suitable for direct human consumption. Hence mechanisms and tools that can represent mined patterns in an easily understandable format are utilized. Different techniques are used for pattern analysis; one of them is visualization. Visualization can provide valuable assistance for data analysis and decision making tasks. In the data visualization process, technical representations of web pages are replaced by user attractive text interpretations. Experiments with real world problems showed that visualization can significantly increase the quality and usefulness of web log mining results. However, how decision makers perceive and interact with a visual representation can strongly influence their understanding of the data as well as the usefulness of the visual presentation. Human factors therefore contribute significantly to the visualization process and should p...

  5. Mining Hidden Gems Beneath the Surface: A Look At the Invisible Web.

    Science.gov (United States)

    Carlson, Randal D.; Repman, Judi

    2002-01-01

    Describes resources for researchers called the Invisible Web that are hidden from the usual search engines and other tools and contrasts them with those resources available on the surface Web. Identifies specialized search tools, databases, and strategies that can be used to locate credible in-depth information. (Author/LRW)

  6. Unsupervised Learning of mDTD Extraction Patterns for Web Text Mining.

    Science.gov (United States)

    Kim, Dongseok; Jung, Hanmin; Lee, Gary Geunbae

    2003-01-01

    Presents a new extraction pattern, modified Document Type Definition (mDTD), which relies on analytical interpretation to identify extraction target from the contents of Web documents. Experiments with 330 Korean and 220 English Web documents on audio and video shopping sites yielded an average extraction precision of 91.3% for Korean and 81.9%…

  7. Using a web-based orthopaedic clinic in the curricular teaching of a German university hospital: analysis of learning effect, student usage and reception.

    Science.gov (United States)

    Wünschel, Markus; Leichtle, Ulf; Wülker, Nikolaus; Kluba, Torsten

    2010-10-01

    Modern teaching concepts for undergraduate medical students in Germany include problem based learning as a major component of the new licensing regulations for physicians. Here we describe the usage of a web-based virtual outpatient clinic in the teaching curriculum of undergraduate medical students, its effect on learning success, and student reception. Fifth year medical students were requested to examine 7 virtual orthopaedic patients which had been created by the authors using the Inmedea-Simulator. They also had to take a multiple-choice examination on two different occasions and their utilisation of the simulator was analysed subjectively and objectively. One hundred and sixty students took part in the study. The average age was 24.9 years, 60% were female. Most of the participants studied on their own using their private computer with a fast internet-connection at home. The average usage time was 263 min, most of the students worked with the system in the afternoon, although a considerable number used it late in the night. Regarding learning success, we found that the examination results were significantly better after using the system (7.66 versus 8.37, pgraphic design and the expert comments available, as well as the good applicability to real cases. Eighty-seven percent of the students graded the virtual orthopaedic clinic as appropriate to teach orthopaedic content. Using the Inmedea-Simulator is an effective method to enhance students' learning efficacy. The way the system was used by the students emphasises the advantages of the internet-like free time management and the implementation of multimedia-based content. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.

  8. Usage Bibliometrics

    Science.gov (United States)

    Kurtz, Michael J.; Bollen, Johan

    2010-01-01

    Scholarly usage data provides unique opportunities to address the known shortcomings of citation analysis. However, the collection, processing and analysis of usage data remain an area of active research. This article provides a review of the state-of-the-art in usage-based informetrics, i.e. the use of usage data to study the scholarly process.

  9. Entomopathogenic nematode food webs in an ancient, mining pollution gradient in Spain.

    Science.gov (United States)

    Campos-Herrera, Raquel; Rodríguez Martín, José Antonio; Escuer, Miguel; García-González, María Teresa; Duncan, Larry W; Gutiérrez, Carmen

    2016-12-01

    Mining activities pollute the environment with by-products that cause unpredictable impacts in surrounding areas. Cartagena-La Unión mine (Southeastern-Spain) was active for >2500 years. Despite its closure in 1991, high concentrations of metals and waste residues remain in this area. A previous study using nematodes suggested that high lead content diminished soil biodiversity. However, the effects of mine pollution on specific ecosystem services remain unknown. Entomopathogenic nematodes (EPN) play a major role in the biocontrol of insect pests. Because EPNs are widespread throughout the world, we speculated that EPNs would be present in the mined areas, but at increased incidence with distance from the pollution focus. We predicted that the natural enemies of nematodes would follow a similar spatial pattern. We used qPCR techniques to measure abundance of five EPN species, five nematophagous fungi species, two bacterial ectoparasites of EPNs and one group of free-living nematodes that compete for the insect-cadaver. The study comprised 193 soil samples taken from mining sites, natural areas and agricultural fields. The highest concentrations of iron and zinc were detected in the mined area as was previously described for lead, cadmium and nickel. Molecular tools detected very low numbers of EPNs in samples found to be negative by insect-baiting, demonstrating the importance of the approach. EPNs were detected at low numbers in 13% of the localities, without relationship to heavy-metal concentrations. Only Acrobeloides-group nematodes were inversely related to the pollution gradient. Factors associated with agricultural areas explained 98.35% of the biotic variability, including EPN association with agricultural areas. Our study suggests that EPNs have adapted to polluted habitats that might support arthropod hosts. By contrast, the relationship between abundance of Acrobeloides-group and heavy-metal levels, revealed these taxa as especially well suited bio

  10. Client-side Web Mining for Community Formation in Peer-to-Peer Environments

    Data.gov (United States)

    National Aeronautics and Space Administration — In this paper we present a framework for forming interests-based Peer-to-Peer communities using client-side web browsing history. At the heart of this framework is...

  11. Mobile Web Browsing Based On Content Preserving With Reduced Cost

    Directory of Open Access Journals (Sweden)

    Dr.N.Saravanaselvam

    2015-01-01

    Full Text Available The Internet has drastically changed today’s life, and web browsing on compact devices in particular has become widespread. This encourages people to carry their innovations and skills into new areas. With this in mind, it is necessary to look more closely at how web data are accessed and accounted for. Developed countries widely use flat-rate pricing, which is independent of data usage, whereas developing countries still rely on the “pay as you use” concept, which leads to high usage bills. To address the problem of high usage bills, we propose a cost-effective technique that reduces data consumption in mobile web browsing and thereby reduces bills under usage-based pricing. The key idea of our approach is to leverage the data plan of the user to compute a cost quota for each web request and to use a network middle-box to automatically adapt any web page to the cost quota. We use a simple but effective content adaptation technique that decides which image or data best fits the mobile display at low cost and high resolution quality. The approach also relies on data mining, which mines the requested and required data. The mined data are filtered by the content adaptation technique and fitted to the display. A notable feature of this concept is that only the important web content requested by the user is shown. A feedback process is used to retrieve only the required data and to improve the best-fit resolution. With the proposed system, mobile web browsing becomes cheaper, and the approach provides a basis for future work in the field of mobile browsing.

  12. Integration Techniques of Web Mining Based on Web Services

    Institute of Scientific and Technical Information of China (English)

    钟倩林; 王焕民; 杜亚江

    2009-01-01

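    Based on a study and analysis of Web content mining, Web structure mining, and Web usage mining, this paper explores how Web Services technology can be used to integrate the three, and on that basis proposes an implementation scheme for Web mining integration based on Web Services. Implementing the scheme makes it easier to obtain various kinds of Web information and analyse it, and thereby to discover potential users, improve site design, and make browsing and transactions more convenient for customers.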

  13. Network mapping and usage determination

    CSIR Research Space (South Africa)

    Senekal, FP

    2007-07-01

    Full Text Available A large computer network such as the Internet contains millions of computers, services and users, interconnected in a complicated and ever changing web. This article provides an introduction to network mapping and usage determination – the study...

  14. Research on Data Preprocessing Technology in Web Log Mining

    Institute of Scientific and Technical Information of China (English)

    杨玉梅

    2014-01-01

    Preprocessing is the key step in Web log mining: its results strongly influence the rules and patterns produced by the mining algorithm, and it is critical to ensuring the quality of Web log mining. This paper presents DUI technology, which enhances preprocessing. Experiments show that advanced data preprocessing technology can improve the quality of preprocessing results.

  15. The Research and Development of Text Mining System Based on Web

    Institute of Scientific and Technical Information of China (English)

    唐菁; 沈记全; 杨炳儒

    2003-01-01

    With the development of network technology, information spreads ever more quickly on the Internet. The ocean of information contains many types of complex data, and acquiring useful knowledge from it quickly is very difficult. Web-based text mining is a new research field that can address this problem effectively. In this paper, we present a structural model of text mining and study its core algorithm, the classification algorithm. We have developed a Web-based text mining system and applied it to modern distance education. The system automatically classifies education-related text collected from education sites on the Internet, helping people browse important information quickly and acquire knowledge.

  16. Data Preprocessing of Web Log Mining

    Institute of Scientific and Technical Information of China (English)

    何波; 涂飞; 程勇军

    2011-01-01

    Data preprocessing plays an essential role in Web log mining. This paper analyses the main steps of data preprocessing for Web log mining and designs the key algorithms for three of them: user identification, session identification, and path completion. Experimental results show that the designed algorithms are effective.
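
    The first two preprocessing steps named in this abstract, user identification and session identification, can be sketched as follows. This is a minimal illustration and not the paper's algorithm: the log record layout, the (IP, user agent) heuristic, the 30-minute timeout, and all function names are assumptions made for the example, and path completion is not covered.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical, simplified log records: (client IP, user agent, timestamp, URL).
LOG = [
    ("10.0.0.1", "Firefox", "2011-01-01 10:00:00", "/index.html"),
    ("10.0.0.1", "Firefox", "2011-01-01 10:05:00", "/products.html"),
    ("10.0.0.1", "Firefox", "2011-01-01 11:30:00", "/index.html"),
    ("10.0.0.2", "Chrome", "2011-01-01 10:01:00", "/index.html"),
]

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed threshold, not taken from the paper


def identify_users(records):
    """Group requests by (IP, user agent), a common heuristic for user identification."""
    users = defaultdict(list)
    for ip, agent, stamp, url in records:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        users[(ip, agent)].append((when, url))
    return users


def identify_sessions(requests, timeout=SESSION_TIMEOUT):
    """Split one user's requests into sessions wherever the idle gap exceeds the timeout."""
    sessions = []
    previous = None
    for when, url in sorted(requests):
        if previous is None or when - previous > timeout:
            sessions.append([])  # start a new session
        sessions[-1].append(url)
        previous = when
    return sessions


def sessionize(records):
    """Run user identification, then session identification for every user."""
    return {user: identify_sessions(reqs) for user, reqs in identify_users(records).items()}
```

    Calling `sessionize(LOG)` splits the first client's three requests into two sessions, because the gap before the third request exceeds the assumed timeout.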

  17. Application of Web data mining technology in E-commerce

    Institute of Scientific and Technical Information of China (English)

    延丽平

    2016-01-01

    The rapid development of e-commerce has produced massive amounts of Web data, and discovering potentially useful knowledge and information in this big data is necessary for the healthy development of e-commerce. Applying Web data mining technology in e-commerce makes it possible to extract hidden, useful patterns from e-commerce Web documents and Web activities. This paper introduces Web data mining technology, analyses its mining process in e-commerce, and discusses its specific applications in e-commerce.

  18. Analysis on Web Media Report Differences Based on Text Mining

    Institute of Scientific and Technical Information of China (English)

    阮光册

    2012-01-01

    Using text mining technology to find latent but valuable information in web media reports is a new direction in information research. This paper discusses text mining methods for web media reports. Taking web media reports on the Shanghai Expo as a case, the author carries out an empirical study and compares the differences among web media. The paper selects Expo-related reports from Hong Kong, Taiwan, overseas Chinese-language media, and local Shanghai media, describes the differences in report content across these regions based on text mining and feature extraction, and draws some conclusions.

  19. Carbon and nitrogen stable isotopes and metal concentration in food webs from a mining-impacted coastal lagoon

    Energy Technology Data Exchange (ETDEWEB)

    Marin-Guirao, Lazaro [Departamento de Ecologia e Hidrologia, Facultad de Biologia, Universidad de Murcia, 30100-Murcia (Spain)], E-mail: lamarin@um.es; Lloret, Javier; Marin, Arnaldo [Departamento de Ecologia e Hidrologia, Facultad de Biologia, Universidad de Murcia, 30100-Murcia (Spain)

    2008-04-01

    Two food webs from the Mar Menor coastal lagoon, differing in the distance from the desert-stream through which mining wastes were discharged, were examined by reference to essential (Zn and Cu) and non-essential (Pb and Cd) metal concentrations and stable isotopes content (C and N). The partial extraction technique applied, which reflects the availability of metals to organisms after sediment ingestion, showed higher bioavailable metal concentrations in sediments from the station influenced by the mining discharges, in agreement with the higher metal concentrations observed in organisms, which in many cases exceeded the regulatory limits established in Spanish legislation concerning seafood. Spatial differences in essential metal concentrations in the fauna suggest that several organisms are exposed to metal levels above their regulation capacity. Differences in isotopic composition were found between both food webs, the wadi-influenced station showing higher δ¹⁵N values and lower δ¹³C levels, due to the discharge of urban waste waters and by the entrance of freshwater and allochthonous marsh plants. The linear regressions between trophic levels (as indicated by δ¹⁵N) and the metal content indicated that biomagnification does not occur. In the case of invertebrates, since the 'handle strategy' of the species and the physiological requirements of the organisms, among other factors, determine the final concentration of a specific element, no clear relationships between trophic level and the metal content are to be expected. For their part, fish communities did not show clear patterns in the case of any of the analyzed metals, probably because most fish species have similar metal requirements, and because biological factors also intervened. Finally, since the study deals with metals, assumptions concerning trophic transfer factors calculation may not be suitable since the metal burden originates not only from the prey but

  20. Web Data Mining Algorithm Based on Semi Structure Feature Segmentation

    Institute of Scientific and Technical Information of China (English)

    杨丽萍

    2015-01-01

    A Web data mining algorithm based on semi-structured feature segmentation is proposed. An information-stream signal model of Web hotspot data is constructed, and envelope feature decomposition is applied to the Web hotspot information stream. To improve the purity and anti-interference performance of data mining, a feedforward modulation filter is used to filter interference from the data, and semi-structured feature segmentation is used to extract features from the Web hotspot data, yielding an improved data mining algorithm. Simulation results show that the algorithm improves detection performance for Web data features, suffers less sidelobe interference during mining, achieves higher mining accuracy, and outperforms traditional algorithms.

  1. Anti Interference Mining of Web Data Stream Based on Kalman Filtering

    Institute of Scientific and Technical Information of China (English)

    密海英

    2015-01-01

    An anti-interference mining algorithm for massive Web data streams based on variable-dimension Kalman filtering is proposed. An information model of the massive data stream and a noise interference model are constructed for data mining in the Web environment. Drawing on modern signal processing methods, a variable-dimension Kalman filtering algorithm is designed to filter and preprocess the data stream signal; the massive Web data stream is mapped to a set of nonlinear wideband frequency-modulated signal models, and a signal detection algorithm is used to achieve anti-interference mining of the massive Web data. Simulation results show that the algorithm achieves high data detection precision and accurate mining performance, with strong anti-interference capability and robustness.
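
    For readers unfamiliar with the filtering step, the predict/update cycle of a Kalman filter can be sketched in its simplest scalar form. This is not the variable-dimension filter of the paper, whose details the abstract does not give; the constant-state model, the noise variances, and the function name are all assumptions chosen for illustration.

```python
def kalman_filter_1d(measurements, process_var=1e-4, measurement_var=0.25,
                     initial_estimate=0.0, initial_var=1.0):
    """Scalar Kalman filter for a (nearly) constant signal observed with noise.

    All parameter values are illustrative assumptions, not values from the paper.
    """
    estimate, var = initial_estimate, initial_var
    filtered = []
    for z in measurements:
        # Predict: the state is assumed constant, so only the uncertainty grows.
        var += process_var
        # Update: blend prediction and measurement according to the Kalman gain.
        gain = var / (var + measurement_var)
        estimate += gain * (z - estimate)
        var *= 1.0 - gain
        filtered.append(estimate)
    return filtered
```

    With a run of identical measurements, the estimate moves from the initial guess towards the measured value, quickly at first and then more slowly as the filter's uncertainty shrinks.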

  2. A Semantic Web-based System for Mining Genetic Mutations in Cancer Clinical Trials.

    Science.gov (United States)

    Priya, Sambhawa; Jiang, Guoqian; Dasari, Surendra; Zimmermann, Michael T; Wang, Chen; Heflin, Jeff; Chute, Christopher G

    2015-01-01

    Textual eligibility criteria in clinical trial protocols contain important information about potential clinically relevant pharmacogenomic events. Manual curation for harvesting this evidence is intractable as it is error prone and time consuming. In this paper, we develop and evaluate a Semantic Web-based system that captures and manages mutation evidences and related contextual information from cancer clinical trials. The system has 2 main components: an NLP-based annotator and a Semantic Web ontology-based annotation manager. We evaluated the performance of the annotator in terms of precision and recall. We demonstrated the usefulness of the system by conducting case studies in retrieving relevant clinical trials using a collection of mutations identified from TCGA Leukemia patients and Atlas of Genetics and Cytogenetics in Oncology and Haematology. In conclusion, our system using Semantic Web technologies provides an effective framework for extraction, annotation, standardization and management of genetic mutations in cancer clinical trials.

  3. CONSTRAINT INFORMATIVE RULES FOR GENETIC ALGORITHM-BASED WEB PAGE RECOMMENDATION SYSTEM

    Directory of Open Access Journals (Sweden)

    S. Prince Mary

    2013-01-01

    Full Text Available The primary aim of web page recommendation is to predict users' navigation using web usage mining. Researchers are currently trying to develop web page recommendation using pattern mining techniques. Here, we propose a technique for web page recommendation using a genetic algorithm. It consists of three phases: data preparation, mining of informative rules, and recommendation. Data preparation comprises data preprocessing and user identification. The genetic algorithm is used to mine the informative rules and involves three processes: calculating the fitness values, crossover, and mutation. After the initial fitness calculation, we use three constraints (time duration, quality, and recent visit) to decide which candidates proceed to the next stage. These processes are repeated to find the best solution, which is then used to form the recommendation tree.

  4. Algorithm of Web Hot Data Mining Based on Structured Segmentation

    Institute of Scientific and Technical Information of China (English)

    阮梦黎

    2015-01-01

    With the development of big data information technology, online data monitoring and data mining have become research hotspots in the computer information field. Segmentation-based mining of Web hotspot data improves hotspot tracking and Web data classification. Traditional algorithms rely on unstructured data mining and cannot accurately locate Web hotspot data or mine it hierarchically. This paper proposes a Web hotspot data mining algorithm based on semi-structured segmentation. Semi-structured data are used for feature segmentation, and differential evolution based on superior gene positions makes the optimization curve gradually flatten. Comparison scripts are run in parallel on multiple nodes, and unstructured data are mapped to data blocks so that they can be stored in a relational data model. Semi-structured segmentation allows Web hotspot feature mining to optimize adaptively, yields the allocation factor of the Web hotspot data, and improves mining performance. Simulation results show that the algorithm achieves good efficiency and accuracy and improves the adaptive optimization capability of Web hotspot data mining.

  5. Mining Students' Learning Patterns and Performance in Web-Based Instruction: A Cognitive Style Approach

    Science.gov (United States)

    Chen, Sherry Y.; Liu, Xiaohui

    2011-01-01

    Personalization has been widely used in Web-based instruction (WBI). To deliver effective personalization, there is a need to understand different preferences of each student. Cognitive style has been identified as one of the most pertinent factors that affect students' learning preferences. Therefore, it is essential to investigate how learners…

  6. Web search and data mining of natural products and their bioactivities in PubChem

    OpenAIRE

    Ming, Hao; Tiejun, Cheng; Yanli, Wang; Stephen, Bryant H.

    2013-01-01

    Natural products, historically major resources for drug discovery, have recently been gaining more attention due to advances in genomic sequencing and other technologies, which make them attractive and amenable to drug candidate screening. Collecting and mining the bioactivity information of natural products is extremely important for accelerating the drug development process by reducing cost. Lately, a number of publicly accessible databases have been established to facilitate the access ...

  7. Usage Automata

    Science.gov (United States)

    Bartoletti, Massimo

    Usage automata are an extension of finite state automata with some additional features (e.g. parameters and guards) that improve their expressivity. Usage automata are expressive enough to model security requirements of real-world applications; at the same time, they are simple enough to be statically amenable, e.g. they can be model-checked against abstractions of program usages. We study here some foundational aspects of usage automata. In particular, we discuss their expressive power and their effective use in run-time mechanisms for enforcing usage policies.
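
    A rough sketch of run-time enforcement in the spirit of this abstract is given below. It is not the formalism of the paper: the policy ("a resource must not be read after it has been closed"), the event names, and the function name are hypothetical. The resource plays the role of the automaton's parameter, and the membership test on the read event plays the role of a guard.

```python
def violates_policy(trace):
    """Return True if some resource is read after it has been closed.

    The policy is parametric in the resource: conceptually each resource has
    its own copy of a small automaton, and a 'read' event only reaches the
    offending state when its resource is currently in the closed state.
    The policy and event names are illustrative assumptions.
    """
    closed = set()  # resources whose automaton copy is in the 'closed' state
    for action, resource in trace:
        if action == "close":
            closed.add(resource)
        elif action == "open":
            closed.discard(resource)
        elif action == "read" and resource in closed:
            return True  # offending state reached: policy violated
    return False
```

    A monitor built this way would inspect each event as the program emits it and abort execution as soon as the function would return True for the trace seen so far.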

  8. Study on the Sharing Mechanism of Coal Mine Spatial Data Based on Agent and Web Service

    Institute of Scientific and Technical Information of China (English)

    谢娟娟; 顾寄南

    2012-01-01

    With the development of GIS and computer technology, coal mining enterprises have accumulated massive amounts of multi-source, heterogeneous spatial data. Eliminating spatial information islands within coal mining enterprises and achieving spatial data sharing are problems that coal mines urgently need to solve. We discuss methods for integrating heterogeneous coal mine spatial data, and combine Agent and Web Service technologies in a novel way to achieve intelligent sharing and efficient querying of spatial data, building a coal mine spatial data integration system based on Agents and Web Services.

  9. Design and Implementation of Foreign Language Books Collection and Mining System Based on Web Services API

    Institute of Scientific and Technical Information of China (English)

    俞小怡; 刘凡儒; 金玉玲

    2011-01-01

    Using the open Web Services API interfaces provided by well-known sites such as Rakuten, Google, and Amazon, we developed a foreign-language book acquisition system based on Web Services APIs, which offers a new way to collect and process multilingual bibliographic information for foreign-language books, mainly in Japanese and English. Taking the API provided by Japan's Rakuten online mall as an example, we designed the workflow and strategies for acquiring foreign-language books online, and describe the working principle of data collection and conversion and the core technology of metadata acquisition.

  10. Parallel Strands A Preliminary Investigation into Mining the Web for Bilingual Text

    CERN Document Server

    Resnik, P

    1998-01-01

    Parallel corpora are a valuable resource for machine translation, but at present their availability and utility is limited by genre- and domain-specificity, licensing restrictions, and the basic difficulty of locating parallel texts in all but the most dominant of the world's languages. A parallel corpus resource not yet explored is the World Wide Web, which hosts an abundance of pages in parallel translation, offering a potential solution to some of these problems and unique opportunities of its own. This paper presents the necessary first step in that exploration: a method for automatically finding parallel translated documents on the Web. The technique is conceptually simple, fully language independent, and scalable, and preliminary evaluation results indicate that the method may be accurate enough to apply without human intervention.

  11. Mining Genotype-Phenotype Associations from Public Knowledge Sources via Semantic Web Querying.

    Science.gov (United States)

    Kiefer, Richard C; Freimuth, Robert R; Chute, Christopher G; Pathak, Jyotishman

    2013-01-01

    Gene Wiki Plus (GeneWiki+) and the Online Mendelian Inheritance in Man (OMIM) are publicly available resources for sharing information about disease-gene and gene-SNP associations in humans. While immensely useful to the scientific community, both resources are manually curated, thereby making the data entry and publication process time-consuming, and to some degree, error-prone. To this end, this study investigates Semantic Web technologies to validate existing and potentially discover new genotype-phenotype associations in GWP and OMIM. In particular, we demonstrate the applicability of SPARQL queries for identifying associations not explicitly stated for commonly occurring chronic diseases in GWP and OMIM, and report our preliminary findings for coverage, completeness, and validity of the associations. Our results highlight the benefits of Semantic Web querying technology to validate existing disease-gene associations as well as identify novel associations although further evaluation and analysis is required before such information can be applied and used effectively.

  12. Research on Web Mining Based on Association Rules

    Institute of Scientific and Technical Information of China (English)

    夏惠芬; 董卫民

    2011-01-01

    Association rule mining is an important area of Web mining. To uncover hidden correlations among data, the concept of association rules is introduced into a Web mining system, and users' access paths are expressed in the form of association rules. Building on the idea of the Apriori algorithm, new Apriori-based rules and patterns suited to mining user access on the Web are presented. The results were verified on some relatively simple web pages, with good results.
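
    As background for this abstract, the classic Apriori procedure applied to page-visit sessions can be sketched as below. This is the textbook level-wise algorithm, not the modified variant the paper proposes; the session data in the usage note, the absolute support threshold, and the function names are assumptions for illustration.

```python
from itertools import combinations


def apriori(sessions, min_support):
    """Return {frozenset(pages): support count} for every frequent page set.

    `min_support` is an absolute count of sessions (an assumed convention).
    """
    transactions = [frozenset(s) for s in sessions]
    # Level 1: count individual pages.
    counts = {}
    for t in transactions:
        for page in t:
            key = frozenset([page])
            counts[key] = counts.get(key, 0) + 1
    current = {k: n for k, n in counts.items() if n >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: build size-k candidates from pages that are still frequent.
        pages = sorted({p for itemset in current for p in itemset})
        candidates = [frozenset(c) for c in combinations(pages, k)]
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = [c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k - 1))]
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent from the support counts."""
    both = frozenset(antecedent) | frozenset(consequent)
    return frequent[both] / frequent[frozenset(antecedent)]
```

    For example, with five hypothetical sessions `[["A","B","C"], ["A","B"], ["A","C"], ["B","C"], ["A","B","C"]]` and a minimum support of 3, every pair of pages is frequent but the triple is not, and the rule A -> B has confidence 3/4.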

  13. A Hybrid Web Recommendation System based on the Improved Association Rule Mining Algorithm

    OpenAIRE

    Wanaskar, Ujwala; Vij, Sheetal; Mukhopadhyay, Debajyoti

    2013-01-01

    With growing interest in web recommendation systems, which deliver customized data to their users, we started working on this system. Recommendation systems are generally divided into two major categories: collaborative recommendation systems and content-based recommendation systems. Collaborative recommendation systems try to seek out users who share the same tastes as a given user and recommend websites according to the liking given us...

  14. The systems genetics resource: a web application to mine global data for complex disease traits.

    Science.gov (United States)

    van Nas, Atila; Pan, Calvin; Ingram-Drake, Leslie A; Ghazalpour, Anatole; Drake, Thomas A; Sobel, Eric M; Papp, Jeanette C; Lusis, Aldons J

    2013-01-01

    The Systems Genetics Resource (SGR) (http://systems.genetics.ucla.edu) is a new open-access web application and database that contains genotypes and clinical and intermediate phenotypes from both human and mouse studies. The mouse data include studies using crosses between specific inbred strains and studies using the Hybrid Mouse Diversity Panel. SGR is designed to assist researchers studying genes and pathways contributing to complex disease traits, including obesity, diabetes, atherosclerosis, heart failure, osteoporosis, and lipoprotein metabolism. Over the next few years, we hope to add data relevant to deafness, addiction, hepatic steatosis, toxin responses, and vascular injury. The intermediate phenotypes include expression array data for a variety of tissues and cultured cells, metabolite levels, and protein levels. Pre-computed tables of genetic loci controlling intermediate and clinical phenotypes, as well as phenotype correlations, are accessed via a user-friendly web interface. The web site includes detailed protocols for all of the studies. Data from published studies are freely available; unpublished studies have restricted access during their embargo period.

  15. A Blended Web-Based Gaming Intervention on Changes in Physical Activity for Overweight and Obese Employees: Influence and Usage in an Experimental Pilot Study.

    Science.gov (United States)

    Kouwenhoven-Pasmooij, Tessa A; Robroek, Suzan Jw; Ling, Sui Wai; van Rosmalen, Joost; van Rossum, Elisabeth Fc; Burdorf, Alex; Hunink, M G Myriam

    2017-04-03

    Addressing the obesity epidemic requires the development of effective interventions aimed at increasing physical activity (PA). eHealth interventions with the use of accelerometers and gaming elements, such as rewarding or social bonding, seem promising. These eHealth elements, blended with face-to-face contacts, have the potential to help people adopt and maintain a physically active lifestyle. The aim of this study was to assess the influence and usage of a blended Web-based gaming intervention on PA, body mass index (BMI), and waist circumference among overweight and obese employees. In an uncontrolled before-after study, we observed 52 health care employees with BMI more than 25 kg/m², who were recruited via the company's intranet and who voluntarily participated in a 23-week Web-based gaming intervention, supplemented (blended) with non-eHealth components. These non-eHealth components were an individual session with an occupational health physician involving motivational interviewing and 5 multidisciplinary group sessions. The game was played by teams in 5 time periods, aiming to gain points by being physically active, as measured by an accelerometer. Data were collected in 2014 and 2015. Primary outcome was PA, defined as length of time at MET (metabolic equivalent task) ≥3, as measured by the accelerometer during the game. Secondary outcomes were reductions in BMI and waist circumference, measured at baseline and 10 and 23 weeks after the start of the program. Gaming elements such as "compliance" with the game (ie, days of accelerometer wear), "engagement" with the game (ie, frequency of reaching a personal monthly target), and "eHealth teams" (ie, social influence of eHealth teams) were measured as potential determinants of the outcomes. Linear mixed models were used to evaluate the effects on all outcome measures. The mean age of participants was 48.1 years; most participants were female (42/51, 82%). The mean PA was 86 minutes per day, ranging from 6

  16. WWW portal usage analysis using genetic algorithms

    Directory of Open Access Journals (Sweden)

    Ondřej Popelka

    2009-01-01

    Full Text Available The article proposes a new method suitable for advanced analysis of web portal visits. This is part of retrieving information and knowledge from web usage data (web usage mining). Such information is necessary to gain better insight into visitors' needs and, more generally, consumer behaviour. By leveraging this information, a company can optimize the organization of its internet presentations and offer a better end-user experience. The proposed approach uses grammatical evolution, a computational method based on genetic algorithms. Grammatical evolution uses a context-free grammar to generate the solution in an arbitrary reusable form. This allows us to describe visitors' behaviour in different ways depending on the desired further processing. In this article we use a description in a procedural programming language. Web server access log files are used as source data. The extraction of behaviour patterns can currently be solved using statistical analysis, specifically methods based on sequential analysis. Our objective is to develop an alternative algorithm. The article further describes the basic algorithms of two-level grammatical evolution; this involves basic grammatical evolution and differential evolution, which forms the second phase of the computation. Grammatical evolution is used to generate the basic structure of the solution in the form of a fragment of application code. Differential evolution is used to find optimal parameters for this solution: the specific pages visited by a random visitor. The grammar used to conduct the experiments is described, along with explanations of the links to the actual implementation of the algorithm. Furthermore, the fitness function is described, together with the reasons that led to its current shape. Finally, the process of analyzing and filtering the raw input data is described, as it is a vital part of obtaining reasonable results.

  17. Integration of Agents and Data Mining in Interactive Web Environment for Psychometric Diagnostics

    Science.gov (United States)

    Ilić, Velibor

    Information technologies are used intensively in modern psychometrics. An interactive environment for psychometric diagnostics enables evaluation of cognitive capabilities using several multimedia tests, collection of information about users, organization of this information into users' personal profiles, visualization, interpretation and analysis of test results, control over the testing procedure, and drawing conclusions from the collected data. Agents supervise users' actions in the interactive environment and try to adjust questionnaires, diagnostic tests, training programs and other integrated tools to users' personal needs, making the environment easier to use. The interactive environment contains agents that help users during registration, agents that guide users through diagnostics and training, and agents that help psychologists in their activities on the system. An Internet environment that contains diagnostic tests and questionnaires generates large volumes of data that must be processed. Data mining is integrated into the interactive environment for diagnostics of cognitive functions and is used to search for potentially interesting information in these data. Agents use the data mining system to make their decisions more precise.

  18. Two-step web-mining approach to study geology/geophysics-related open-source software projects

    Science.gov (United States)

    Behrends, Knut; Conze, Ronald

    2013-04-01

    Geology/geophysics is a highly interdisciplinary science, overlapping with, for instance, physics, biology and chemistry. In today's software-intensive work environments, geoscientists often encounter new open-source software from scientific fields that are only remotely related to their own field of expertise. We show how web-mining techniques can help to carry out systematic discovery and evaluation of such software. In a first step, we downloaded ~500 abstracts (each consisting of ~1 kb UTF-8 text) from agu-fm12.abstractcentral.com. This web site hosts the abstracts of all publications presented at AGU Fall Meeting 2012, the world's largest annual geology/geophysics conference. All abstracts belonged to the category "Earth and Space Science Informatics", an interdisciplinary label cross-cutting many disciplines such as "deep biosphere", "atmospheric research", and "mineral physics". Each publication was represented by a highly structured record with ~20 short data attributes, the largest authorship-record being the unstructured "abstract" field. We processed texts of the abstracts with the statistics software "R" to calculate a corpus and a term-document matrix. Using R package "tm", we applied text-mining techniques to filter data and develop hypotheses about software-development activities happening in various geology/geophysics fields. Analyzing the term-document matrix with basic techniques (e.g., word frequencies, co-occurrences, weighting) as well as more complex methods (clustering, classification), several key pieces of information were extracted. For example, text-mining can be used to identify scientists who are also developers of open-source scientific software, and the names of their programming projects and codes can also be identified. In a second step, based on the intermediate results found by processing the conference-abstracts, any new hypotheses can be tested in another web-mining subproject: by merging the dataset with open data from github
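
    The term-document matrix mentioned in this abstract can be illustrated with a short sketch. The authors worked in R with the "tm" package; the version below is a Python stand-in written only for illustration, and its tokenizer (lowercase ASCII letter runs) and function name are assumptions.

```python
import re
from collections import Counter


def term_document_matrix(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts term i in document j.

    Tokenization is a deliberately naive assumption: lowercase runs of a-z.
    """
    counts = [Counter(re.findall(r"[a-z]+", doc.lower())) for doc in docs]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in vocabulary]
    return vocabulary, matrix
```

    Row sums of the matrix give overall word frequencies, and comparing rows or columns is the starting point for the co-occurrence, clustering, and classification analyses the abstract lists.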

  19. Response of dandelion (Taraxacum officinale Web) to heavy metals from mine sites: micromorphology of leaves and roots.

    Science.gov (United States)

    Bini, Claudio; Maleci, Laura; Buffa, Gabriella; Wahsha, Mohammad; Fontana, Silvia

    2013-04-01

    Maleci L. (1), Bini C. (2), Buffa G. (2), Fontana S. (2), Wahsha M. (3). (1) Dept of Biology, University of Florence, Italy; (2) Dept of Environmental Sciences, Informatics and Statistics, Ca' Foscari University, Venice, Italy; (3) Marine Science Centre, University of Jordan, Aqaba section, Jordan. Heavy metal accumulation is known to produce significant physiological and biochemical responses in vascular plants. Yet, metabolic and physiological responses of plants to heavy metal concentration can be viewed as potentially adaptive changes of the plants during stress. From this point of view, plants growing on abandoned mine sites are of particular interest, since they are genetically tolerant to high metal concentrations, and can be utilized in soil restoration. Among wild plants, the common dandelion (Taraxacum officinale Web) has received attention as a bioindicator plant, and has also been suggested for remediation projects. Wild specimens of Taraxacum officinale Web, with their soil clod, were gathered from three sites with different levels of heavy metal contamination (Cd, Cr, Cu, Fe, Pb, Zn) in the abandoned Imperina Valley mine (Northeast Italy). A control plant was also gathered from an uncontaminated site nearby. Plants were cultivated in pots for one year at HBF, and appeared macroscopically not affected by toxic signals (reduced growth, leaf necrosis) possibly induced by soil HM concentration. Leaves and roots taken at the same growing season were observed by LM and TEM. Light microscopy observations carried out on the leaf lamina show a clear difference in the cellular organization of not-contaminated and contaminated samples. The unpolluted samples present a well organized palisade tissue and spongy photosynthetic parenchyma. Samples from contaminated sites, instead, present a palisade parenchyma less organized, and a reduction of leaf thickness

  20. The Design of Remote Education System Based on Web Mining

    Institute of Scientific and Technical Information of China (English)

    张舰

    2014-01-01

    The paper analyses the shortcomings of traditional B/S-based remote education web systems and proposes applying web mining technology to the design of remote education sites. Web mining is used to discover learners' browsing habits and learning characteristics, so that administrators can improve the page layout and course offerings of the site and thereby raise its overall effectiveness.

  1. Data mining method from time series Web data

    Institute of Scientific and Technical Information of China (English)

    武健

    2014-01-01

    To address the limited range of algorithms for mining time-series dynamic data, and taking the dependence between adjacent dynamic data into account, this paper combines the Hidden Markov Model (HMM) with a heuristic clustering strategy to mine the changing characteristics and trends of dynamic Web data. In the first step, the time-series data are transformed into likelihood space using the hidden Markov model, and the Symmetric Kullback-Leibler (SKL) distance is used to measure differences in likelihood. In the second step, an SKL distance matrix is built and the time-series data are grouped into change patterns using hierarchical clustering. The method was verified by mining changing trends in dynamic statistical data on job requirements in the computer networking field. The results show that five patterns of change in job requirements could be discovered.
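
    The two steps described above (an SKL distance matrix, then hierarchical clustering) can be illustrated with a minimal sketch. It is not the authors' implementation: as an assumption, each series is represented directly by a small discrete probability distribution rather than by likelihoods under a fitted HMM, the SKL distance is taken as the sum of the two directed divergences, single linkage is used as the merge rule, and the sample values are hypothetical.

```python
import math

def symmetric_kl(p, q, eps=1e-12):
    """Symmetric KL distance KL(p||q) + KL(q||p) for two discrete distributions."""
    kl_pq = sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q))
    kl_qp = sum(b * math.log((b + eps) / (a + eps)) for a, b in zip(p, q))
    return kl_pq + kl_qp

def single_linkage(dist, n_clusters):
    """Agglomerative clustering on a precomputed distance matrix (single linkage)."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical distributions standing in for four time series.
series = [
    [0.70, 0.20, 0.10],
    [0.65, 0.25, 0.10],
    [0.10, 0.20, 0.70],
    [0.15, 0.15, 0.70],
]
dist = [[symmetric_kl(p, q) for q in series] for p in series]
groups = single_linkage(dist, n_clusters=2)
print(groups)
```

    With these sample values the first two and the last two series end up in separate groups; other linkage rules (average, complete) would slot into the same structure.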

  2. Analysis of Web Proxy Logs

    Science.gov (United States)

    Fei, Bennie; Eloff, Jan; Olivier, Martin; Venter, Hein

    Network forensics involves capturing, recording and analysing network audit trails. A crucial part of network forensics is to gather evidence at the server level, at the proxy level and from other sources. A web proxy relays URL requests from clients to a server. Analysing web proxy logs can give unobtrusive insights into the browsing behavior of computer users and provide an overview of Internet usage in an organisation. More importantly, in terms of network forensics, it can aid in detecting anomalous browsing behavior. This paper demonstrates the use of a self-organising map (SOM), a powerful data mining technique, in network forensics. In particular, it focuses on how a SOM can be used to analyse data gathered at the web proxy level.

  3. Mining of Web Server Logs in a Distributed Cluster Using Big Data Technologies

    Directory of Open Access Journals (Sweden)

    Savitha K

    2014-01-01

    Full Text Available Big Data refers to growing datasets that exceed the capabilities of traditional database tools. Hadoop handles big data by processing massive quantities of information on clusters of commodity hardware. Web server logs are semi-structured, machine-generated files, usually large flat text files. MapReduce handles them efficiently because it processes one line at a time. This paper performs session identification in log files using Hadoop in a distributed cluster. Apache Hadoop MapReduce, a data processing platform, is used in pseudo-distributed mode and in fully distributed mode. The framework identifies the sessions of web surfers in order to recognize unique users and the pages they accessed. The identified sessions are analyzed in R to produce a statistical report based on the total count of visits per day. The results are compared with a non-Hadoop approach in a Java environment, and the proposed work shows better time efficiency, storage and processing speed.
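
    Session identification of the kind described above can be sketched in a single process, without Hadoop. The abstract does not say how sessions are delimited, so this sketch assumes two common heuristics: a user is identified by client IP address, and a gap of more than 30 minutes between requests starts a new session. The function name, record layout and sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def identify_sessions(records, timeout=timedelta(minutes=30)):
    """Group (ip, timestamp, url) records into sessions split by an inactivity timeout."""
    by_user = defaultdict(list)
    for ip, ts, url in records:
        by_user[ip].append((ts, url))

    sessions = []
    for ip, hits in by_user.items():
        hits.sort()  # order each user's requests by time
        current = [hits[0][1]]
        last = hits[0][0]
        for ts, url in hits[1:]:
            if ts - last > timeout:
                # Gap too long: close the current session and start a new one.
                sessions.append((ip, current))
                current = []
            current.append(url)
            last = ts
        sessions.append((ip, current))
    return sessions

# Hypothetical, already-parsed log records.
records = [
    ("10.0.0.1", datetime(2014, 1, 1, 9, 0), "/index"),
    ("10.0.0.1", datetime(2014, 1, 1, 9, 10), "/a"),
    ("10.0.0.2", datetime(2014, 1, 1, 9, 5), "/index"),
    ("10.0.0.1", datetime(2014, 1, 1, 10, 30), "/b"),
]
sessions = identify_sessions(records)
for ip, pages in sessions:
    print(ip, pages)
```

    In a MapReduce setting the grouping by IP would typically be the shuffle key and the timeout split would run in the reducer; that mapping is a design suggestion, not something the abstract states.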

  4. Mining the Web for the Voice of the Herd to Track Stock Market Bubbles

    CERN Document Server

    Gerow, Aaron

    2012-01-01

    We show that power-law analyses of financial commentaries from newspaper web-sites can be used to identify stock market bubbles, supplementing traditional volatility analyses. Using a four-year corpus of 17,713 online, finance-related articles (10M+ words) from the Financial Times, the New York Times, and the BBC, we show that week-to-week changes in power-law distributions reflect market movements of the Dow Jones Industrial Average (DJI), the FTSE-100, and the NIKKEI-225. Notably, the statistical regularities in language track the 2007 stock market bubble, showing emerging structure in the language of commentators, as progressively greater agreement arose in their positive perceptions of the market. Furthermore, during the bubble period, a marked divergence in positive language occurs as revealed by a Kullback-Leibler analysis.

  5. Web Data Mining Technology Applications in Business

    Institute of Scientific and Technical Information of China (English)

    张敬; 周书臣

    2011-01-01

    Data mining is a new information technology that has emerged in recent years with the development and application of database and artificial intelligence technologies. With the widespread use of computers, the Web has become the most important channel for obtaining information of all kinds, and its inherent advantages have given it a dominant position. Drawing on the author's understanding of the Web and on the relevant literature, this paper describes the Web data mining process and its characteristics, and focuses on the application of Web data mining technology in contemporary business, especially e-commerce.

  6. Proposing a Framework for Exploration of Crime Data Using Web Structure and Content Mining

    Directory of Open Access Journals (Sweden)

    Amin Shahraki Moghaddam

    2013-10-01

    Full Text Available The purpose of this study is to propose a framework and implement a high-level architecture of a scalable universal crawler in order to close the reliability gap, and to present the evaluation process for forensic data analysis of criminal suspects. For law enforcement agencies, criminal web data provide relevant and anonymous information. Such pieces of information are used as digital data in the forensic analysis of suspects' social networks, but assessing them is very difficult. In practice, the operator must manually pull the relevant information out of the text on a website, find the links, and classify them into a database structure. The resulting set can then be fed to various criminal network evaluation tools for testing. This procedure is inefficient because it is error-prone, and the quality of the analyzed data depends on the expertise and experience of the investigator, so the reliability of the tests is not constant; good results come only from a knowledgeable operator. The objective of this study is to show the process of investigating criminal suspects through forensic data analysis and to close the reliability gap by proposing a structure and applying a high-level architecture of a scalable universal crawler.

  7. The mediating role of guanxi network and communication performance in transforming Web 2.0 technologies usage to work performance : An empirical study in China

    NARCIS (Netherlands)

    Wong, L.H.M.; Davison, R.M.; Ou, C.X.J.; Cheng, Z.; Tan, F.B.; Bunker, D.

    2014-01-01

    Motivated by both the increasing popularity of Web 2.0 technologies and the lack of empirical studies to conceptualize and validate their roles in the work place, in this research we aim to establish a research model to capture how Web 2.0 technologies can enhance individual work performance. Ground

  8. Participants, Usage, and Use Patterns of a Web-Based Intervention for the Prevention of Depression Within a Randomized Controlled Trial

    NARCIS (Netherlands)

    Kelders, S.M.; Bohlmeijer, E.T.; Gemert-Pijnen, van J.E.W.C.

    2013-01-01

    Background: Although Web-based interventions have been shown to be effective, they are not widely implemented in regular care. Nonadherence (ie, participants not following the intervention protocol) is an issue. By studying the way Web-based interventions are used and whether there are differences b

  9. The Applied Research of Text Data Mining Technology in the Web Knowledge Base

    Institute of Scientific and Technical Information of China (English)

    蔡立斌

    2012-01-01

    This article first briefly describes the basic theory of text data mining and knowledge extraction, and then analyzes the characteristics of network information retrieval and mining, in particular the issues associated with text mining, Web data mining, and content-based data mining. On this basis, it analyzes the theory and technology required for the design and construction of a Web knowledge base and for text data mining and knowledge discovery, analyzes and designs the architecture and functional modules of the Web knowledge base system, and establishes a model of a Web knowledge base built on text data mining.

  10. PLAN: a web platform for automating high-throughput BLAST searches and for managing and mining results

    Directory of Open Access Journals (Sweden)

    Zhao Xuechun

    2007-02-01

    Full Text Available Abstract Background BLAST searches are widely used for sequence alignment. The search results are commonly adopted for various functional and comparative genomics tasks such as annotating unknown sequences, investigating gene models and comparing two sequence sets. Advances in sequencing technologies pose challenges for high-throughput analysis of large-scale sequence data. A number of programs and hardware solutions exist for efficient BLAST searching, but there is a lack of generic software solutions for mining and personalized management of the results. Systematically reviewing the results and identifying information of interest remains tedious and time-consuming. Results Personal BLAST Navigator (PLAN) is a versatile web platform that helps users to carry out various personalized pre- and post-BLAST tasks, including: (1) query and target sequence database management, (2) automated high-throughput BLAST searching, (3) indexing and searching of results, (4) filtering results online, (5) managing results of personal interest in favorite categories, (6) automated sequence annotation (such as NCBI NR and ontology-based annotation). PLAN integrates, by default, the Decypher hardware-based BLAST solution provided by Active Motif Inc. with a greatly improved efficiency over conventional BLAST software. BLAST results are visualized by spreadsheets and graphs and are full-text searchable. BLAST results and sequence annotations can be exported, in part or in full, in various formats including Microsoft Excel and FASTA. Sequences and BLAST results are organized in projects, the data publication levels of which are controlled by the registered project owners. In addition, all analytical functions are provided to public users without registration. Conclusion PLAN has proved a valuable addition to the community for automated high-throughput BLAST searches, and, more importantly, for knowledge discovery, management and sharing based on sequence alignment results

  11. Design and implementation of data mining tools

    CERN Document Server

    Thuraisingham, Bhavani; Awad, Mamoun

    2009-01-01

    DATA MINING TECHNIQUES AND APPLICATIONS; Introduction; Trends; Data Mining Techniques and Applications; Data Mining for Cyber Security: Intrusion Detection; Data Mining for Web: Web Page Surfing Prediction; Data Mining for Multimedia: Image Classification; Organization of This Book; Next Steps; Data Mining Techniques; Introduction; Overview of Data Mining Tasks and Techniques; Artificial Neural Networks; Support Vector Machines; Markov Model; Association Rule Mining (ARM); Multiclass Problem; Image Mining; Summary; Data Mining Applications; Introduction; Intrusion Detection; Web Page Surfing Prediction; Image Classification; Summary; DATA MI

  12. The Application of Web Data Mining Technology in Network Education

    Institute of Scientific and Technical Information of China (English)

    易星

    2011-01-01

    This paper introduces Web data mining technology and its applications, and analyzes and discusses mining applications involving the three main parties in network education: students, teachers and schools. Such applications can help raise the level of network education management and decision-making in universities, create a modern, digital learning environment, and make full use of Web data mining in network education.

  13. Discovering diamonds under coal piles: Revealing exclusive business intelligence about online consumers through the use of Web Data Mining techniques embedded in an analytical customer relationship management framework

    Directory of Open Access Journals (Sweden)

    Myriam Ertz

    2016-02-01

    Full Text Available Web Mining (WM) has gained prominence over the last decade. This rise is concomitant with the upsurge of pure players, the multiple challenges of the data deluge, the trend toward automation and integration within organizations, and a desire for hyper-segmentation. Confronted, partly or totally, with these issues, companies increasingly resort to replicating the data mining toolbox on web data. Although much is known about the technical aspects of WM, little is known about the extent to which WM actually fits within a customer relationship management system aimed at attracting and retaining the maximum number of customers. An exploratory study involving twelve senior professionals and scholars indicated that WM is well suited to achieving most customer relationship management objectives with regard to the profiling of existing web customers. The results of this study suggest that engineering WM processes into analytical customer relationship management systems may yield highly beneficial returns, provided that some guidelines are scrupulously followed.

  14. A fuzzy method for improving the functionality of search engines based on user's web interactions

    Directory of Open Access Journals (Sweden)

    Farzaneh Kabirbeyk

    2015-04-01

    Full Text Available Web mining has been widely used to discover knowledge from various sources on the web. One important tool in web mining is the mining of web users' behavior, which is a way to discover potential knowledge from users' interactions. Website personalization is now popular among web users; it plays an important role in facilitating user access and provides information matching users' requirements based on their own interests. Extracting important features of web user behavior plays a significant role in web usage mining. Such features include page visit frequency in each session, visit duration, and the dates on which certain pages are visited. This paper presents a method that predicts a user's interests and proposes a list of pages based on those interests by identifying the user's behavior with a fuzzy technique, fuzzy clustering. Because users have different interests and may pursue more than one interest at a time, a user's interest may belong to several clusters, and fuzzy clustering allows such overlap. The resulting clusters are used to extract fuzzy rules. This helps detect the user's movement pattern, and a neural network is used to provide a list of suggested pages to the user.
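
    The overlap that fuzzy clustering allows can be made concrete with the membership formula of the standard fuzzy c-means algorithm. The abstract does not name the exact variant used, so the sketch below assumes fuzzy c-means with fuzzifier m = 2, and it assumes that cluster centres are already known; the feature vectors (visit frequency, visit duration) and centre values are hypothetical.

```python
import math

def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one point in each cluster, given the cluster centres."""
    dists = [math.dist(point, c) for c in centers]
    if any(d == 0 for d in dists):
        # The point coincides with one or more centres: share full membership among them.
        hits = [1.0 if d == 0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    power = 2.0 / (m - 1.0)
    # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1))
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]

# Hypothetical session features: (page visit frequency, visit duration in minutes).
centers = [(2.0, 1.0), (10.0, 8.0)]
session = (5.0, 4.0)
memberships = fuzzy_memberships(session, centers)
print(memberships)
```

    The memberships sum to 1, and a session lying between two centres receives a non-zero degree in both clusters, which is the overlap the abstract refers to.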

  15. The State of Wiki Usage in U.S. K-12 Schools: Leveraging Web 2.0 Data Warehouses to Study Quality and Equality in Online Learning Environments

    Science.gov (United States)

    Reich, Blair Justin Fire

    2012-01-01

    In the first part of this dissertation, I document wiki usage in U.S. K-12 settings by analyzing data on a representative sample drawn from a population of nearly 180,000 wikis. My research group, which I led and managed, measured the opportunities wikis provide for students to develop 21st century skills such as expert thinking, complex…

  16. The State of Wiki Usage in U.S. K-12 Schools: Leveraging Web 2.0 Data Warehouses to Assess Quality and Equity in Online Learning Environments

    Science.gov (United States)

    Reich, Justin; Murnane, Richard; Willett, John

    2012-01-01

    To document wiki usage in U.S. K-12 settings, this study examined a representative sample drawn from a population of nearly 180,000 wikis. The authors measured the opportunities wikis provide for students to develop 21st-century skills such as expert thinking, complex communication, and new media literacy. The authors found four types of wiki…

  17. Zeolites and Usage Areas

    Directory of Open Access Journals (Sweden)

    Jale Gülen

    2012-06-01

    Full Text Available Zeolites are formed via several reactions from minerals that consist of aluminium and silica. Zeolites, which have gained growing significance in recent years, are among the important industrial raw materials. As well as being used as catalysts, their ability to perform ion exchange and adsorption makes them even more valuable. Zeolites are used in several industries such as energy, agriculture and animal husbandry, mining and metallurgy, construction, detergent, paper, etc. In this study, the definition, formation and usage areas of zeolites are explained.

  18. Topic-Driven Web Information Mining

    Institute of Scientific and Technical Information of China (English)

    余晨; 顾毓清

    2003-01-01

    With the explosive growth of the World Wide Web, it is becoming increasingly difficult for users to collect and analyze Web pages that are relevant to a particular topic. This paper presents a Topic-Driven Web Information Gathering system, which can efficiently collect Web pages on a topic with relatively limited hardware and network resources and keep the pages up to date.

  19. A Comparative Study on Webometrics and Web Mining

    Institute of Scientific and Technical Information of China (English)

    赵蓉英; 魏明坤

    2016-01-01

    [Purpose/Significance] Starting from the concepts of Webometrics and Web mining, this paper makes a comparative study of the two in order to show the differences and connections between them, which may facilitate scholars' future research on Webometrics. [Method/Process] First, knowledge maps of research hotspots in Webometrics and Web mining are drawn using the CiteSpace software; second, the high-frequency keywords of each field are identified by word frequency analysis; finally, the differences between the two research directions are discussed by comparative analysis. [Result/Conclusion] Both Webometrics and Web mining are based on the analysis of network data; the former pays attention to phenomena and structure, while the latter focuses on algorithms and experimental research. As for research subjects, Webometrics focuses on academic research and plays an important role in scientific research work, while Web mining focuses on e-commerce research and is oriented toward commercial interests.

  1. HC StratoMineR : A Web-Based Tool for the Rapid Analysis of High-Content Datasets

    NARCIS (Netherlands)

    Omta, Wienand A; van Heesbeen, Roy G; Pagliero, Romina J; van der Velden, Lieke M; Lelieveld, Daphne; Nellen, Mehdi; Kramer, Maik; Yeong, Marley; Saeidi, Amir M; Medema, Rene H; Spruit, Marco; Brinkkemper, Sjaak; Klumperman, Judith; Egan, David A

    2016-01-01

    High-content screening (HCS) can generate large multidimensional datasets and when aligned with the appropriate data mining tools, it can yield valuable insights into the mechanism of action of bioactive molecules. However, easy-to-use data mining tools are not widely available, with the result that

  2. A COMPARATIVE ANALYSIS OF WEB INFORMATION EXTRACTION TECHNIQUES DEEP LEARNING vs. NAÏVE BAYES vs. BACK PROPAGATION NEURAL NETWORKS IN WEB DOCUMENT EXTRACTION

    Directory of Open Access Journals (Sweden)

    J. Sharmila

    2016-01-01

    Full Text Available Research on web mining is becoming more important because a great deal of information is managed through the web, and web usage is growing in an uncontrolled way. A dedicated framework is required to handle such a large volume of information in the web space. Web mining is divided into three major areas: web content mining, web usage mining and web structure mining. Tak-Lam Wong proposed a web content mining methodology based on Bayesian Networks (BN), which learns to extract web data and discover attributes using a Bayesian approach. Inspired by that work, we propose a web content mining methodology based on a Deep Learning algorithm. The Deep Learning algorithm is preferred over BN because BN is not considered in any learning architecture of the kind used in the proposed system. The main objective of this investigation is web document extraction using different classification algorithms and their analysis. This work extracts data from web URLs and presents three classification algorithms: a Deep Learning algorithm, a Bayesian algorithm and a BPNN algorithm. Deep Learning is a powerful set of techniques for learning in neural networks, applied in areas such as computer vision, speech recognition, natural language processing and biometrics. Deep Learning is a simple classification technique that is used for a subset of a large field, and it requires less time for classification. Naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong independence assumptions between the features. The BPNN algorithm is then used for classification. Initially, the training and testing datasets contain many URLs. We extract the content from the dataset. The
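
    The Naive Bayes classifier mentioned above is simple enough to sketch in full. The following is a generic multinomial Naive Bayes with add-one (Laplace) smoothing, written for illustration only; it is not the configuration used in the paper, and the class name, tokenised documents and labels are hypothetical.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes over tokenised documents, with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocab = set()
        for words, label in zip(docs, labels):
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, words):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log prior + sum of log likelihoods (independence assumption)
            score = math.log(n_docs / total_docs)
            n_words = sum(self.word_counts[label].values())
            for w in words:
                count = self.word_counts[label][w]
                score += math.log((count + 1) / (n_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical tokenised page contents and their categories.
docs = [
    ["price", "buy", "cart"],
    ["buy", "discount", "cart"],
    ["news", "sport", "score"],
    ["news", "weather"],
]
labels = ["shop", "shop", "news", "news"]
model = NaiveBayes().fit(docs, labels)
print(model.predict(["buy", "cart"]))
print(model.predict(["weather", "news"]))
```

    Working in log space avoids numerical underflow when documents contain many words, and the smoothing term keeps unseen words from zeroing out a class.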

  3. Applied Research of Web Data Mining in Search Engine System of Campus Network

    Institute of Scientific and Technical Information of China (English)

    牛凯

    2014-01-01

    After describing the classification, methods and specific process of Web data mining, this paper designs the overall architecture of a campus network search engine system, discusses the design of its main functional modules, and proposes the application of Web data mining technology in the campus network search engine system.

  4. IPACT: Improved Web Page Recommendation System Using Profile Aggregation Based On Clustering of Transactions

    Directory of Open Access Journals (Sweden)

    Yahya AlMurtadha

    2011-01-01

    Full Text Available Problem statement: Recently, Web usage mining techniques have been widely used to build recommendation systems, especially for anonymous users. Approach: Assigning the current user to the best web navigation profile with similar navigation activities improves the ability of the prediction engine to produce a recommendation list and present it to the user. This study presents iPACT, an improved recommendation system using Profile Aggregation based on Clustering of Transactions (PACT). Results: iPACT shows better prediction accuracy than the previous methods, PACT and Hypergraph. Conclusion: Users' interests change over time; hence incremental and adaptive web navigation profiling is a key feature for future work.
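
    The step of assigning the current user to the best matching navigation profile can be sketched as follows. This is not the iPACT algorithm, whose details are not given in the abstract: it assumes profiles are weighted page vectors, that matching uses cosine similarity, and that recommendations are the highest-weighted profile pages not yet visited. All names, weights and URLs are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse page-weight vectors stored as dicts."""
    pages = set(a) | set(b)
    dot = sum(a.get(p, 0.0) * b.get(p, 0.0) for p in pages)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(session, profiles, top_n=2):
    """Pick the most similar profile, then suggest its top pages not yet visited."""
    best = max(profiles, key=lambda name: cosine(session, profiles[name]))
    candidates = {p: w for p, w in profiles[best].items() if p not in session}
    ranked = sorted(candidates, key=lambda p: (-candidates[p], p))
    return best, ranked[:top_n]

# Hypothetical aggregated navigation profiles (page -> weight).
profiles = {
    "sports": {"/sports": 1.0, "/scores": 0.8, "/tickets": 0.5},
    "tech": {"/tech": 1.0, "/reviews": 0.7, "/deals": 0.4},
}
session = {"/sports": 1.0}  # pages seen so far in the active session
profile, pages = recommend(session, profiles)
print(profile, pages)
```

    Keeping profiles as sparse dictionaries means the same code works when the site has many pages but each profile touches only a few of them.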

  5. Optimizing Web Sites for Customer Retention

    CERN Document Server

    Hahsler, Michael

    2008-01-01

    With customer relationship management (CRM), companies move away from a mainly product-centered view to a customer-centered view. Resulting from this change, the effective management of how to keep contact with customers throughout different channels is one of the key success factors in today's business world. Company Web sites have evolved in many industries into an extremely important channel through which customers can be attracted and retained. To analyze and optimize this channel, accurate models of how customers browse through the Web site and what information within the site they repeatedly view are crucial. Typically, data mining techniques are used for this purpose. However, there already exist numerous models developed in marketing research for traditional channels which could also prove valuable to understanding this new channel. In this paper we propose the application of an extension of the Logarithmic Series Distribution (LSD) model to repeat-usage of Web-based information and thus to analyze and op...
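
    For orientation, the standard Logarithmic Series Distribution gives the probability that a user views an item exactly k times as P(k) = -q^k / (k ln(1 - q)) for k = 1, 2, ... and 0 < q < 1. The sketch below evaluates only this textbook form; the extension proposed in the paper is not described in the abstract and is not reproduced here. The function names and the value of q are assumptions for illustration.

```python
import math

def lsd_pmf(k, q):
    """Probability of exactly k occurrences under the logarithmic series distribution."""
    if k < 1 or not 0.0 < q < 1.0:
        raise ValueError("need k >= 1 and 0 < q < 1")
    return -(q ** k) / (k * math.log(1.0 - q))

def lsd_mean(q):
    """Mean number of occurrences: -q / ((1 - q) * ln(1 - q))."""
    return -q / ((1.0 - q) * math.log(1.0 - q))

q = 0.5  # hypothetical parameter value
probabilities = [lsd_pmf(k, q) for k in range(1, 200)]
print(probabilities[:3])
print(lsd_mean(q))
```

    Because the distribution has no zero class, it describes only users who viewed an item at least once, which is what a web log records.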

  6. Usage of a generic web-based self-management intervention for breast cancer survivors: substudy analysis of the BREATH trial

    NARCIS (Netherlands)

    Berg, S.W. van den; Peters, E.J.; Kraaijeveld, J.F.; Gielissen, M.F.M.; Prins, J.B.

    2013-01-01

    BACKGROUND: Generic fully automated Web-based self-management interventions are upcoming, for example, for the growing number of breast cancer survivors. It is hypothesized that the use of these interventions is more individualized and that users apply a large amount of self-tailoring. However,

  7. A Fraud-click Detecting Model Based on Web Log Mining

    Institute of Scientific and Technical Information of China (English)

    崔宏娟; 康慕宁

    2011-01-01

    The high click-through rates of online advertising are accompanied by a large number of fraudulent clicks. Accurately identifying fraudulent clicks in order to protect users' revenue has become a top priority for online advertising. By analyzing common click-fraud techniques, traditional detection methods, and the characteristics of Web log mining, a fraud-click detection model based on Web log mining is proposed. The implementation of each module of the model is described in detail. Experiments show that the model is simple and efficient and can effectively detect fraudulent clicks on online advertisements.

  8. Analysis on Recommended System for Web Information Retrieval Using HMM

    Directory of Open Access Journals (Sweden)

    Himangni Rathore

    2014-11-01

    Full Text Available The Web is a rich domain of data and knowledge, spread across the world in an unstructured manner, and users continuously access this information over the internet. Web mining is an application of data mining in which web-related data is extracted and manipulated to extract knowledge. It is divided into three major domains: web usage mining, web content mining and web structure mining. The proposed work addresses web usage mining. The aim is to improve user feedback and user navigation pattern discovery for a CRM system. Finally, an HMM-based algorithm is used to find patterns in the data, a method that promises to provide more accurate recommendations.

  9. Research on Building Agricultural Information Website Based on Web Log Mining Technology

    Institute of Scientific and Technical Information of China (English)

    孙福振; 李艳; 李业刚

    2009-01-01

    Web log mining technology is introduced in detail, and an application model based on web log mining is put forward, in order to provide scientific guidance for the improvement and construction of agricultural information websites.

  10. Altmetrics, PIRUS and Usage Factor

    Directory of Open Access Journals (Sweden)

    Peter Shepherd

    2013-11-01

    Full Text Available Scholars have moved their publications onto the web, and the ongoing conversation around the outputs of research increasingly takes place there. Beyond the research community itself, scholarly information has an impact on other professionals, as well as on the general public. Traditional measures do not reflect these wider impacts. The mission of COUNTER is to set and monitor global standards for the measurement of online usage of content. Usage is an important measure of the impact and value of publications, and as such has a role in altmetrics. Usage can be reported at the individual item and individual researcher level and aggregated to the journal or institution level. PIRUS and Usage Factor are two COUNTER-led initiatives that are based on this approach, with the potential to provide useful altmetrics.

  11. LimTox: a web tool for applied text mining of adverse event and toxicity associations of compounds, drugs and genes.

    Science.gov (United States)

    Cañada, Andres; Capella-Gutierrez, Salvador; Rabal, Obdulia; Oyarzabal, Julen; Valencia, Alfonso; Krallinger, Martin

    2017-05-22

    A considerable effort has been devoted to retrieve systematically information for genes and proteins as well as relationships between them. Despite the importance of chemical compounds and drugs as a central bio-entity in pharmacological and biological research, only a limited number of freely available chemical text-mining/search engine technologies are currently accessible. Here we present LimTox (Literature Mining for Toxicology), a web-based online biomedical search tool with special focus on adverse hepatobiliary reactions. It integrates a range of text mining, named entity recognition and information extraction components. LimTox relies on machine-learning, rule-based, pattern-based and term lookup strategies. This system processes scientific abstracts, a set of full text articles and medical agency assessment reports. Although the main focus of LimTox is on adverse liver events, it enables also basic searches for other organ level toxicity associations (nephrotoxicity, cardiotoxicity, thyrotoxicity and phospholipidosis). This tool supports specialized search queries for: chemical compounds/drugs, genes (with additional emphasis on key enzymes in drug metabolism, namely P450 cytochromes-CYPs) and biochemical liver markers. The LimTox website is free and open to all users and there is no login requirement. LimTox can be accessed at: http://limtox.bioinfo.cnio.es. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  12. A Research on Continuance Usage Intention of the Interactive Academic Web Portals

    Institute of Scientific and Technical Information of China (English)

    张树娟; 韩阳阳; 万腾飞

    2014-01-01

    In this paper, we develop an integrative model based on the Theory of Planned Behavior (TPB) and the Expectation Confirmation Theory (ECT). Taking user experience as the starting point and analyzing the user groups of interactive academic portals, we propose a continuance usage intention model for interactive academic web portals. The empirical test shows that Perceived Usefulness, Perceived Ease of Use, Perception of Affiliation, Interactivity, and Network Externalities have a significant positive influence on the attitude toward continued use; Satisfaction, Self-efficacy, and Attitude have a significant positive impact on users' continuance intention, whereas the effect of subjective norms is not significant. Finally, based on these results, we offer some suggestions for the operation and development of academic web portals.

  13. UkrVO astronomical WEB services

    Science.gov (United States)

    Mazhaev, O. E.

    2017-02-01

    Ukraine Virtual Observatory (UkrVO) has been a member of the International Virtual Observatory Alliance (IVOA) since 2011. The virtual observatory (VO) is not a magic solution to all problems of data storage and processing, but it provides certain standards for building the infrastructure of an astronomical data center. The astronomical databases help data mining and offer users easy access to observation metadata, images within the celestial sphere, and results of image processing. The astronomical web services (AWS) of UkrVO give users handy tools for selecting data from large astronomical catalogues for a relatively small region of interest in the sky. Examples of AWS usage are shown.

  14. AN EFFICIENT AND MULTI-PURPOSE ALGORITHM FOR MINING WEB LOGS

    Institute of Scientific and Technical Information of China (English)

    宋擒豹; 沈钧毅

    2001-01-01

    Similar customer groups, relevant Web pages, and frequent access paths can be discovered by analyzing Web log files and customer databases. This paper presents novel Web log mining algorithms. First, based on the directed graph defined for the Web site, a URL-UserID relevance matrix is built, with URLs as rows and UserIDs as columns; each element is the number of hits by that user on that URL. Second, similar customer groups are discovered by measuring the similarity between column vectors, and relevant Web pages are obtained by measuring the similarity between row vectors; frequent access paths can also be discovered by further processing the latter. Experiments show the effectiveness of the algorithms.
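
    The matrix construction described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name its similarity measure, so cosine similarity is an assumption here, and the function names (`build_matrix`, `cosine`, `column`) and the sample hits are hypothetical.

```python
from math import sqrt

def build_matrix(hits):
    """Build a URL-by-UserID hit-count matrix from a list of (url, user_id) requests."""
    urls = sorted({url for url, _ in hits})
    users = sorted({user for _, user in hits})
    row = {url: i for i, url in enumerate(urls)}
    col = {user: j for j, user in enumerate(users)}
    matrix = [[0] * len(users) for _ in urls]
    for url, user in hits:
        matrix[row[url]][col[user]] += 1  # element value = number of hits
    return urls, users, matrix

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (assumed measure, not from the paper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def column(matrix, j):
    """Extract column j, i.e. one user's hit vector over all URLs."""
    return [r[j] for r in matrix]

# Hypothetical sample log: (url, user_id) pairs.
hits = [("/a", "u1"), ("/a", "u1"), ("/b", "u1"),
        ("/a", "u2"), ("/b", "u2"), ("/c", "u3")]
urls, users, matrix = build_matrix(hits)

# Similar users: compare column vectors. Related pages: compare row vectors.
sim_u1_u2 = cosine(column(matrix, 0), column(matrix, 1))
sim_u1_u3 = cosine(column(matrix, 0), column(matrix, 2))
sim_a_b = cosine(matrix[0], matrix[1])
```

    In this toy log, users u1 and u2 visit the same pages and score close to 1, while u3 shares no pages with u1 and scores 0. A real log would need a threshold on the similarity to form groups, which the abstract does not specify.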

  15. Are Mental Health Effects of Internet Use Attributable to the Web-Based Content or Perceived Consequences of Usage? A Longitudinal Study of European Adolescents.

    Science.gov (United States)

    Hökby, Sebastian; Hadlaczky, Gergö; Westerlund, Joakim; Wasserman, Danuta; Balazs, Judit; Germanavicius, Arunas; Machín, Núria; Meszaros, Gergely; Sarchiapone, Marco; Värnik, Airi; Varnik, Peeter; Westerlund, Michael; Carli, Vladimir

    2016-07-13

    Adolescents and young adults are among the most frequent Internet users, and accumulating evidence suggests that their Internet behaviors might affect their mental health. Internet use may impact mental health because certain Web-based content could be distressing. It is also possible that excessive use, regardless of content, produces negative consequences, such as neglect of protective offline activities. The objective of this study was to assess how mental health is associated with (1) the time spent on the Internet, (2) the time spent on different Web-based activities (social media use, gaming, gambling, pornography use, school work, newsreading, and targeted information searches), and (3) the perceived consequences of engaging in those activities. A random sample of 2286 adolescents was recruited from state schools in Estonia, Hungary, Italy, Lithuania, Spain, Sweden, and the United Kingdom. Questionnaire data comprising Internet behaviors and mental health variables were collected and analyzed cross-sectionally and were followed up after 4 months. Cross-sectionally, both the time spent on the Internet and the relative time spent on various activities predicted mental health (P<…) … effects that were not fully accounted for by perceived consequences. The longitudinal analyses showed that sleep loss due to Internet use (ß=.12, 95% CI=0.05-0.19, P=.001) and withdrawal (negative mood) when Internet could not be accessed (ß=.09, 95% CI=0.03-0.16, P<…) … effect on mental health in the long term. Perceived positive consequences of Internet use did not seem to be associated with mental health at all. The magnitude of Internet use is negatively associated with mental health in general, but specific Web-based activities differ in how consistently, how much, and in what direction they affect mental health. Consequences of Internet use (especially sleep loss and withdrawal when Internet cannot be accessed) seem to predict mental health outcomes to a greater extent than the

  16. Web Usage Knowledge Discovery Model and Its Applications Based on Ontology

    Institute of Scientific and Technical Information of China (English)

    何丽; 严冬梅; 韩文秀

    2006-01-01

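    Applying ontologies on the Web can effectively solve the semantic problems of Web information sharing. This paper proposes a knowledge discovery model based on Web ontology and server log files, and mainly discusses the representation of user access behavior and the definition and discovery algorithm of semantic user distributions. Finally, it introduces the application of the Web usage knowledge discovery model in Web personalization systems.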

  17. Research on the Integration and Mining of Web Data Under Big Data

    Institute of Scientific and Technical Information of China (English)

    张素智; 孙嘉彬; 王威

    2014-01-01

    With the rapid development of Web 2.0 technologies, emerging services such as social networks, the Internet of Things, and the mobile Internet keep appearing, and Web data has grown explosively into the much-discussed "big data". The enormous value of Web big data has drawn growing attention to how to acquire, mine, and use Web data. In the big data setting, Web data is large in scale, varied in type, and fast-flowing, which has deepened research on Web data extraction and integration, data analysis, and data interpretation. At the same time, the integration and mining of Web big data still face challenges in data scale, data diversity, data timeliness, and privacy protection.

  18. The Application of Browsing Action Data in Web Usage Mining

    Institute of Scientific and Technical Information of China (English)

    杨凡丁; 刘建平; 严奉华

    2008-01-01

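    The uncertainty of website users' browsing behavior makes it increasingly difficult to predict user behavior from transaction patterns mined from existing Web log files. A new data type, browsing action data (BAD), is introduced to improve the quality of Web usage mining. BAD is a special kind of browsing data, such as "copy", "scroll", and "save as", that is not recorded in log files; a definition of BAD is given. To record BAD in the same way as Web log files, an existing online data collection module is introduced and used to capture users' BAD. An example e-commerce application shows that BAD can increase the effectiveness of existing Web transaction mining algorithms.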

  19. Research on Some Important Problems of Web Usage Mode Mining

    Institute of Scientific and Technical Information of China (English)

    王玉珍

    2003-01-01

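    Web usage pattern mining is the application of data mining technology to the Web domain. This paper introduces the basics of Web usage pattern mining and focuses on several key problems in the mining process: the collection and integration of source data, the continual updating of mining methods, and the analysis of Web usage patterns.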

  20. Research on the Model of Enterprise Online Trust Crisis Prevention Based on Ontology and Web Mining

    Institute of Scientific and Technical Information of China (English)

    谭春辉; 王晓

    2011-01-01

    With the arrival of the network era, online trust crises have become a normal condition for enterprises and bring large negative effects to their operation. Applying ontology and Web mining to the prevention of enterprise online trust crises can improve enterprise performance. After establishing an ontology-based Web mining process, this paper designs an enterprise online trust crisis prevention model based on ontology and Web mining, analyzes how the model operates, and discusses the construction of the mining-object ontology and the mining-method ontology.

  1. Mining and Utilizing Dataset Relevancy from Oceanographic Dataset (MUDROD) Metadata, Usage Metrics, and User Feedback to Improve Data Discovery and Access

    Science.gov (United States)

    Li, Y.; Jiang, Y.; Yang, C. P.; Armstrong, E. M.; Huang, T.; Moroni, D. F.; McGibbney, L. J.

    2016-12-01

    Big oceanographic data have been produced, archived and made available online, but finding the right data for scientific research and application development is still a significant challenge. A long-standing problem in data discovery is how to find the interrelationships between keywords and data, as well as the intrarelationships of the two individually. Most previous research attempted to solve this problem by building domain-specific ontology either manually or through automatic machine learning techniques. The former is costly, labor intensive and hard to keep up-to-date, while the latter is prone to noise and may be difficult for humans to understand. Large-scale user behavior data modelling represents a largely untapped, unique, and valuable source for discovering semantic relationships among domain-specific vocabulary. In this article, we propose a search engine framework for mining and utilizing dataset relevancy from oceanographic dataset metadata, user behaviors, and existing ontology. The objective is to improve discovery accuracy of oceanographic data and reduce the time scientists need to discover, download and reformat data for their projects. Experiments and a search example show that the proposed search engine helps both scientists and general users search with better ranking results, recommendation, and ontology navigation.

  2. Prediction of users webpage access behaviour using association rule mining

    Indian Academy of Sciences (India)

    R Geetharamani; P Revathy; Shomona G Jacob

    2015-12-01

    Web Usage mining is a technique used to identify the user needs from the web log. Discovering hidden patterns from the logs is an upcoming research area. Association rules play an important role in many web mining applications to detect interesting patterns. However, it generates enormous rules that cause researchers to spend ample time and expertise to discover the really interesting ones. This paper works on the server logs from the MSNBC dataset for the month of September 1999. This research aims at predicting the probable subsequent page in the usage of web pages listed in this data based on their navigating behaviour by using Apriori prefix tree (PT) algorithm. The generated rules were ranked based on the support, confidence and lift evaluation measures. The final predictions revealed that the interestingness of pages mainly depended on the support and lift measure whereas confidence assumed a uniform value among all the pages. It proved that the system guaranteed 100% confidence with the support of 1.3E−05. It revealed that the pages such as Front page, On-air, News, Sports and BBS attracted more interested subsequent users compared to Travel, MSN-News and MSN-Sports which were of less interest.
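
    The three ranking measures named in this abstract (support, confidence, and lift) can be illustrated with a short sketch. It only scores one given rule over a list of sessions; it does not implement the Apriori prefix tree algorithm used in the paper, and the function name `rule_metrics` and the sample sessions are hypothetical.

```python
def rule_metrics(sessions, antecedent, consequent):
    """Return (support, confidence, lift) of the rule antecedent -> consequent.

    sessions: list of page collections, one per user session.
    """
    a, c = set(antecedent), set(consequent)
    n = len(sessions)
    n_a = sum(1 for s in sessions if a <= set(s))         # sessions containing the antecedent
    n_c = sum(1 for s in sessions if c <= set(s))         # sessions containing the consequent
    n_ac = sum(1 for s in sessions if (a | c) <= set(s))  # sessions containing both
    support = n_ac / n if n else 0.0
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift

# Hypothetical sessions over page categories.
sessions = [
    ["frontpage", "news"],
    ["frontpage", "sports"],
    ["frontpage", "news", "sports"],
    ["travel"],
]
support, confidence, lift = rule_metrics(sessions, ["frontpage"], ["news"])
```

    A lift above 1 indicates that the consequent page appears after the antecedent more often than its overall frequency alone would suggest, which is why lift is often used alongside confidence when ranking rules.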

  3. Technologies for Decreasing Mining Losses

    Science.gov (United States)

    Valgma, Ingo; Väizene, Vivika; Kolats, Margit; Saarnak, Martin

    2013-12-01

    For stratified deposits such as the oil shale deposit in Estonia, mining losses depend on the mining technology. Current research focuses on extraction and separation possibilities of mineral resources. Selective mining, selective crushing and separation tests have been performed, showing possibilities of decreasing mining losses. Rock crushing and screening process simulations were used for optimizing rock fractions. In addition, mine backfilling, fine separation, and optimized drilling and blasting have been analyzed. All tested methods show potential and depend on mineral usage, which in turn depends on the utilization technology. Questions such as the stability of the material flow and the influence of quality fluctuations on the final yield are raised.

  4. Web Mining Based on Hybrid Simulated Annealing Genetic Algorithm and HMM

    Institute of Scientific and Technical Information of China (English)

    邹腊梅; 龚向坚

    2012-01-01

    The algorithm used to train a hidden Markov model (HMM) is a local search algorithm and is sensitive to its initial parameters. Training a typical HMM from random parameters often gets stuck in a local optimum, so it performs poorly for Web mining. The genetic algorithm (GA) has strong global search ability but tends to converge prematurely and slowly; simulated annealing (SA) has strong local search ability but roams randomly and lacks global search ability. Combining the strengths of both, this paper proposes a hybrid simulated annealing genetic algorithm (SGA). SGA parameters are chosen by experiment, and SGA optimizes the initial HMM parameters to compensate for the sensitivity of the Baum-Welch algorithm to initial values. Web mining experiments show that SGA clearly improves both precision and recall for the five extracted fields.

  5. Research on Method for Session Identification in Web Log Mining

    Institute of Scientific and Technical Information of China (English)

    杨富华

    2011-01-01

    Data preprocessing is crucial in Web log mining, and its results directly affect mining quality; session identification is the most important step in preprocessing. Traditional preprocessing uses a fixed-threshold session identification algorithm, which does not suit the dynamic nature of Web logs and does not remove redundant information well, leading to low efficiency and accuracy in subsequent data mining. To improve preprocessing, an improved session identification method is proposed. The threshold is adjusted dynamically according to page importance, which is determined from the site structure and page content, and pages the user is not interested in are deleted. Finally, simulation experiments are carried out. The results show that, compared with the traditional session identification method, the proposed method determines the page access time threshold more accurately, removes redundant information from the Web log, and improves both preprocessing efficiency and mining precision.
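
    The contrast between a fixed timeout and a page-dependent threshold can be sketched as follows. This is an illustrative assumption, not the paper's algorithm: the abstract does not say how page importance is computed or how it changes the threshold, so the multiplicative `importance` weights, the 1800-second default, and the name `split_sessions` are all hypothetical.

```python
def split_sessions(requests, base_timeout=1800, importance=None):
    """Split one user's requests into sessions.

    requests: list of (timestamp_seconds, url), sorted by time.
    importance: optional dict url -> weight. The allowed gap after a page is
        base_timeout * weight (weight defaults to 1.0), so a page assumed to be
        more important tolerates a longer reading time before the session ends.
    """
    importance = importance or {}
    sessions, current = [], []
    for ts, url in requests:
        if current:
            prev_ts, prev_url = current[-1]
            limit = base_timeout * importance.get(prev_url, 1.0)
            if ts - prev_ts > limit:
                sessions.append(current)  # gap too long: close the session
                current = []
        current.append((ts, url))
    if current:
        sessions.append(current)
    return sessions

# Hypothetical request log for a single user.
requests = [(0, "/home"), (600, "/article"), (3000, "/home"), (3100, "/contact")]

fixed = split_sessions(requests)                                    # fixed 30-minute timeout
dynamic = split_sessions(requests, importance={"/article": 2.0})    # longer limit after /article
```

    With the fixed timeout, the 2400-second pause after `/article` splits the log into two sessions; doubling the limit for that page keeps all four requests in one session.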

  6. Results Analysis of ISI Web of Knowledge: an introduction to its usage

    Institute of Scientific and Technical Information of China (English)

    张轶群

    2005-01-01

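    Results Analysis, an analysis tool for information analysis and information mining of search results, is a service feature added to ISI Web of Knowledge in 2004. Using BIOSIS Previews as a search example, this paper examines in detail how to use the Results Analysis tool and what it can do, and offers suggestions for improvement.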

  7. Excavando la web

    OpenAIRE

    Ricardo, Baeza-Yates

    2004-01-01

    The web is the internet's most important phenomenon, as demonstrated by its exponential growth and diversity. Hence, due to the volume and wealth of its data, search engines have become among the web's main tools. They are useful when we know what we are looking for. However, certainly the web holds answers to questions never imagined. The process of finding relations or interesting patterns within a data set is called "data mining" and in the case of the web, "web mining". In this article...

  8. The first metatarsal web space: its applied anatomy and usage in tracing the first dorsal metatarsal artery in thumb reconstruction

    Institute of Scientific and Technical Information of China (English)

    徐永清; 李军; 钟世镇; 徐达传; 徐小山; 郭远发; 汪新民; 李主一; 朱跃良

    2004-01-01

    Objective: To clarify the anatomical relationship of the structures in the first toe webbing space for better dissection of toes in thumb reconstruction. Methods: The first dorsal metatarsal artery, the first deep transverse metatarsal ligament and the extensor expansion were observed on 42 adult cadaveric lower extremities. Clinically the method of tracing the first dorsal metatarsal artery around the space of the extensor expansion was used in 36 cases of thumb reconstruction. Results: The distal segments of the first dorsal metatarsal artery of Gilbert types I and II were located superficially to the extensor expansion. The harvesting time of a toe was shortened from 90 minutes to 50 minutes with 100% survival of reconstructed fingers. Conclusions: The distal segment of the first dorsal metatarsal artery lies constantly at the superficial layer of the extensor expansion. Most of the first metatarsal arteries of Gilbert types I and II can be easily located via the combined sequential and reverse dissection around the space of the extensor expansion.

  9. ASCOT: a text mining-based web-service for efficient search and assisted creation of clinical trials

    Science.gov (United States)

    2012-01-01

    Clinical trials are mandatory protocols describing medical research on humans and among the most valuable sources of medical practice evidence. Searching for trials relevant to some query is laborious due to the immense number of existing protocols. Apart from search, writing new trials includes composing detailed eligibility criteria, which might be time-consuming, especially for new researchers. In this paper we present ASCOT, an efficient search application customised for clinical trials. ASCOT uses text mining and data mining methods to enrich clinical trials with metadata, that in turn serve as effective tools to narrow down search. In addition, ASCOT integrates a component for recommending eligibility criteria based on a set of selected protocols. PMID:22595088

  10. Mining Related Articles for Automatic Journal Cataloging

    Directory of Open Access Journals (Sweden)

    Yuqing Mao

    2016-06-01

    Full Text Available Purpose: This paper is an investigation of the effectiveness of the method of clustering biomedical journals through mining the content similarity of journal articles. Design/methodology/approach: 3,265 journals in PubMed are analyzed based on article content similarity and Web usage, respectively. Comparisons of the two analysis approaches and a citation-based approach are given. Findings: Our results suggest that article content similarity is useful for clustering biomedical journals, and the content-similarity-based journal clustering method is more robust and less subject to human factors compared with the usage-based approach and the citation-based approach. Research limitations: Our paper currently focuses on clustering journals in the biomedical domain because there is a large volume of freely available resources such as PubMed and MeSH in this field. Further investigation is needed to improve this approach to fit journals in other domains. Practical implications: Our results show that it is feasible to catalog biomedical journals by mining the article content similarity. This work is also significant in serving practical needs in research portfolio analysis. Originality/value: To the best of our knowledge, we are among the first to report on clustering journals in the biomedical field through mining the article content similarity. This method can be integrated with existing approaches to create a new paradigm for future studies of journal clustering.

  11. The Application of Web Mining in the Web Shopping

    Institute of Scientific and Technical Information of China (English)

    叶彩虹

    2004-01-01

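    The rapid development of the Internet has made the World Wide Web a huge information repository, providing rich information resources for Web mining research while also posing new challenges. This paper first outlines the concepts, processes, and algorithms of data mining and Web mining, then introduces the concepts and current state of e-commerce and online shopping, and discusses the application of Web mining in online shopping through concrete examples.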

  12. Commercial Data Mining Software

    Science.gov (United States)

    Zhang, Qingyu; Segall, Richard S.

    This chapter discusses selected commercial software for data mining, supercomputing data mining, text mining, and web mining. The selected software packages are compared by their features and also applied to available data sets. The software for data mining are SAS Enterprise Miner, Megaputer PolyAnalyst 5.0, PASW (formerly SPSS Clementine), IBM Intelligent Miner, and BioDiscovery GeneSight. The software for supercomputing are Avizo by Visualization Science Group and JMP Genomics from SAS Institute. The software for text mining are SAS Text Miner and Megaputer PolyAnalyst 5.0. The software for web mining are Megaputer PolyAnalyst and SPSS Clementine. Background on related literature and software is presented. Screen shots of each of the selected software packages are presented, as are conclusions and future directions.

  13. EST2uni: an open, parallel tool for automated EST analysis and database creation, with a data mining web interface and microarray expression data integration

    Directory of Open Access Journals (Sweden)

    Nuez Fernando

    2008-01-01

    Full Text Available Abstract Background Expressed sequence tag (EST) collections are composed of a high number of single-pass, redundant, partial sequences, which need to be processed, clustered, and annotated to remove low-quality and vector regions, eliminate redundancy and sequencing errors, and provide biologically relevant information. In order to provide a suitable way of performing the different steps in the analysis of the ESTs, flexible computation pipelines adapted to the local needs of specific EST projects have to be developed. Furthermore, EST collections must be stored in highly structured relational databases available to researchers through user-friendly interfaces which allow efficient and complex data mining, thus offering maximum capabilities for their full exploitation. Results We have created EST2uni, an integrated, highly-configurable EST analysis pipeline and data mining software package that automates the pre-processing, clustering, annotation, database creation, and data mining of EST collections. The pipeline uses standard EST analysis tools and the software has a modular design to facilitate the addition of new analytical methods and their configuration. Currently implemented analyses include functional and structural annotation, SNP and microsatellite discovery, integration of previously known genetic marker data and gene expression results, and assistance in cDNA microarray design. It can be run in parallel in a PC cluster in order to reduce the time necessary for the analysis. It also creates a web site linked to the database, showing collection statistics, with complex query capabilities and tools for data mining and retrieval. Conclusion The software package presented here provides an efficient and complete bioinformatics tool for the management of EST collections which is very easy to adapt to the local needs of different EST projects. The code is freely available under the GPL license and can be obtained at http

  14. A web-based laboratory information system to improve quality of care of tuberculosis patients in Peru: functional requirements, implementation and usage statistics

    Directory of Open Access Journals (Sweden)

    Yale Gloria

    2007-10-01

    Full Text Available Abstract Background Multi-drug resistant tuberculosis patients in resource-poor settings experience large delays in starting appropriate treatment and may not be monitored appropriately due to an overburdened laboratory system, delays in communication of results, and missing or error-prone laboratory data. The objective of this paper is to describe an electronic laboratory information system implemented to alleviate these problems and its expanding use by the Peruvian public sector, as well as examine the broader issues of implementing such systems in resource-poor settings. Methods A web-based laboratory information system "e-Chasqui" has been designed and implemented in Peru to improve the timeliness and quality of laboratory data. It was deployed in the national TB laboratory, two regional laboratories and twelve pilot health centres. Using needs assessment and workflow analysis tools, e-Chasqui was designed to provide for improved patient care, increased quality control, and more efficient laboratory monitoring and reporting. Results Since its full implementation in March 2006, 29,944 smear microscopy, 31,797 culture and 7,675 drug susceptibility test results have been entered. Over 99% of these results have been viewed online by the health centres. High user satisfaction and heavy use have led to the expansion of e-Chasqui to additional institutions. In total, e-Chasqui will serve a network of institutions providing medical care for over 3.1 million people. The cost to maintain this system is approximately US$0.53 per sample or 1% of the National Peruvian TB program's 2006 budget. Conclusion Electronic laboratory information systems have a large potential to improve patient care and public health monitoring in resource-poor settings. Some of the challenges faced in these settings, such as lack of trained personnel, limited transportation, and large coverage areas, are obstacles that a well-designed system can overcome. e-Chasqui has the

  15. The abandoned surface mining sites in the Czech Republic: mapping and creating a database with a GIS web application

    Science.gov (United States)

    Pokorný, Richard; Tereza Peterková, Marie

    2016-05-01

    Based on the vectorization of the 55-volume book series the Quarry Inventories of the Czechoslovak Republic/Czechoslovak Socialist Republic, published in the years 1932-1961, a new comprehensive database was built comprising 9958 surface mining sites of raw materials, which were active in the first half of the 20th century. The mapped area covers 40.9 % of the territory of the Czech Republic. For the purposes of visualization, a map application, the Quarry Inventories Online, was created that enables the data visualization.

  16. Language use of Québec adolescents on the social web

    Directory of Open Access Journals (Sweden)

    Monique Lebrun

    2012-03-01

    Full Text Available The article examines the linguistic practices on traditional media and social media reported by young Quebecers between the ages of 14 and 17 attending French schools. The survey targeted particularly the use of French and English in five sub-populations: French mother-tongue speakers, English mother-tongue speakers, allophones, French second-language speakers, and English second-language speakers. The results show a very diverse use of French in daily life depending on ethnic origin: French mother-tongue and French second-language speakers use it almost always at school, with their family and friends, and to a large degree in their consumption of traditional media, whereas the three other groups choose it sporadically, preferring English. On social media, behaviour also follows ethnic origin; however, the predominance of French in the two most francophone groups fades somewhat, which seems to confirm the predominance of English on the Web among young Quebecers. Generally speaking, the linguistic awareness of young Quebecers, even in the two francophone groups, is not yet very developed, nor is their opinion on the Francophonie.

  17. Mining text data

    CERN Document Server

    Aggarwal, Charu C

    2012-01-01

    Text mining applications have experienced tremendous advances because of web 2.0 and social networking applications. Recent advances in hardware and software technology have led to a number of unique scenarios where text mining algorithms are learned. "Mining Text Data" introduces an important niche in the text analytics field, and is an edited volume contributed by leading international researchers and practitioners focused on social networks & data mining. This book contains a wide swath in topics across social networks & data mining. Each chapter contains a comprehensive survey including

  18. Study on Web Coal Mine Equipment Management System

    Institute of Scientific and Technical Information of China (English)

    宋向红; 孙琼琼

    2013-01-01

    An enterprise's production and long-term development depend on good management, and equipment management is a key part of enterprise management. Keeping equipment able to perform its functions requires maintenance and other technical and managerial measures, which form the main content of equipment management. In response to intensifying industry competition and with the continued progress of information technology, equipment management information systems have been built across industries in China. To enable Web-based management of coal mine equipment maintenance records and the sharing of information resources, this paper investigates and discusses the construction of a coal mine equipment management system to help managers carry out maintenance.

  19. High Level of Integration in Integrated Disease Management Leads to Higher Usage in the e-Vita Study: Self-Management of Chronic Obstructive Pulmonary Disease With Web-Based Platforms in a Parallel Cohort Design.

    Science.gov (United States)

    Talboom-Kamp, Esther Pwa; Verdijk, Noortje A; Kasteleyn, Marise J; Harmans, Lara M; Talboom, Irvin Jsh; Numans, Mattijs E; Chavannes, Niels H

    2017-05-31

    Worldwide, nearly 3 million people die of chronic obstructive pulmonary disease (COPD) every year. Integrated disease management (IDM) improves disease-specific quality of life and exercise capacity for people with COPD, and can also reduce hospital admissions and hospital days. Self-management of COPD through eHealth interventions has been shown to be an effective method to improve the quality and efficiency of IDM in several settings, but it remains unknown which factors influence usage of eHealth and change in behavior of patients. Our study, e-Vita COPD, compares different levels of integration of Web-based self-management platforms in IDM in three primary care settings. The main aim of this study is to analyze the factors that successfully promote the use of a self-management platform for COPD patients. The e-Vita COPD study compares three different approaches to incorporating eHealth via Web-based self-management platforms into IDM of COPD using a parallel cohort design. Three groups integrated the platforms to different levels. In groups 1 (high integration) and 2 (medium integration), randomization was performed to two levels of personal assistance for patients (high and low assistance); in group 3 there was no integration into disease management (no integration). Every visit to the e-Vita and Zorgdraad COPD Web platforms was tracked objectively by collecting log data (sessions and services). At the first log-in, patients completed a baseline questionnaire. Baseline characteristics were automatically extracted from the log files including age, gender, education level, scores on the Clinical COPD Questionnaire (CCQ), dyspnea scale (MRC), and quality of life questionnaire (EQ5D). To predict the use of the platforms, multiple linear regression analyses for the different independent variables were performed: integration in IDM (high, medium, none), personal assistance for the participants (high vs low), educational level, and self-efficacy level (General Self

  20. LIBP-Pred: web server for lipid binding proteins using structural network parameters; PDB mining of human cancer biomarkers and drug targets in parasites and bacteria.

    Science.gov (United States)

    González-Díaz, Humberto; Munteanu, Cristian R; Postelnicu, Lucian; Prado-Prado, Francisco; Gestal, Marcos; Pazos, Alejandro

    2012-03-01

    Lipid-Binding Proteins (LIBPs) or Fatty Acid-Binding Proteins (FABPs) play an important role in many diseases such as different types of cancer, kidney injury, atherosclerosis, diabetes, intestinal ischemia and parasitic infections. Thus, the computational methods that can predict LIBPs based on 3D structure parameters became a goal of major importance for drug-target discovery, vaccine design and biomarker selection. In addition, the Protein Data Bank (PDB) contains 3000+ protein 3D structures with unknown function. This list, as well as new experimental outcomes in proteomics research, is a very interesting source to discover relevant proteins, including LIBPs. However, to the best of our knowledge, there are no general models to predict new LIBPs based on 3D structures. We developed new Quantitative Structure-Activity Relationship (QSAR) models based on 3D electrostatic parameters of 1801 different proteins, including 801 LIBPs. We calculated these electrostatic parameters with the MARCH-INSIDE software and they correspond to the entire protein or to specific protein regions named core, inner, middle, and surface. We used these parameters as inputs to develop a simple Linear Discriminant Analysis (LDA) classifier to discriminate 3D structure of LIBPs from other proteins. We implemented this predictor in the web server named LIBP-Pred, freely available at , along with other important web servers of the Bio-AIMS portal. The users can carry out an automatic retrieval of protein structures from PDB or upload their custom protein structural models from their disk created with LOMETS server. We demonstrated the PDB mining option performing a predictive study of 2000+ proteins with unknown function. Interesting results regarding the discovery of new Cancer Biomarkers in humans or drug targets in parasites have been discussed here in this sense.

  1. Assessment of Sea Area Usage Rights about Sea Sand Mining from the View of Sea Area Real Right

    Institute of Scientific and Technical Information of China (English)

    胡灯进; 郭晓峰; 杨顺良

    2016-01-01

    In assessing the price of sea area usage rights for sea sand mining, the boundary between the sea area usage right and the mining right is poorly understood, so different evaluators treat the mining right cost differently in their calculations and the assessment results vary widely. From the perspective of the real right of sea areas, this paper analyzes the legal attributes of the sea area usage right and the mining right and examines the different legal nature of their subject matters. It clarifies that the sea area covered by a sea area usage right is a stereoscopic space made up of indivisible inherent natural elements: three-dimensional space (sea surface, water body, seabed and subsoil), landforms, bathymetry, geological conditions, tidal currents, waves, the ecological environment, landscape, and so on. In essence it is a sea area spatial resource and the carrier of sea sand and other marine natural resources. The mutual independence of the sea area usage right and the mining right means that the price of the sea area usage right does not include the mining right cost. Therefore, when the income approach is used to assess the price of the sea area usage right for sea sand mining, the mining right cost should be entered in the calculation as a cost.

  2. Web Mining Technology and Designing of the Tools

    Institute of Scientific and Technical Information of China (English)

    谢丹夏

    2001-01-01

    More and more commerce-related transactions are becoming digital. The more you know about your customers, the better you can serve them. Every customer action on a Web site generates data, not just high-level interactions such as buying something, but also something as simple as using a search engine or navigating through a site. All these interactions between digital service providers and the consumer can be recorded and stored in digital databases. These large data sets contain information helpful to business marketing strategies, both for retrospective analysis and for data-driven forecasting. By analyzing the data on the Web, Web mining tools provide companies with previously unknown statistics and useful insights into the behavior of their online customers.

  3. Determinants and development of a web-based child mortality prediction model in resource-limited settings: A data mining approach.

    Science.gov (United States)

    Tesfaye, Brook; Atique, Suleman; Elias, Noah; Dibaba, Legesse; Shabbir, Syed-Abdul; Kebede, Mihiretu

    2017-03-01

    Improving child health and reducing child mortality rate are key health priorities in developing countries. This study aimed to identify determinants and develop a web-based child mortality prediction model in Ethiopian local language using classification data mining algorithms. Decision tree (using the J48 algorithm) and rule induction (using the PART algorithm) techniques were applied to 11,654 records of Ethiopian demographic and health survey data. Waikato Environment for Knowledge Analysis (WEKA) for Windows version 3.6.8 was used to develop optimal models. 8157 (70%) records were randomly allocated to the training group for model building, while the remaining 3496 (30%) records were allocated as the test group for model validation. The validation of the model was assessed using accuracy, sensitivity, specificity and area under the Receiver Operating Characteristics (ROC) curve. Using Statistical Package for Social Sciences (SPSS) version 20.0, logistic regressions and Odds Ratios (OR) with 95% Confidence Intervals (CI) were used to identify determinants of child mortality. The child mortality rate was 72 deaths per 1000 live births. Breast-feeding (AOR = 1.46, 95% CI [1.22, 1.75]), maternal education (AOR = 1.40, 95% CI [1.11, 1.81]), family planning (AOR = 1.21, [1.08, 1.43]), preceding birth interval (AOR = 4.90, [2.94, 8.15]), presence of diarrhea (AOR = 1.54, 95% CI [1.32, 1.66]), father's education (AOR = 1.4, 95% CI [1.04, 1.78]), low birth weight (AOR = 1.2, 95% CI [0.98, 1.51]) and age of the mother at first birth (AOR = 1.42, [1.01, 1.89]) were found to be determinants of child mortality. The J48 model had better performance: accuracy (94.3%), sensitivity (93.8%), specificity (94.3%), Positive Predictive Value (PPV) (92.2%), Negative Predictive Value (NPV) (94.5%) and area under ROC (94.8%). Subsequent to developing an optimal prediction model, we relied on this model to develop a web-based application system for child mortality prediction. In this study

  4. Research on a User Interest Modeling Method Based on Web Mining

    Institute of Scientific and Technical Information of China (English)

    浦慧忠

    2014-01-01

    Because user interests differ, this paper studies how to obtain effective interest data from users' browsing behavior. To address the shortcomings of existing user interest models, and drawing on Web mining techniques, it first constructs the user interest model explicitly and then updates it implicitly, yielding a user interest model that can adapt to changes in user interests.

  5. Web User Categorization and Behavior Study Based on Refreshing

    Directory of Open Access Journals (Sweden)

    Ratnesh Kumar Jain

    2009-09-01

    Full Text Available As the information available on the World Wide Web grows, the usage of web sites is also growing. Since each access to a web page is recorded in the web logs, the logs are becoming a huge data repository which, when mined properly, can provide useful information for decision making. Web site designers, analysts and management executives are interested in extracting this hidden information from web logs. In this research paper we propose a method to categorize users, page by page, into Faithful, Partially Impatient and Completely Impatient users so that the study of user behavior becomes easier. To categorize users we propose adding a new piece of information to the web log that represents each instance of refreshing. We use a Markov chain model in which clicking the Refresh button is treated as another state, the Refresh State. We derive theorems to study each type of user behavior and show how the behavior of the user types differs.

  6. An evaluation on the Web page navigation tools in university library Web sites In Turkey

    OpenAIRE

    Çakmak, Tolga

    2010-01-01

    Web technologies and web pages are primary tools for dissemination of information all over the world today. Libraries are also using and adopting these technologies to reach their audiences. The effective usage of these technologies can be possible with user centered design. Web pages that have user centered design help users to find information without being lost in the web page. As a part of the web pages, navigation systems have a vital role in this context. Effective usage of navigation s...

  7. Open Peer Review in Scientific Publishing: A Web Mining Study of PeerJ Authors and Reviewers

    Directory of Open Access Journals (Sweden)

    Peiling Wang

    2016-11-01

    Full Text Available Purpose: To understand how authors and reviewers are accepting and embracing Open Peer Review (OPR), one of the newest innovations in the Open Science movement. Design/methodology/approach: This research collected and analyzed data from the Open Access journal PeerJ over its first three years (2013-2016). Web data were scraped, cleaned, and structured using several Web tools and programs. The structured data were imported into a relational database. Data analyses were conducted using analytical tools as well as programs developed by the researchers. Findings: PeerJ, which supports optional OPR, has a broad international representation of authors and referees. Approximately 73.89% of articles provide full review histories. Of the articles with published review histories, 17.61% had identities of all reviewers and 52.57% had at least one signed reviewer. In total, 43.23% of all reviews were signed. The observed proportions of signed reviews have been relatively stable over the period since the Journal's inception. Research limitations: This research is constrained by the availability of the peer review history data. Some peer reviews were not available when the authors opted out of publishing their review histories. The anonymity of reviewers made it impossible to give an accurate count of reviewers who contributed to the review process. Practical implications: These findings shed light on the current characteristics of OPR. Given the policy that authors are encouraged to make their articles' review history public and referees are encouraged to sign their review reports, the three years of PeerJ review data demonstrate that there is still some reluctance by authors to make their reviews public and by reviewers to identify themselves. Originality/value: This is the first study to closely examine PeerJ as an example of an OPR model journal. As Open Science moves further towards open research, OPR is a final and critical component. Research in this

  8. Identifying web usage behavior of bank customers

    Science.gov (United States)

    Araya, Sandro; Silva, Mariano; Weber, Richard

    2002-03-01

    The bank Banco Credito e Inversiones (BCI) started its virtual bank in 1996, and its registered customers currently perform more than 10,000 Internet transactions daily, which typically incur less than 10% of traditional transaction costs. Since most of the customers are still not registered for online banking, one of the goals of the virtual bank is to increase the number of registered customers. The objective of the presented work was to identify customers who are likely to perform online banking but still do not use this medium for their transactions. This objective was reached by determining profiles of registered customers who perform many transactions online. Based on these profiles, the bank's Data Warehouse is explored for twins of these heavy users that are still not registered for online banking. We applied clustering in order to group the registered customers into five classes. One of these classes contained almost 30% of all registered customers and could clearly be identified as the class of heavy users. Next, a neural network assigned online customers to the previously found five classes. Applying the network trained on online customers to all the bank customers identified twins of heavy users who, however, had not performed online transactions so far. A mailing to these candidates informing them about the advantages of online banking doubled the number of registrations compared to previous campaigns.

  9. User Identification Algorithm in Web Log Mining

    Institute of Scientific and Technical Information of China (English)

    肖慧; 王立华

    2011-01-01

    This paper introduces existing user identification algorithms and proposes the IASR (IP, Agent, Session and Referrer) user identification algorithm to address current problems in user identification. The algorithm tracks users by rewriting URLs and introduces sessions to identify users. It efficiently and accurately identifies different users accessing the site through the same proxy server, and it satisfactorily solves the "multi-user problem" that arises when a single user reaches the site by typing URLs directly into the browser's address bar. Finally, the paper looks ahead to future developments in user identification algorithms.
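
    The grouping idea behind this kind of user identification can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' IASR implementation: the field names (`ip`, `agent`, `session_id`, `url`) and the sample log entries are hypothetical, the session ID is assumed to have already been recovered from the rewritten URL, and the referrer check is left out.

```python
from collections import defaultdict

def identify_users(entries):
    """Group log entries by (IP, agent, session ID); each key is treated as one user."""
    users = defaultdict(list)
    for e in entries:
        # Two requests from the same proxy IP and browser are still told apart
        # when their session IDs differ.
        key = (e["ip"], e["agent"], e["session_id"])
        users[key].append(e["url"])
    return dict(users)

# Hypothetical log: two users behind the same proxy IP with the same browser.
log = [
    {"ip": "10.0.0.1", "agent": "Firefox", "session_id": "s1", "url": "/home"},
    {"ip": "10.0.0.1", "agent": "Firefox", "session_id": "s2", "url": "/news"},
    {"ip": "10.0.0.1", "agent": "Firefox", "session_id": "s1", "url": "/cart"},
]
users = identify_users(log)
```

    Keying on IP and agent alone would merge both visitors into one user; adding the session ID is what separates them in this sketch.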

  10. Web Classification Using DYN FP Algorithm

    Directory of Open Access Journals (Sweden)

    Bhanu Pratap Singh

    2014-01-01

    Full Text Available Web mining is the application of data mining techniques to extract knowledge from the Web. Web mining has been explored extensively, and different techniques have been proposed for a variety of applications, including Web search, classification and personalization. The primary goal of a web site is to provide relevant information to its users. Web mining techniques are used to categorize users and pages by analyzing user behavior, the content of pages and the order of URLs accessed. This paper proposes an auto-classification algorithm for web pages using data mining techniques. It considers the problem of discovering association rules between terms in a set of web pages belonging to a category in a search engine database, and presents an auto-classification algorithm for solving this problem that is fundamentally based on the FP-growth algorithm.

  11. Business Education Students' Perception of Educational Usage of ...

    African Journals Online (AJOL)

    Business Education Students' Perception of Educational Usage of Social Networking Sites in Tertiary Institutions in Anambra State.

  12. Quantitative Literacy on the Web of Science, 2 – Mining the Health Numeracy Literature for Assessment Items

    Directory of Open Access Journals (Sweden)

    H.L. Vacher

    2009-01-01

    Full Text Available A topic search of the Web of Science (WoS) database using the term “numeracy” produced a bibliography of 293 articles, reviews and editorial commentaries (Oct 2008). The citation graph of the bibliography clearly identifies five benchmark papers (1995-2001), four of which developed numeracy assessment instruments. Starting with the 80 papers that cite these benchmarks, we identified a set of 25 papers (1995-2008) in which the medical research community reports the development and/or application of health-numeracy assessments. In all we found 10 assessment instruments from which we have compiled a total of 48 assessment items. There are both general and context-specific tests, with the wide range in the latter illustrated by names such as the Diabetes Numeracy Test and the Asthma Numeracy Questionnaire. There is also a Medical Data Interpretation Test and a Subjective Numeracy Scale. Much of this literature discusses the validity and reliability of the test, and many papers include item-by-item results of the tests from when they were applied in the research reported in the papers. The research that used the tests was directed at exploring such subjects as the patients’ ability to evaluate risks and benefits in order to make informed decisions; to understand and carry out instructions in order to self-manage their medical conditions; and, in research settings, to understand what the researchers were asking in their assessments (e.g., quantified quality of life) that require comparison of numerical information. We present the collection of items as a potential resource for educators interested in numeracy assessments in context.

  13. Google Scholar Usage: An Academic Library's Experience

    Science.gov (United States)

    Wang, Ya; Howard, Pamela

    2012-01-01

    Google Scholar is a free service that provides a simple way to broadly search for scholarly works and to connect patrons with the resources libraries provide. The researchers in this study analyzed Google Scholar usage data from 2006 for three library tools at San Francisco State University: SFX link resolver, Web Access Management proxy server,…


  15. Concept and Establishment of the Mine Information System within the CROMAC GIP Project

    Directory of Open Access Journals (Sweden)

    Zvonko Biljecki

    2006-12-01

    Full Text Available In order to solve mine problems in the Republic of Croatia, a unique project, CROMAC GIP (Croatian Mine Action Centre Geoinformation Project), has been initiated, significantly increasing the functional quality of the existing Mine Information System (MIS). Since mine problems are closely related to space, geodata are a crucial part of the MIS intended for monitoring and planning of demining. From the founding of the Croatian Mine Action Centre until today, the process of demining has progressed. The implementation of a topographic database in accordance with the CROTIS data model and the usage of orthophoto data produced according to the official product specifications stand out in that progress. Usage of such geodata requires a sophisticated information system that enables simultaneous usage of geodata and other data connected with solving mine problems. In order to reach all goals in demining and to use all advantages of geodata, it was indispensable to upgrade the existing Mine Information System by merging geodata and HCR data and to collect new data according to standardized procedures, while at the same time controlling the quality and the automated procedures of uploading into the system. Apart from being constructed in accordance with the Standard Operative Procedures (SOP), the modernised MIS is also based on generally accepted standards in the field of geoinformation and is implemented on advanced technology. The core of the system is the Oracle database, and GeoMedia WebMap Professional is the tool that makes distribution of and work with spatial data possible on the intranet/Internet. In order to achieve full efficiency of the system, it is necessary to provide high-quality, up-to-date geodata. In this respect, photogrammetric data are the most efficient solution.

  16. Research on Enterprise Web Log Mining Based on Improved Apriori Algorithm

    Institute of Scientific and Technical Information of China (English)

    吴红星; 王浩

    2015-01-01

    A large amount of valuable information is hidden in enterprise Web logs, and the disadvantage of the Apriori algorithm is that it produces a large number of candidate sets and scans the data set frequently. This paper studies the Web log information of a collaborative Web portal. In an enterprise's collaborative Web portal, the announcements column can publish enterprise notices at any time, and these are what the enterprise wants visitors to see first. The Website news column shows visitors enterprise-related news, information and management activities, and also serves to promote the enterprise brand and culture. Based on these common features of collaborative Web portals, an improved Apriori algorithm for enterprises is presented. Because the enterprise actively shows announcements or business news to visitors, the algorithm mines the standing of the other main columns among visitors and the degree of attention and interest visitors pay to them. In this way the enterprise can adjust the layout of the other columns, better serve enterprise promotion, and give visitors convenient access. The core of the improved algorithm is to reduce the candidate set. One ID does not take part in the scanning process of the Apriori algorithm; once the algorithm has mined the maximum frequent sets, that ID is added to each item of the maximum frequent item sets, and association rule mining is then carried out. This considerably reduces both the number of data set scans and the number of candidate sets generated. Contrast experiments show that the improved Apriori algorithm is effective and has strong practical application value for enterprises.
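
    For reference, the baseline that the abstract sets out to improve can be sketched as follows. This is a minimal sketch of the standard Apriori frequent-itemset search, not the improved algorithm of the paper; the column names in the sample sessions and the support threshold are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset: support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: unions of frequent (k-1)-itemsets that have size k.
        prev = list(current)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # One scan of the data set per level to count the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical sessions: the set of portal columns visited in each session.
sessions = [
    {"news", "notice", "products"},
    {"news", "products"},
    {"notice", "products"},
    {"news", "notice", "products"},
]
result = apriori(sessions, min_support=3)
```

    The join step and the per-level scan are the two costs named in the abstract: candidate generation grows with the number of frequent items, and each level requires another pass over the sessions.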

  17. On-line Generation of Suggestions for Web Users

    OpenAIRE

    2004-01-01

    One important class of Data Mining applications is so-called "Web Mining", which analyzes and extracts important and non-trivial knowledge from Web-related data. Typical applications of Web Mining are personalization and recommender systems. These systems aim to extract knowledge from the analysis of historical information of a web server in order to improve the web site expressiveness in terms of readability and content availability. Typically, these systems are made...

  18. Context-Dependent Lexical Paraphrasing Based on Web Mining

    Institute of Scientific and Technical Information of China (English)

    赵世奇; 张宇; 赵琳; 刘挺; 李生

    2009-01-01

    Lexical paraphrasing is the task of extracting word-level paraphrases. Lexical paraphrases should be context dependent, since a word may have different paraphrases in distinct contexts. This paper investigates a framework for acquiring context-dependent lexical paraphrases, in which a web mining method is developed for extracting candidate paraphrases and a classification method is introduced for paraphrase validation. Evaluations are carried out on the People's Daily corpus and the results show that: (1) the web mining method performs well in candidate paraphrase extraction, extracting 2.3 correct paraphrases on average for each test word in each given context sentence; (2) the classifier for paraphrase validation is effective, achieving an f-measure of 0.6023; (3) 75.11% and 98.31% of the paraphrases extracted by our method cannot be recognized by the two widely used context-independent methods, i.e., the thesaurus-based and clustering-based methods respectively. This indicates that the presented context-dependent method is a considerable supplement to the context-independent ones.

  19. Application and Development Trend of Personalized Search Engine Based on Web Data Mining

    Institute of Scientific and Technical Information of China (English)

    王丽; 曹家琏

    2009-01-01

    Web data mining is an emerging research field that applies data mining techniques and theory to the mining of WWW resources. This paper discusses the current state of development of Web data mining, its development trends, and possible future research directions. It also briefly introduces personalized search engines and discusses the application of Web data mining in them.

  20. DESIGN AND IMPLEMENTATION OF WEB DOCUMENTS CONVERSION ALGORITHM IN DATA MINING

    Institute of Scientific and Technical Information of China (English)

    赵小龙; 佘东

    2011-01-01

    Web text mining is an important application of data mining technology in network information processing. How to convert Web documents into the format required by data mining, i.e., Web document preprocessing, is an important research topic. The method in this paper is as follows: a large number of web page files are downloaded from the Internet and converted into text files; an algorithm then computes word frequency statistics over the text files, deletes stop words, removes high-frequency words, stems the words, and builds a word list from which terms are extracted; a word frequency index is generated in alphabetical order and compared against a dictionary file to obtain each word's ID; finally, data in the Reuters-21578 database format are generated. In this way the Web document data are converted into a standard data set in preparation for classification and clustering in data mining.
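
    The preprocessing steps listed above can be sketched roughly as follows. This is an illustrative sketch rather than the authors' algorithm: the stop-word list, the suffix-stripping rule `crude_stem`, the `max_freq` threshold and the sample text are all hypothetical; stemming is applied before the high-frequency filter for simplicity; and the final conversion to the Reuters-21578 format is omitted.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "of", "and", "is", "in", "to"}  # hypothetical, tiny list

def crude_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def build_index(text, max_freq):
    """Tokenize, drop stop words, stem, drop high-frequency terms, assign IDs."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = [crude_stem(t) for t in tokens if t not in STOP_WORDS]
    freq = Counter(stems)
    # Keep only terms at or below the frequency threshold, in alphabetical order.
    kept = sorted(w for w, n in freq.items() if n <= max_freq)
    # The position in the alphabetical list serves as the word ID here.
    word_ids = {w: i for i, w in enumerate(kept)}
    return freq, word_ids

freq, word_ids = build_index("Mining the web: web pages and web logs", max_freq=2)
```

    In a fuller pipeline the IDs would come from a fixed dictionary file, as the abstract describes, so that the same word maps to the same ID across documents.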

  1. AN ENHANCED PRE-PROCESSING RESEARCH FRAMEWORK FOR WEB LOG DATA USING A LEARNING ALGORITHM

    Directory of Open Access Journals (Sweden)

    V.V.R. Maheswara Rao

    2011-01-01

    Full Text Available With the continued growth and proliferation of Web services and Web-based information systems, the volumes of user data have reached astronomical proportions. Before such data can be analyzed using web mining techniques, the web log has to be pre-processed, integrated and transformed. As the World Wide Web is continuously and rapidly growing, web miners need intelligent tools in order to find, extract, filter and evaluate the desired information. The data pre-processing stage is the most important phase for investigating web user usage behaviour. To do this, one must extract only the human user accesses from the web log data, which is critical and complex. The web log is incremental in nature, so conventional data pre-processing techniques have proved unsuitable. Hence an extensive learning algorithm is required in order to get the desired information. This paper introduces an extensive research framework capable of pre-processing web log data completely and efficiently. The learning algorithm of the proposed research framework can separate human user and search engine accesses intelligently and in less time. In order to create suitable target data, the further essential pre-processing tasks of Data Cleansing, User Identification, Sessionization and Path Completion are designed collectively. The framework reduces the error rate and significantly improves the learning performance of the algorithm. The work ensures the goodness of split by using popular measures such as Entropy and the Gini index. This framework helps to investigate web user usage behaviour efficiently. The experimental results supporting this claim are given in this paper.
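
    The two split-quality measures named in the abstract can be illustrated with a short sketch. It uses the standard textbook definitions of entropy and the Gini index and is not taken from the paper; the class labels (`human`, `robot`) and the example split are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_impurity(groups, measure):
    """Size-weighted impurity of the groups produced by a split."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * measure(g) for g in groups)

# Hypothetical log entries labelled as human or search-engine (robot) accesses.
parent = ["human", "human", "robot", "robot"]
split = [["human", "human"], ["robot", "robot"]]

parent_entropy = entropy(parent)
parent_gini = gini(parent)
split_gini = weighted_impurity(split, gini)
```

    A split is considered good when the weighted impurity of the resulting groups is much lower than the impurity of the parent set, as it is for the perfectly separating split in this example.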

  2. Research on risk web information mining technology based on improved association rules

    Institute of Scientific and Technical Information of China (English)

    黄宏本

    2016-01-01

    The security of cyber information space is threatened by hazardous information arising from the different protocols and network channels carried on the Web, and accurately mining hazardous Web information helps purify cyberspace and ensure network security. Traditional methods use a fuzzy association rule algorithm to classify and mine dangerous Web information. Against a background of interference, fuzzy clustering is easily disturbed, so effective association rules are hard to establish and mining efficiency is low. A risk Web information mining technology based on improved association rules is therefore proposed. Before the association rules are established, Takens' theorem is introduced to reconstruct the phase space of the hazardous Web information data, a channel model for hazardous information mining in the Web network is built, and a classification design is made for the multi-source process of the risk Web information flow. An adaptive IIR cascade filtering algorithm is designed to filter out interference in the data, improve the association rule process, and achieve accurate mining of risk Web information. Simulation results show that the algorithm filters interference well and has high accuracy.

  3. Mobile response in web panels

    NARCIS (Netherlands)

    de Bruijne, M.A.; Wijnant, A.

    2014-01-01

    This article investigates unintended mobile access to surveys in online, probability-based panels. We find that spontaneous tablet usage is drastically increasing in web surveys, while smartphone usage remains low. Further, we analyze the bias of respondent profiles using smartphones and tablets com

  4. STUDY AND IMPROVEMENT ON LINKAGE SIMILARITY-BASED WEB MINING ALGORITHM

    Institute of Scientific and Technical Information of China (English)

    杨益凡; 朱明; 李华虎

    2011-01-01

    On the basis of the Web mining classification pattern, this paper studies and analyzes HITS (Hyperlink Induced Topic Search), a Web structure mining algorithm based on link analysis. When building the expanded set, HITS only considers the pages linked into and out of the root set pages and ignores how similar those pages are. To address this shortcoming, an improved algorithm, DS-HITS (Document Similarity Hyperlink Induced Topic Search), is proposed. The improved algorithm introduces several kinds of weights reflecting page similarity while the expanded set is processed, so that the hub and authority values of the acquired pages improve significantly. Finally, the search results of the DS-HITS and HITS algorithms are compared on the initial data of the Webla open source project.
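
    The baseline HITS iteration referred to above can be sketched as follows. This is a minimal sketch of plain HITS on a tiny hypothetical link graph; it does not include the similarity weights of DS-HITS, whose exact form the abstract does not give.

```python
import math

def hits(graph, iterations=50):
    """graph maps each page to the list of pages it links to."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking in.
        auth = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hub score: sum of the authority scores of the pages linked to.
        hub = {n: sum(auth[d] for d in graph.get(n, [])) for n in nodes}
        # Normalize so the scores stay bounded across iterations.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {n: v / a_norm for n, v in auth.items()}
        hub = {n: v / h_norm for n, v in hub.items()}
    return hub, auth

# Hypothetical graph: pages "a" and "b" both link to page "c".
links = {"a": ["c"], "b": ["c"], "c": []}
hub, auth = hits(links)
```

    A similarity-weighted variant would multiply each link's contribution by a weight before summing, which is where an approach like DS-HITS would differ from this sketch.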

  5. 一种新的用于数据挖掘工具的网页净化算法%An new algorithm of Web page purification for data mining tools

    Institute of Scientific and Technical Information of China (English)

    孙楠; 张华伟

    2011-01-01

    To remove noise from Web pages and extract their topic content effectively, a new Web page purification algorithm is presented. The algorithm assumes that the topic content of a Web page is mainly contained in <table> and <p> tags and uses this assumption to preprocess page noise. It then matches content against related Web pages and obtains the topic content by calculating the importance of each node. In a test on 6,318 pages from portal sites, the algorithm extracted topic content with an accuracy of 98.2% or higher. When used in data mining tools, it outperforms other similar algorithms and removes Web page noise effectively.

  6. Mine or Theirs, Where Do Users Go? A Comparison of E-Journal Usage at the OhioLINK Electronic Journal Center Platform versus the Elsevier ScienceDirect Platform

    Science.gov (United States)

    Swanson, Juleah

    2015-01-01

    This research provides librarians with a model for assessing and predicting which platforms patrons will use to access the same content, specifically comparing usage at the Ohio Library and Information Network (OhioLINK) Electronic Journal Center (EJC) and at Elsevier's ScienceDirect from 2007 to 2013. Findings show that in the earlier years, the…

  8. Usage Record Format Recommendation

    CERN Document Server

    Nilsen, J.K.; Muller-Pfefferkorn, R

    2013-01-01

    For resources to be shared, sites must be able to exchange basic accounting and usage data in a common format. This document describes a common format which enables the exchange of basic accounting and usage data from different resources. This record format is intended to facilitate the sharing of usage information, particularly in the area of the accounting of jobs, computing, memory, storage and cloud usage but with a structure that allows an easy extension to other resources. This document describes the Usage Record components both in natural language form and annotated XML. This document does not address how these records should be used, nor does it attempt to dictate the format in which the accounting records are stored. Instead, it defines a common exchange format. Furthermore, nothing is said regarding the communication mechanisms employed to exchange the records, i.e. transport layer, framing, authentication, integrity, etc.

  9. The use of web based monitoring and analysis-based platforms for the monitoring of slopes in opencast mines and quarries; Die Anwendung Web-basierter Monitoring- und Analyse-Plattformen fuer die Ueberwachung von Boeschungen in Steinbruechen und Tagebauen

    Energy Technology Data Exchange (ETDEWEB)

    Graf, Thomas; Fyfe, Timothy D. [Fugro Consult GmbH, Berlin (Germany)

    2010-07-15

    These days, meeting technical and operational safety requirements is one of the core activities in the operation of quarries and open pit mines. In particular, the geotechnical stability of slopes during open pit operations carries considerable risk potential. (orig.)

  10. Do usage and scientific collaboration associate with citation impact

    Energy Technology Data Exchange (ETDEWEB)

    Chi, P.S.; Glänzel, W.

    2016-07-01

    In this study, usage counts and times cited from the Web of Science Core Collection (WoS) were collected for each article published in 2013 with Belgian, Israeli and Iranian addresses. We investigate the relations among three indicators: citation impact, usage counts and co-authorship. In addition, we apply the method of Characteristic Scores and Scales (CSS) to analyse the distributions of citations and usage counts. The results show that citations and usage counts in WoS correlate significantly with each other, especially in the social sciences. However, an increase in the number of co-authors does not significantly increase usage counts or citations. Furthermore, the stability of the CSS-class distributions supports the applicability of CSS in characterising both usage and citation distributions. (Author)
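
    As a rough illustration of how Characteristic Scores and Scales partition a distribution, the sketch below uses the usual iterated-mean construction: the first threshold is the mean of all values, and each further threshold is the mean of the values at or above the previous one. The citation counts are invented, the function names are hypothetical, and the exact conventions of the study (for example strict versus non-strict inequality) are assumptions.

```python
def css_thresholds(values, levels=3):
    """Return thresholds b1..b_levels built from iterated truncated means."""
    thresholds = []
    current = list(values)
    for _ in range(levels):
        if not current:
            break
        b = sum(current) / len(current)
        thresholds.append(b)
        # Keep only the values at or above the new threshold.
        current = [v for v in current if v >= b]
    return thresholds


def css_class(value, thresholds):
    """Class 0 lies below b1, class 1 in [b1, b2), and so on."""
    return sum(value >= b for b in thresholds)


# Invented citation counts for eight papers.
citations = [0, 1, 1, 2, 3, 4, 9, 20]
thresholds = css_thresholds(citations)
classes = [css_class(c, thresholds) for c in citations]
print(thresholds, classes)
```

    The same two functions can be applied unchanged to usage counts, which is what makes the class distributions of the two indicators comparable.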

  11. Modeling and clustering users with evolving profiles in usage streams

    KAUST Repository

    Zhang, Chongsheng

    2012-09-01

    Today, there is an increasing need for data stream mining technology to discover important patterns on the fly. Existing data stream models and algorithms commonly assume that users' records or profiles in data streams will not be updated or revised once they arrive. Nevertheless, in various applications such as Web usage, the records/profiles of the users can evolve over time. This kind of streaming data evolves in two forms: the streaming of tuples or transactions, as in traditional data streams, and, more importantly, the evolution of user records/profiles inside the streams. Such data streams make it difficult to model and cluster users in order to explore their behaviors. In this paper, we propose three models to summarize this kind of data stream: the batch model, the Evolving Objects (EO) model and the Dynamic Data Stream (DDS) model. Through creating, updating and deleting user profiles, these models summarize the behaviors of each user as a profile object. Based upon these models, clustering algorithms are employed to discover interesting user groups from the profile objects. We have evaluated all the proposed models on a large real-world data set, showing that the DDS model summarizes data streams with evolving tuples more efficiently and effectively, and provides a better basis for clustering users than the other two models. © 2012 IEEE.

  12. A node linkage approach for sequential pattern mining.

    Directory of Open Access Journals (Sweden)

    Osvaldo Navarro

    Full Text Available Sequential Pattern Mining is a widely addressed problem in data mining, with applications such as analyzing Web usage, examining purchase behavior, and text mining, among others. Nevertheless, with the dramatic increase in data volume, the current approaches prove inefficient when dealing with large input datasets, a large number of different symbols and low minimum supports. In this paper, we propose a new sequential pattern mining algorithm, which follows a pattern-growth scheme to discover sequential patterns. Unlike most pattern growth algorithms, our approach does not build a data structure to represent the input dataset, but instead accesses the required sequences through pseudo-projection databases, achieving better runtime and reducing memory requirements. Our algorithm traverses the search space in a depth-first fashion and only preserves in memory a pattern node linkage and the pseudo-projections required for the branch being explored at the time. Experimental results show that our new approach, the Node Linkage Depth-First Traversal algorithm (NLDFT), has better performance and scalability in comparison with state-of-the-art algorithms.
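
    The idea of pattern growth over pseudo-projections can be sketched with a small PrefixSpan-style miner. This is not the NLDFT algorithm itself, whose node linkage structure is not described in the abstract; it only shows depth-first growth in which each projected database is a list of (sequence index, offset) pairs and no suffix is ever copied. The sessions and the name `mine_sequences` are hypothetical.

```python
def mine_sequences(sequences, min_support):
    """Return {pattern tuple: support} for all frequent sequential patterns."""
    results = {}

    def grow(prefix, projection):
        # projection holds (sequence index, offset) pairs: a pseudo-projection
        # that points into the original sequences instead of copying suffixes.
        extensions = {}
        for seq_idx, start in projection:
            seen = set()
            seq = sequences[seq_idx]
            for pos in range(start, len(seq)):
                item = seq[pos]
                if item not in seen:
                    # Record only the first occurrence per sequence.
                    seen.add(item)
                    extensions.setdefault(item, []).append((seq_idx, pos + 1))
        for item in sorted(extensions):
            occurrences = extensions[item]
            if len(occurrences) >= min_support:
                pattern = prefix + (item,)
                results[pattern] = len(occurrences)
                # Depth-first: finish this branch before its siblings.
                grow(pattern, occurrences)

    grow((), [(i, 0) for i in range(len(sequences))])
    return results


# Hypothetical Web sessions, each an ordered list of visited pages.
sessions = [
    ["home", "search", "item", "cart"],
    ["home", "item", "cart"],
    ["search", "item"],
]
patterns = mine_sequences(sessions, min_support=2)
for pattern, support in sorted(patterns.items()):
    print(pattern, support)
```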

  13. IMB3-Miner: Mining Induced/Embedded Subtrees by Constraining the Level of Embedding

    NARCIS (Netherlands)

    Tan, H.; Dillon, T.S.; Hadzic, F.; Chang, E.; Feng, L.

    Tree mining has recently attracted a lot of interest in areas such as Bioinformatics, XML mining, Web mining, etc. We are mainly concerned with mining frequent induced and embedded subtrees. While more interesting patterns can be obtained when mining embedded subtrees, unfortunately mining such

  14. LHCb Computing Resource usage in 2015 (II)

    CERN Document Server

    Bozzi, Concezio

    2016-01-01

    This document reports the usage of computing resources by the LHCb collaboration during the period January 1st – December 31st 2015. The data in the following sections has been compiled from the EGI Accounting portal: https://accounting.egi.eu. For LHCb specific information, the data is taken from the DIRAC Accounting at the LHCb DIRAC Web portal: http://lhcb-portal-dirac.cern.ch.

  15. An agent -based Intelligent System to enhance E-Learning through Mining Techniques

    Directory of Open Access Journals (Sweden)

    S.Prakasam

    2010-05-01

    Full Text Available The growth of the Internet has created new ways for education systems. Learners and teachers carry out their pedagogic activities with less effort, time and money. Agent-Based Intelligent Systems (ABIS) have proved their worth in multiple ways and in multiple domains in education. In this paper the application of an agent-based intelligent system for enhancing e-learning is introduced. An ABIS is a system that provides direct customized instruction or feedback to students without the intervention of human beings. With the explosion of content on the World Wide Web (WWW), the scope of application of data and Web mining to e-learning applications has increased tremendously. In this work, we identify a set of applications which go one step beyond ABIS and use the WWW to aid the learning process of the “learning object content”. Each application has a high level of coupling with the knowledge representation model, which models the resources stored in the digital library. This research presents the architecture for simplifying and automating the process of creating the domain model for an intelligent e-learning system. In this work we present a knowledge representation of educational resources, using the World Wide Web. Effective and accurate intelligent systems based on mining technologies have become the most important research issue. An agent-based recommendation system helps communities of learners in searching the web for information. This paper reports on the conceptual structure that has evolved to define the development process for the pedagogical agents. It also proposes obtaining rich sources of information, such as the hyperlinks among pages or Web usage information, using Web data mining technology based on intelligent search systems, and we propose an architecture model of the e-mLearning process using the agent paradigm.

  16. 一种Web 2.0环境下互联网热点挖掘算法%Mining Hot Topics on Internet under Web 2.0

    Institute of Scientific and Technical Information of China (English)

    李东方; 俞能海; 尹华罡

    2010-01-01

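    Mining Internet hot topics from the rich user feedback available under Web 2.0 has important practical value. This paper treats users' information activities on the Internet under Web 2.0 as heat activities and models them with a heat-transfer model. Based on this model, it proposes topic extraction and heat evaluation algorithms suited to the Web 2.0 environment. Experimental results show that the heat-transfer algorithm makes effective use of user feedback and is well suited to the Internet environment under Web 2.0.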

  17. Differences in smartphone usage

    DEFF Research Database (Denmark)

    Gustarini, Mattia; Scipioni, Marcello Paolo; Fanourakis, Marios

    2016-01-01

    We analyze the users’ intimacy to investigate the differences in smartphone usage, considering the user’s location and number and kind of people physically around the user. With a first user study we (1) validate the intimacy concept, (2) evaluate its correlation to smartphone usage features and (3......-time features are predictive for the intimacy, and other smartphone-based features can improve the intimacy prediction accuracy....

  18. Social Web mining and exploitation for serious applications: Technosocial Predictive Analytics and related technologies for public health, environmental and national security surveillance

    Energy Technology Data Exchange (ETDEWEB)

    Kamel Boulos, Maged; Sanfilippo, Antonio P.; Corley, Courtney D.; Wheeler, Steve

    2010-03-17

    This paper explores techno-social predictive analytics (TPA) and related methods for Web “data mining” where users’ posts and queries are garnered from Social Web (“Web 2.0”) tools such as blogs, microblogging and social networking sites to form coherent representations of real-time health events. The paper includes a brief introduction to commonly used Social Web tools such as mashups and aggregators, and maps their exponential growth as an open architecture of participation for the masses and an emerging way to gain insight into the collective health status of whole populations. Several health-related tool examples are described and demonstrated as practical means through which health professionals might create clear, location-specific pictures of epidemiological data such as flu outbreaks.

  19. Web数据挖掘技术在远程教育中的应用%Application of Web Data Mining Technology in Distance Education

    Institute of Scientific and Technical Information of China (English)

    刘婷; 胡玉娟; 孟庆伟

    2012-01-01

    Data mining technology provides technical support for tailoring teaching arrangements to learners' individual differences. Starting from the concept of data mining, this paper analyzes the data mining methods commonly used in distance education and gives a preliminary discussion of the application of Web data mining technology in modern distance education.

  20. Service mining framework and application

    CERN Document Server

    Chang, Wei-Lun

    2014-01-01

    The shifting focus of service from the 1980s to 2000s has proved that IT not only lowers the cost of service but creates avenues to enhance and increase revenue through service. The new type of service, e-service, is mobile, flexible, interactive, and interchangeable. While service science provides an avenue for future service research, the specific research areas from the IT perspective still need to be elaborated. This book introduces a novel concept, service mining, to address several research areas from technology, model, management, and application perspectives. Service mining is defined as "a systematical process including service discovery, service experience, service recovery, and service retention to discover unique patterns and exceptional values within the existing services." The goal of service mining is similar to that of data mining, text mining, or web mining, and aims to "detect something new" from the service pool. The major difference is the feature of service is quite distinct from the mining targe...

  1. Démarche d'évaluation de l'usage et des répercussions psychosociales d'un environnement STIC sur une population de personnes âgées en résidence médicalisée

    CERN Document Server

    Michel, Christine; Cohen-Montandreau, Véronique; Tarpin-Bernard, Franck

    2009-01-01

    The MNESIS Project aims to determine whether the use of a computerized environment by elderly people in medicalized residences stimulates their cognitive capacities and contributes to better integration, recognition or acceptance within their social environment (friends, family, medical staff). In this paper we present the evaluation protocol defined to test this assumption. The protocol sits between traditional user-centred protocols (built on investigations and indirect observation) and Web Usage Mining studies (where knowledge bases about usage are built from traces of use). It allows direct and indirect information to be collected on a large scale and over long periods.

  2. A Hierarchical Approach to Model Web Query Interfaces for Web Source Integration

    OpenAIRE

    Kabisch, Thomas; Dragut, Eduard; Yu, Clement; Leser, Ulf

    2009-01-01

    Much data in the Web is hidden behind Web query interfaces. In most cases the only means to "surface" the content of a Web database is by formulating complex queries on such interfaces. Applications such as Deep Web crawling and Web database integration require an automatic usage of these interfaces. Therefore, an important problem to be addressed is the automatic extraction of query interfaces into an appropriate model. We hypothesize the existence of a set of domain-independent "commonsense...

  3. PubstractHelper: A Web-based Text-Mining Tool for Marking Sentences in Abstracts from PubMed Using Multiple User-Defined Keywords.

    Science.gov (United States)

    Chen, Chou-Cheng; Ho, Chung-Liang

    2014-01-01

    While a huge amount of information about biological literature can be obtained by searching the PubMed database, reading through all the titles and abstracts resulting from such a search for useful information is inefficient. Text mining makes it possible to increase this efficiency. Some websites use text mining to gather information from the PubMed database; however, they are database-oriented, using pre-defined search keywords while lacking a query interface for user-defined search inputs. We present the PubMed Abstract Reading Helper (PubstractHelper) website, which combines text mining and reading assistance for an efficient PubMed search. PubstractHelper can accept a maximum of ten groups of keywords, with each group containing up to ten keywords. The principle behind the text-mining function of PubstractHelper is that keywords contained in the same sentence are likely to be related. PubstractHelper highlights sentences with co-occurring keywords in different colors. The user can download the PMID and the abstracts with color markings to be reviewed later. The PubstractHelper website can help users to identify relevant publications based on the presence of related keywords, which should make it a handy tool for their research. It is available at http://bio.yungyun.com.tw/ATM/PubstractHelper.aspx and http://holab.med.ncku.edu.tw/ATM/PubstractHelper.aspx.
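
    The sentence-level co-occurrence principle can be sketched in a few lines. This is not PubstractHelper's code: the function name, the naive sentence splitter, the rule that a sentence is flagged when it matches at least two keyword groups, and the sample text are all assumptions made for illustration.

```python
import re


def cooccurring_sentences(abstract, keyword_groups):
    """Return (sentence, matched group indices) for sentences that contain
    keywords from at least two different groups."""
    # Naive splitter: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", abstract.strip())
    flagged = []
    for sentence in sentences:
        lowered = sentence.lower()
        matched = [
            idx
            for idx, group in enumerate(keyword_groups)
            if any(keyword.lower() in lowered for keyword in group)
        ]
        if len(matched) >= 2:
            flagged.append((sentence, matched))
    return flagged


# Made-up abstract text and keyword groups, for illustration only.
text = (
    "TP53 is mutated in many tumours. "
    "Loss of TP53 impairs apoptosis in colon cancer cells. "
    "Further work is needed."
)
groups = [["TP53", "p53"], ["apoptosis"], ["colon cancer"]]
flagged = cooccurring_sentences(text, groups)
for sentence, matched in flagged:
    print(matched, sentence)
```

    Matching here is by case-insensitive substring, so a short keyword can match inside a longer word; a real tool would more likely match on word boundaries and map each group index to a highlight colour.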

  4. E-learning use patterns in the workplace – Web logs from interaction with a web based lecture

    Directory of Open Access Journals (Sweden)

    Christian Ostlund

    2012-11-01

    Full Text Available When designing for e-learning, the objective is to design for learning, i.e. the technology supporting the learning activity should aid and support the learning process and be an arena where learning is likely to occur. To achieve this when designing e-learning for the workplace, the author argues that it is important to know how users actually access and use e-learning systems. To gain this knowledge, web logs from a web lecture developed for a Scandinavian public body have been analyzed. During a period of two and a half months, 15 learners visited the web lecture 74 times. The web lecture consisted of streaming video with exercises and additional links to resources on the WWW to provide an opportunity to investigate the topic from multiple perspectives. The web lecture took approximately one hour to finish. Using web usage mining for the analysis, seven groups or interaction patterns emerged: peaking, one go, partial order, partial unordered, single module, mixed modules, non-video modules. Furthermore, the web logs paint a picture of learning activities being interrupted. This suggests that modules need to be fine-grained (e.g. less than 8 minutes per video clip) so that learners do not need to waste time watching parts of a video clip while waiting for the part of interest to appear, or fast-forwarding. A clear and logical structure is also important to help learners find their way back accurately and quickly.

  5. Sentiment Analysis and Opinion Mining

    CERN Document Server

    Liu, Bing

    2012-01-01

    Sentiment analysis and opinion mining is the field of study that analyzes people's opinions, sentiments, evaluations, attitudes, and emotions from written language. It is one of the most active research areas in natural language processing and is also widely studied in data mining, Web mining, and text mining. In fact, this research has spread outside of computer science to the management sciences and social sciences due to its importance to business and society as a whole. The growing importance of sentiment analysis coincides with the growth of social media such as reviews, forum discussions

  6. Managing and Mining Graph Data

    CERN Document Server

    Aggarwal, Charu C

    2010-01-01

    Managing and Mining Graph Data is a comprehensive survey book in graph management and mining. It contains extensive surveys on a variety of important graph topics such as graph languages, indexing, clustering, data generation, pattern mining, classification, keyword search, pattern matching, and privacy. It also studies a number of domain-specific scenarios such as stream mining, web graphs, social networks, chemical and biological data. The chapters are written by well known researchers in the field, and provide a broad perspective of the area. This is the first comprehensive survey book in t

  7. Search Engine Design and Implementation of Web Data Mining Based on Concept Clustering%基于概念聚类的Web数据挖掘搜索引擎的设计与实现

    Institute of Scientific and Technical Information of China (English)

    刘典型; 刘完芳; 钟钢

    2015-01-01

    In Web data mining search, accuracy depends largely on the number of keywords the user enters and on how well the search engine's semantic interpretation of those keywords matches the user's intent. Search engines interpret keywords using either link-based or concept-based clustering methods. To overcome the shortcomings of link-based clustering, this paper adopts concept-based clustering: starting from the concept and storage method of bipartite graphs, it designs and implements a personalized Web data mining search engine and verifies its advantages.

  8. Research on personalized service system of library based on Web mining%基于Web挖掘的图书馆个性化服务系统研究

    Institute of Scientific and Technical Information of China (English)

    唐秋鸿; 曹红兵; 唐小新; 李高虎; 高嵩

    2012-01-01

    This paper presents a library personalized service system architecture that covers information services in both the virtual and physical worlds, together with a model for assessing reader credibility. Web mining is applied to the large volume of Web circulation log records in the library's MELINETS system to build a dynamic reader credibility assessment mechanism that reflects the information-use behaviour and habits of individual users and user groups, and how these change. The mechanism is used to divide readers into groups on a sound basis. Personalized service functions, and combinations of them, are designed to meet readers' individual characteristics and needs, providing a technical basis for the further development and application of library personalized service systems.

  9. Web Oriented Data Mining Design of Optimization Program of Coal Enterprises Cyber-marketing%面向Web数据挖掘的煤炭企业营销优化方案设计

    Institute of Scientific and Technical Information of China (English)

    赵海涛

    2013-01-01

    For the cyber-marketing of coal enterprises, this paper introduces the theory of data mining, Web mining and short-text classification, and analyzes several short-text classification algorithms. It attempts to use HTML tag weights to compensate for the weakness of the conditional independence assumption in the naive Bayes algorithm, and combines the useful information in the tags with a short-text classification algorithm to classify short texts. Finally, because the precision of the improved classifier is still not ideal, the paper summarizes the content to be studied next and offers some of the author's own views.

  10. The E-Commerce Model of Health Websites: An Integration of Web Quality, Perceived Interactivity, and Web Outcomes

    Directory of Open Access Journals (Sweden)

    Chung-Hung Tsai

    2011-07-01

    Full Text Available The study integrates web quality (system quality, information quality, and service quality), perceived interactivity (human-message, human-human), and web outcomes (web usage, web satisfaction, and web loyalty) to explore the e-commerce model of health websites. A survey of 1076 users of health websites was conducted to validate the proposed model. The findings show that web quality has a significantly positive effect on perceived interactivity, web usage, and web satisfaction separately, which in turn influence web loyalty. This study also confirms that perceived interactivity is an important mediator between web quality and web outcomes. This study emphasizes the importance of both web quality and perceived interactivity in progressing towards successful health websites. The findings may be used as a theoretical base for future research and can also offer empirical foresight to executives and managers of hospitals when they initially introduce and upgrade health websites in their organizations.

  11. A Survey on Terrorist Network Mining: Current Trends and Opportunities

    Directory of Open Access Journals (Sweden)

    Akhilesh Tiwari

    2012-09-01

    Full Text Available With the modernization and widespread usage of the Internet, security has become one of the major issues facing society today, and the threat posed by terrorists is the dominant challenge. Advances in technology have helped not only ordinary people but also terrorists, who use sophisticated techniques to harm society. Law enforcement agencies therefore aim to prevent future attacks by analyzing and detecting terrorist networks, and they use data mining techniques as one effective means of doing so. One such technique is social network analysis, which studies terrorist networks to identify relationships and associations that may exist between terrorist nodes. Terrorist activities can also be detected by analyzing Web traffic content. This paper studies social network analysis and Web traffic content, and explores various ways of identifying terrorist activities.

  12. World Technology Usage Lags

    OpenAIRE

    Diego A. Comin; Bart Hobijn; Emilie Rovito

    2006-01-01

    We present evidence on the differences in the intensity with which ten major technologies are used in 185 countries across the world. We do so by calculating how many years ago these technologies were used in the U.S. at the same intensity as they are used in the countries in our sample. We denote these time lags as technology usage lags and compare them with lags in real GDP per capita. We find that (i) technology usage lags are large, often comparable to lags in real GDP per capita, (ii) us...

  13. French grammar and usage

    CERN Document Server

    Hawkins, Roger

    2015-01-01

    Long trusted as the most comprehensive, up-to-date and user-friendly grammar available, French Grammar and Usage is a complete guide to French as it is written and spoken today. It includes clear descriptions of all the main grammatical phenomena of French, and their use, illustrated by numerous examples taken from contemporary French, and distinguishes the most common forms of usage, both formal and informal. Key features include: comprehensive content, covering all the major structures of contemporary French; user-friendly organisation offering easy-to-find sections with cross-referencing and i

  14. 改进的朴素贝叶斯聚类Web文本分类挖掘技术%The Improved Naive Bayes Text Classification Data Mining Clustering Web

    Institute of Scientific and Technical Information of China (English)

    高胜利

    2012-01-01

    This paper first introduces the basic theory of Web mining and text classification and analyzes the characteristics of Web data in detail. Building on the traditional Bayesian clustering algorithm, it uses Web page tags to compensate for the shortcomings of the naive Bayes algorithm and applies the improved method to text classification. Experimental results show that the method classifies text effectively.

  15. The Application of the Web Text Mining in the Druggist Interest Extraction%Web文本挖掘在药商兴趣提取中的应用

    Institute of Scientific and Technical Information of China (English)

    孙士新

    2014-01-01

    Obtaining information has become an important part of druggists' business operations and a basis for their market judgments. The large volume of unstructured and semi-structured information on the network provides technical scope and an empirical basis for personalized services for druggists. This paper designs the key text mining techniques for personalized service, applies a text mining process to a Traditional Chinese Medicinal Materials information website, and uses text mining in a worked example of acquiring user interests on that website, achieving automatic acquisition of user interests.

  16. Vehicle usage verification system

    NARCIS (Netherlands)

    Scanlon, William G.; McQuiston, Jonathan; Cotton, Simon L.

    2012-01-01

    A computer-implemented system for verifying vehicle usage comprising a server capable of communication with a plurality of clients across a communications network. Each client is provided in a respective vehicle and with a respective global positioning system (GPS) by which the client can determi

  17. Energy Usage Analysis System

    Data.gov (United States)

    General Services Administration — The EUAS application is a web-based system which serves the Energy Center of Expertise, under the Office of Facilities Management and Service Programs. EUAS is used for...

  18. Pbm: A new dataset for blog mining

    CERN Document Server

    Aziz, Mehwish

    2012-01-01

    Text mining is becoming vital as Web 2.0 offers collaborative content creation and sharing. Researchers now have a growing interest in text mining methods for discovering knowledge. Text mining researchers come from a variety of areas such as Natural Language Processing, Computational Linguistics, Machine Learning, and Statistics. A typical text mining application involves preprocessing of text, stemming and lemmatization, tagging and annotation, deriving knowledge patterns, and evaluating and interpreting the results. There are numerous approaches for performing text mining tasks, such as clustering, categorization, sentiment analysis, and summarization. There is a growing need to standardize the evaluation of these tasks. One major component of establishing standardization is to provide standard datasets for these tasks. Although there are various standard datasets available for traditional text mining tasks, there are very few datasets for the blog-mining task, and they are expensive. Blogs, a new genre in web 2.0 is a digital...

  19. Stratification-Based Outlier Detection over the Deep Web

    National Research Council Canada - National Science Library

    Xian, Xuefeng; Zhao, Pengpeng; Sheng, Victor S; Fang, Ligang; Gu, Caidong; Yang, Yuanfeng; Cui, Zhiming

    2016-01-01

    .... Introduction As a result of the rapid development of e-commerce, the deep web has been increasingly valued by data mining researchers in recent years. The deep web, which is termed to make a contr...

  20. 基于Apriori算法的Deep Web网页关系挖掘研究%Study on Deep Web pages mining based on Apriori algorithm

    Institute of Scientific and Technical Information of China (English)

    李贵; 韩子扬; 郑新录; 李征宇

    2011-01-01

    The Apriori algorithm is used to identify the maximal frequent associated pages in Deep Web sites and to prune pages that do not belong to a maximal frequent itemset. The pages of the Deep Web site are then traversed to obtain all maximal frequent associated pages. Experimental results on data extraction from a real estate Deep Web site show that the algorithm is feasible and effective.
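
    A compact sketch of Apriori followed by a maximal-itemset filter may help make the two steps concrete. It is a generic textbook version, not the authors' implementation: the page names, the support threshold and the function names `apriori` and `maximal_itemsets` are hypothetical, and how the paper turns site pages into transactions is not stated in the abstract.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset itemset: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(itemset <= t for t in transactions)

    items = {item for t in transactions for item in t}
    current = {
        frozenset([i]) for i in items if support(frozenset([i])) >= min_support
    }
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = {
            c
            for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
            and support(c) >= min_support
        }
    return frequent


def maximal_itemsets(frequent):
    """Keep only frequent itemsets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < other for other in frequent)]


# Hypothetical page sets, one per crawled navigation path.
paths = [
    {"list", "detail", "map"},
    {"list", "detail"},
    {"list", "detail", "map"},
    {"list", "contact"},
]
frequent = apriori(paths, min_support=2)
maximal = maximal_itemsets(frequent)
print(maximal)
```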