WorldWideScience

Sample records for machine svm-based approach

  1. A support vector machine (SVM) based voltage stability classifier

    Energy Technology Data Exchange (ETDEWEB)

    Dosano, R.D.; Song, H. [Kunsan National Univ., Kunsan, Jeonbuk (Korea, Republic of); Lee, B. [Korea Univ., Seoul (Korea, Republic of)

    2007-07-01

    Power system stability has become even more complex and critical with the advent of deregulated energy markets and the growing desire to completely employ existing transmission and infrastructure. The economic pressure on electricity markets forces the operation of power systems and components to their limit of capacity and performance. System conditions can be more exposed to instability due to greater uncertainty in day to day system operations and increase in the number of potential components for system disturbances potentially resulting in voltage stability. This paper proposed a support vector machine (SVM) based power system voltage stability classifier using local measurements of voltage and active power of load. It described the procedure for fast classification of long-term voltage stability using the SVM algorithm. The application of the SVM based voltage stability classifier was presented with reference to the choice of input parameters; input data preconditioning; moving window for feature vector; determination of learning samples; and other considerations in SVM applications. The paper presented a case study with numerical examples of an 11-bus test system. The test results for the feasibility study demonstrated that the classifier could offer an excellent performance in classification with time-series measurements in terms of long-term voltage stability. 9 refs., 14 figs.

  2. An SVM Based Approach for the Analysis Of Mammography Images

    Science.gov (United States)

    Gan, X.; Kapsokalivas, L.; Skaliotis, A.; Steinhöfel, K.; Tangaro, S.

    2007-09-01

    Mammography is among the most popular imaging techniques used in the diagnosis of breast cancer. Nevertheless distinguishing between healthy and ill images is hard even for an experienced radiologist, because a single image usually includes several regions of interest (ROIs). The hardness of this classification problem along with the substantial amount of data, gathered from patients' medical history, motivates the use of a machine learning approach as part of a CAD (Computer Aided Detection) tool, aiming to assist radiologists in the characterization of mammography images. Specifically, our approach involves: i) the ROI extraction, ii) the Feature Vector extraction, iii) the Support Vector Machine (SVM) classification of ROIs and iv) the characterization of the whole image. We evaluate the performance of our approach in terms of the SVM's training and testing error and in terms of ROI specificity—sensitivity. The results show a relation between the number of features used and the SVM's performance.

  3. An SVM Based Approach for the Analysis Of Mammography Images

    International Nuclear Information System (INIS)

    Gan, X.; Kapsokalivas, L.; Skaliotis, A.; Steinhoefel, K.; Tangaro, S.

    2007-01-01

    Mammography is among the most popular imaging techniques used in the diagnosis of breast cancer. Nevertheless distinguishing between healthy and ill images is hard even for an experienced radiologist, because a single image usually includes several regions of interest (ROIs). The hardness of this classification problem along with the substantial amount of data, gathered from patients' medical history, motivates the use of a machine learning approach as part of a CAD (Computer Aided Detection) tool, aiming to assist radiologists in the characterization of mammography images. Specifically, our approach involves: i) the ROI extraction, ii) the Feature Vector extraction, iii) the Support Vector Machine (SVM) classification of ROIs and iv) the characterization of the whole image. We evaluate the performance of our approach in terms of the SVM's training and testing error and in terms of ROI specificity - sensitivity. The results show a relation between the number of features used and the SVM's performance

  4. Settlement Prediction of Road Soft Foundation Using a Support Vector Machine (SVM Based on Measured Data

    Directory of Open Access Journals (Sweden)

    Yu Huiling

    2016-01-01

    Full Text Available The suppor1t vector machine (SVM is a relatively new artificial intelligence technique which is increasingly being applied to geotechnical problems and is yielding encouraging results. SVM is a new machine learning method based on the statistical learning theory. A case study based on road foundation engineering project shows that the forecast results are in good agreement with the measured data. The SVM model is also compared with BP artificial neural network model and traditional hyperbola method. The prediction results indicate that the SVM model has a better prediction ability than BP neural network model and hyperbola method. Therefore, settlement prediction based on SVM model can reflect actual settlement process more correctly. The results indicate that it is effective and feasible to use this method and the nonlinear mapping relation between foundation settlement and its influence factor can be expressed well. It will provide a new method to predict foundation settlement.

  5. Cerebral 18F-FDG PET in macrophagic myofasciitis: An individual SVM-based approach.

    Science.gov (United States)

    Blanc-Durand, Paul; Van Der Gucht, Axel; Guedj, Eric; Abulizi, Mukedaisi; Aoun-Sebaiti, Mehdi; Lerman, Lionel; Verger, Antoine; Authier, François-Jérôme; Itti, Emmanuel

    2017-01-01

    Macrophagic myofasciitis (MMF) is an emerging condition with highly specific myopathological alterations. A peculiar spatial pattern of a cerebral glucose hypometabolism involving occipito-temporal cortex and cerebellum have been reported in patients with MMF; however, the full pattern is not systematically present in routine interpretation of scans, and with varying degrees of severity depending on the cognitive profile of patients. Aim was to generate and evaluate a support vector machine (SVM) procedure to classify patients between healthy or MMF 18F-FDG brain profiles. 18F-FDG PET brain images of 119 patients with MMF and 64 healthy subjects were retrospectively analyzed. The whole-population was divided into two groups; a training set (100 MMF, 44 healthy subjects) and a testing set (19 MMF, 20 healthy subjects). Dimensionality reduction was performed using a t-map from statistical parametric mapping (SPM) and a SVM with a linear kernel was trained on the training set. To evaluate the performance of the SVM classifier, values of sensitivity (Se), specificity (Sp), positive predictive value (PPV), negative predictive value (NPV) and accuracy (Acc) were calculated. The SPM12 analysis on the training set exhibited the already reported hypometabolism pattern involving occipito-temporal and fronto-parietal cortices, limbic system and cerebellum. The SVM procedure, based on the t-test mask generated from the training set, correctly classified MMF patients of the testing set with following Se, Sp, PPV, NPV and Acc: 89%, 85%, 85%, 89%, and 87%. We developed an original and individual approach including a SVM to classify patients between healthy or MMF metabolic brain profiles using 18F-FDG-PET. Machine learning algorithms are promising for computer-aided diagnosis but will need further validation in prospective cohorts.

  6. A SVM bases AI design for interactive gaming

    OpenAIRE

    Jiang, Yang; Jiang, Jianmin; Palmer, Ian

    2008-01-01

    Interactive gaming requires automatic processing on large volume of random data produced by players on spot, such as shooting, football kicking, boxing etc. In this paper, we describe an artificial intelligence approach in processing such random data for interactive gaming by using a one-class support vector machine (OC-SVM). In comparison with existing techniques, our OC-SVM based interactive gaming design has the features of: (i): high speed processing, providing instant response to the pla...

  7. Power quality events recognition using a SVM-based method

    Energy Technology Data Exchange (ETDEWEB)

    Cerqueira, Augusto Santiago; Ferreira, Danton Diego; Ribeiro, Moises Vidal; Duque, Carlos Augusto [Department of Electrical Circuits, Federal University of Juiz de Fora, Campus Universitario, 36036 900, Juiz de Fora MG (Brazil)

    2008-09-15

    In this paper, a novel SVM-based method for power quality event classification is proposed. A simple approach for feature extraction is introduced, based on the subtraction of the fundamental component from the acquired voltage signal. The resulting signal is presented to a support vector machine for event classification. Results from simulation are presented and compared with two other methods, the OTFR and the LCEC. The proposed method shown an improved performance followed by a reasonable computational cost. (author)

  8. Generalized SMO algorithm for SVM-based multitask learning.

    Science.gov (United States)

    Cai, Feng; Cherkassky, Vladimir

    2012-06-01

    Exploiting additional information to improve traditional inductive learning is an active research area in machine learning. In many supervised-learning applications, training data can be naturally separated into several groups, and incorporating this group information into learning may improve generalization. Recently, Vapnik proposed a general approach to formalizing such problems, known as "learning with structured data" and its support vector machine (SVM) based optimization formulation called SVM+. Liang and Cherkassky showed the connection between SVM+ and multitask learning (MTL) approaches in machine learning, and proposed an SVM-based formulation for MTL called SVM+MTL for classification. Training the SVM+MTL classifier requires the solution of a large quadratic programming optimization problem which scales as O(n(3)) with sample size n. So there is a need to develop computationally efficient algorithms for implementing SVM+MTL. This brief generalizes Platt's sequential minimal optimization (SMO) algorithm to the SVM+MTL setting. Empirical results show that, for typical SVM+MTL problems, the proposed generalized SMO achieves over 100 times speed-up, in comparison with general-purpose optimization routines.

  9. An SVM-Based Classifier for Estimating the State of Various Rotating Components in Agro-Industrial Machinery with a Vibration Signal Acquired from a Single Point on the Machine Chassis

    Directory of Open Access Journals (Sweden)

    Ruben Ruiz-Gonzalez

    2014-11-01

    Full Text Available The goal of this article is to assess the feasibility of estimating the state of various rotating components in agro-industrial machinery by employing just one vibration signal acquired from a single point on the machine chassis. To do so, a Support Vector Machine (SVM-based system is employed. Experimental tests evaluated this system by acquiring vibration data from a single point of an agricultural harvester, while varying several of its working conditions. The whole process included two major steps. Initially, the vibration data were preprocessed through twelve feature extraction algorithms, after which the Exhaustive Search method selected the most suitable features. Secondly, the SVM-based system accuracy was evaluated by using Leave-One-Out cross-validation, with the selected features as the input data. The results of this study provide evidence that (i accurate estimation of the status of various rotating components in agro-industrial machinery is possible by processing the vibration signal acquired from a single point on the machine structure; (ii the vibration signal can be acquired with a uniaxial accelerometer, the orientation of which does not significantly affect the classification accuracy; and, (iii when using an SVM classifier, an 85% mean cross-validation accuracy can be reached, which only requires a maximum of seven features as its input, and no significant improvements are noted between the use of either nonlinear or linear kernels.

  10. SVM-based glioma grading. Optimization by feature reduction analysis

    International Nuclear Information System (INIS)

    Zoellner, Frank G.; Schad, Lothar R.; Emblem, Kyrre E.; Harvard Medical School, Boston, MA; Oslo Univ. Hospital

    2012-01-01

    We investigated the predictive power of feature reduction analysis approaches in support vector machine (SVM)-based classification of glioma grade. In 101 untreated glioma patients, three analytic approaches were evaluated to derive an optimal reduction in features; (i) Pearson's correlation coefficients (PCC), (ii) principal component analysis (PCA) and (iii) independent component analysis (ICA). Tumor grading was performed using a previously reported SVM approach including whole-tumor cerebral blood volume (CBV) histograms and patient age. Best classification accuracy was found using PCA at 85% (sensitivity = 89%, specificity = 84%) when reducing the feature vector from 101 (100-bins rCBV histogram + age) to 3 principal components. In comparison, classification accuracy by PCC was 82% (89%, 77%, 2 dimensions) and 79% by ICA (87%, 75%, 9 dimensions). For improved speed (up to 30%) and simplicity, feature reduction by all three methods provided similar classification accuracy to literature values (∝87%) while reducing the number of features by up to 98%. (orig.)

  11. SVM-based glioma grading. Optimization by feature reduction analysis

    Energy Technology Data Exchange (ETDEWEB)

    Zoellner, Frank G.; Schad, Lothar R. [University Medical Center Mannheim, Heidelberg Univ., Mannheim (Germany). Computer Assisted Clinical Medicine; Emblem, Kyrre E. [Massachusetts General Hospital, Charlestown, A.A. Martinos Center for Biomedical Imaging, Boston MA (United States). Dept. of Radiology; Harvard Medical School, Boston, MA (United States); Oslo Univ. Hospital (Norway). The Intervention Center

    2012-11-01

    We investigated the predictive power of feature reduction analysis approaches in support vector machine (SVM)-based classification of glioma grade. In 101 untreated glioma patients, three analytic approaches were evaluated to derive an optimal reduction in features; (i) Pearson's correlation coefficients (PCC), (ii) principal component analysis (PCA) and (iii) independent component analysis (ICA). Tumor grading was performed using a previously reported SVM approach including whole-tumor cerebral blood volume (CBV) histograms and patient age. Best classification accuracy was found using PCA at 85% (sensitivity = 89%, specificity = 84%) when reducing the feature vector from 101 (100-bins rCBV histogram + age) to 3 principal components. In comparison, classification accuracy by PCC was 82% (89%, 77%, 2 dimensions) and 79% by ICA (87%, 75%, 9 dimensions). For improved speed (up to 30%) and simplicity, feature reduction by all three methods provided similar classification accuracy to literature values ({proportional_to}87%) while reducing the number of features by up to 98%. (orig.)

  12. SVM Based Descriptor Selection and Classification of Neurodegenerative Disease Drugs for Pharmacological Modeling.

    Science.gov (United States)

    Shahid, Mohammad; Shahzad Cheema, Muhammad; Klenner, Alexander; Younesi, Erfan; Hofmann-Apitius, Martin

    2013-03-01

    Systems pharmacological modeling of drug mode of action for the next generation of multitarget drugs may open new routes for drug design and discovery. Computational methods are widely used in this context amongst which support vector machines (SVM) have proven successful in addressing the challenge of classifying drugs with similar features. We have applied a variety of such SVM-based approaches, namely SVM-based recursive feature elimination (SVM-RFE). We use the approach to predict the pharmacological properties of drugs widely used against complex neurodegenerative disorders (NDD) and to build an in-silico computational model for the binary classification of NDD drugs from other drugs. Application of an SVM-RFE model to a set of drugs successfully classified NDD drugs from non-NDD drugs and resulted in overall accuracy of ∼80 % with 10 fold cross validation using 40 top ranked molecular descriptors selected out of total 314 descriptors. Moreover, SVM-RFE method outperformed linear discriminant analysis (LDA) based feature selection and classification. The model reduced the multidimensional descriptors space of drugs dramatically and predicted NDD drugs with high accuracy, while avoiding over fitting. Based on these results, NDD-specific focused libraries of drug-like compounds can be designed and existing NDD-specific drugs can be characterized by a well-characterized set of molecular descriptors. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  13. Ultrasonic fluid quantity measurement in dynamic vehicular applications a support vector machine approach

    CERN Document Server

    Terzic, Jenny; Nagarajah, Romesh; Alamgir, Muhammad

    2013-01-01

    Accurate fluid level measurement in dynamic environments can be assessed using a Support Vector Machine (SVM) approach. SVM is a supervised learning model that analyzes and recognizes patterns. It is a signal classification technique which has far greater accuracy than conventional signal averaging methods. Ultrasonic Fluid Quantity Measurement in Dynamic Vehicular Applications: A Support Vector Machine Approach describes the research and development of a fluid level measurement system for dynamic environments. The measurement system is based on a single ultrasonic sensor. A Support Vector Machines (SVM) based signal characterization and processing system has been developed to compensate for the effects of slosh and temperature variation in fluid level measurement systems used in dynamic environments including automotive applications. It has been demonstrated that a simple ν-SVM model with Radial Basis Function (RBF) Kernel with the inclusion of a Moving Median filter could be used to achieve the high levels...

  14. Effective Sequential Classifier Training for SVM-Based Multitemporal Remote Sensing Image Classification

    Science.gov (United States)

    Guo, Yiqing; Jia, Xiuping; Paull, David

    2018-06-01

    The explosive availability of remote sensing images has challenged supervised classification algorithms such as Support Vector Machines (SVM), as training samples tend to be highly limited due to the expensive and laborious task of ground truthing. The temporal correlation and spectral similarity between multitemporal images have opened up an opportunity to alleviate this problem. In this study, a SVM-based Sequential Classifier Training (SCT-SVM) approach is proposed for multitemporal remote sensing image classification. The approach leverages the classifiers of previous images to reduce the required number of training samples for the classifier training of an incoming image. For each incoming image, a rough classifier is firstly predicted based on the temporal trend of a set of previous classifiers. The predicted classifier is then fine-tuned into a more accurate position with current training samples. This approach can be applied progressively to sequential image data, with only a small number of training samples being required from each image. Experiments were conducted with Sentinel-2A multitemporal data over an agricultural area in Australia. Results showed that the proposed SCT-SVM achieved better classification accuracies compared with two state-of-the-art model transfer algorithms. When training data are insufficient, the overall classification accuracy of the incoming image was improved from 76.18% to 94.02% with the proposed SCT-SVM, compared with those obtained without the assistance from previous images. These results demonstrate that the leverage of a priori information from previous images can provide advantageous assistance for later images in multitemporal image classification.

  15. Hybrid PSO–SVM-based method for forecasting of the remaining useful life for aircraft engines and evaluation of its reliability

    International Nuclear Information System (INIS)

    García Nieto, P.J.; García-Gonzalo, E.; Sánchez Lasheras, F.; Cos Juez, F.J. de

    2015-01-01

    The present paper describes a hybrid PSO–SVM-based model for the prediction of the remaining useful life of aircraft engines. The proposed hybrid model combines support vector machines (SVMs), which have been successfully adopted for regression problems, with the particle swarm optimization (PSO) technique. This optimization technique involves kernel parameter setting in the SVM training procedure, which significantly influences the regression accuracy. However, its use in reliability applications has not been yet widely explored. Bearing this in mind, remaining useful life values have been predicted here by using the hybrid PSO–SVM-based model from the remaining measured parameters (input variables) for aircraft engines with success. A coefficient of determination equal to 0.9034 was obtained when this hybrid PSO–RBF–SVM-based model was applied to experimental data. The agreement of this model with experimental data confirmed its good performance. One of the main advantages of this predictive model is that it does not require information about the previous operation states of the engine. Finally, the main conclusions of this study are exposed. - Highlights: • A hybrid PSO–SVM-based model is built as a predictive model of the RUL values for aircraft engines. • The remaining physical–chemical variables in this process are studied in depth. • The obtained regression accuracy of our method is about 95%. • The results show that PSO–SVM-based model can assist in the diagnosis of the RUL values with accuracy

  16. Diagnosis of Elevator Faults with LS-SVM Based on Optimization by K-CV

    Directory of Open Access Journals (Sweden)

    Zhou Wan

    2015-01-01

    Full Text Available Several common elevator malfunctions were diagnosed with a least square support vector machine (LS-SVM. After acquiring vibration signals of various elevator functions, their energy characteristics and time domain indicators were extracted by theoretically analyzing the optimal wavelet packet, in order to construct a feature vector of malfunctions for identifying causes of the malfunctions as input of LS-SVM. Meanwhile, parameters about LS-SVM were optimized by K-fold cross validation (K-CV. After diagnosing deviated elevator guide rail, deviated shape of guide shoe, abnormal running of tractor, erroneous rope groove of traction sheave, deviated guide wheel, and tension of wire rope, the results suggested that the LS-SVM based on K-CV optimization was one of effective methods for diagnosing elevator malfunctions.

  17. Modeling the milling tool wear by using an evolutionary SVM-based model from milling runs experimental data

    Science.gov (United States)

    Nieto, Paulino José García; García-Gonzalo, Esperanza; Vilán, José Antonio Vilán; Robleda, Abraham Segade

    2015-12-01

    The main aim of this research work is to build a new practical hybrid regression model to predict the milling tool wear in a regular cut as well as entry cut and exit cut of a milling tool. The model was based on Particle Swarm Optimization (PSO) in combination with support vector machines (SVMs). This optimization mechanism involved kernel parameter setting in the SVM training procedure, which significantly influences the regression accuracy. Bearing this in mind, a PSO-SVM-based model, which is based on the statistical learning theory, was successfully used here to predict the milling tool flank wear (output variable) as a function of the following input variables: the time duration of experiment, depth of cut, feed, type of material, etc. To accomplish the objective of this study, the experimental dataset represents experiments from runs on a milling machine under various operating conditions. In this way, data sampled by three different types of sensors (acoustic emission sensor, vibration sensor and current sensor) were acquired at several positions. A second aim is to determine the factors with the greatest bearing on the milling tool flank wear with a view to proposing milling machine's improvements. Firstly, this hybrid PSO-SVM-based regression model captures the main perception of statistical learning theory in order to obtain a good prediction of the dependence among the flank wear (output variable) and input variables (time, depth of cut, feed, etc.). Indeed, regression with optimal hyperparameters was performed and a determination coefficient of 0.95 was obtained. The agreement of this model with experimental data confirmed its good performance. Secondly, the main advantages of this PSO-SVM-based model are its capacity to produce a simple, easy-to-interpret model, its ability to estimate the contributions of the input variables, and its computational efficiency. Finally, the main conclusions of this study are exposed.

  18. Machine learning approaches in medical image analysis

    DEFF Research Database (Denmark)

    de Bruijne, Marleen

    2016-01-01

    Machine learning approaches are increasingly successful in image-based diagnosis, disease prognosis, and risk assessment. This paper highlights new research directions and discusses three main challenges related to machine learning in medical imaging: coping with variation in imaging protocols......, learning from weak labels, and interpretation and evaluation of results....

  19. Machine learning an artificial intelligence approach

    CERN Document Server

    Banerjee, R; Bradshaw, Gary; Carbonell, Jaime Guillermo; Mitchell, Tom Michael; Michalski, Ryszard Spencer

    1983-01-01

    Machine Learning: An Artificial Intelligence Approach contains tutorial overviews and research papers representative of trends in the area of machine learning as viewed from an artificial intelligence perspective. The book is organized into six parts. Part I provides an overview of machine learning and explains why machines should learn. Part II covers important issues affecting the design of learning programs-particularly programs that learn from examples. It also describes inductive learning systems. Part III deals with learning by analogy, by experimentation, and from experience. Parts IV a

  20. Support vector machine based fault detection approach for RFT-30 cyclotron

    Energy Technology Data Exchange (ETDEWEB)

    Kong, Young Bae, E-mail: ybkong@kaeri.re.kr; Lee, Eun Je; Hur, Min Goo; Park, Jeong Hoon; Park, Yong Dae; Yang, Seung Dae

    2016-10-21

    An RFT-30 is a 30 MeV cyclotron used for radioisotope applications and radiopharmaceutical researches. The RFT-30 cyclotron is highly complex and includes many signals for control and monitoring of the system. It is quite difficult to detect and monitor the system failure in real time. Moreover, continuous monitoring of the system is hard and time-consuming work for human operators. In this paper, we propose a support vector machine (SVM) based fault detection approach for the RFT-30 cyclotron. The proposed approach performs SVM learning with training samples to construct the classification model. To compensate the system complexity due to the large-scale accelerator, we utilize the principal component analysis (PCA) for transformation of the original data. After training procedure, the proposed approach detects the system faults in real time. We analyzed the performance of the proposed approach utilizing the experimental data of the RFT-30 cyclotron. The performance results show that the proposed SVM approach can provide an efficient way to control the cyclotron system.

  1. SVM-Based Control System for a Robot Manipulator

    Directory of Open Access Journals (Sweden)

    Foudil Abdessemed

    2012-12-01

    Full Text Available Real systems are usually non-linear, ill-defined, have variable parameters and are subject to external disturbances. Modelling these systems is often an approximation of the physical phenomena involved. However, it is from this approximate system of representation that we propose - in this paper - to build a robust control, in the sense that it must ensure low sensitivity towards parameters, uncertainties, variations and external disturbances. The computed torque method is a well-established robot control technique which takes account of the dynamic coupling between the robot links. However, its main disadvantage lies on the assumption of an exactly known dynamic model which is not realizable in practice. To overcome this issue, we propose the estimation of the dynamics model of the nonlinear system with a machine learning regression method. The output of this regressor is used in conjunction with a PD controller to achieve the tracking trajectory task of a robot manipulator. In cases where some of the parameters of the plant undergo a change in their values, poor performance may result. To cope with this drawback, a fuzzy precompensator is inserted to reinforce the SVM computed torque-based controller and avoid any deterioration. The theory is developed and the simulation results are carried out on a two-degree of freedom robot manipulator to demonstrate the validity of the proposed approach.

  2. SVM-Based Spectral Analysis for Heart Rate from Multi-Channel WPPG Sensor Signals.

    Science.gov (United States)

    Xiong, Jiping; Cai, Lisang; Wang, Fei; He, Xiaowei

    2017-03-03

    Although wrist-type photoplethysmographic (hereafter referred to as WPPG) sensor signals can measure heart rate quite conveniently, the subjects' hand movements can cause strong motion artifacts, and then the motion artifacts will heavily contaminate WPPG signals. Hence, it is challenging for us to accurately estimate heart rate from WPPG signals during intense physical activities. The WWPG method has attracted more attention thanks to the popularity of wrist-worn wearable devices. In this paper, a mixed approach called Mix-SVM is proposed, it can use multi-channel WPPG sensor signals and simultaneous acceleration signals to measurement heart rate. Firstly, we combine the principle component analysis and adaptive filter to remove a part of the motion artifacts. Due to the strong relativity between motion artifacts and acceleration signals, the further denoising problem is regarded as a sparse signals reconstruction problem. Then, we use a spectrum subtraction method to eliminate motion artifacts effectively. Finally, the spectral peak corresponding to heart rate is sought by an SVM-based spectral analysis method. Through the public PPG database in the 2015 IEEE Signal Processing Cup, we acquire the experimental results, i.e., the average absolute error was 1.01 beat per minute, and the Pearson correlation was 0.9972. These results also confirm that the proposed Mix-SVM approach has potential for multi-channel WPPG-based heart rate estimation in the presence of intense physical exercise.

  3. SVM-Based Spectral Analysis for Heart Rate from Multi-Channel WPPG Sensor Signals

    Directory of Open Access Journals (Sweden)

    Jiping Xiong

    2017-03-01

    Full Text Available Although wrist-type photoplethysmographic (hereafter referred to as WPPG sensor signals can measure heart rate quite conveniently, the subjects’ hand movements can cause strong motion artifacts, and then the motion artifacts will heavily contaminate WPPG signals. Hence, it is challenging for us to accurately estimate heart rate from WPPG signals during intense physical activities. The WWPG method has attracted more attention thanks to the popularity of wrist-worn wearable devices. In this paper, a mixed approach called Mix-SVM is proposed, it can use multi-channel WPPG sensor signals and simultaneous acceleration signals to measurement heart rate. Firstly, we combine the principle component analysis and adaptive filter to remove a part of the motion artifacts. Due to the strong relativity between motion artifacts and acceleration signals, the further denoising problem is regarded as a sparse signals reconstruction problem. Then, we use a spectrum subtraction method to eliminate motion artifacts effectively. Finally, the spectral peak corresponding to heart rate is sought by an SVM-based spectral analysis method. Through the public PPG database in the 2015 IEEE Signal Processing Cup, we acquire the experimental results, i.e., the average absolute error was 1.01 beat per minute, and the Pearson correlation was 0.9972. These results also confirm that the proposed Mix-SVM approach has potential for multi-channel WPPG-based heart rate estimation in the presence of intense physical exercise.

  4. PSO-SVM-Based Online Locomotion Mode Identification for Rehabilitation Robotic Exoskeletons.

    Science.gov (United States)

    Long, Yi; Du, Zhi-Jiang; Wang, Wei-Dong; Zhao, Guang-Yu; Xu, Guo-Qiang; He, Long; Mao, Xi-Wang; Dong, Wei

    2016-09-02

    Locomotion mode identification is essential for the control of a robotic rehabilitation exoskeletons. This paper proposes an online support vector machine (SVM) optimized by particle swarm optimization (PSO) to identify different locomotion modes to realize a smooth and automatic locomotion transition. A PSO algorithm is used to obtain the optimal parameters of SVM for a better overall performance. Signals measured by the foot pressure sensors integrated in the insoles of wearable shoes and the MEMS-based attitude and heading reference systems (AHRS) attached on the shoes and shanks of leg segments are fused together as the input information of SVM. Based on the chosen window whose size is 200 ms (with sampling frequency of 40 Hz), a three-layer wavelet packet analysis (WPA) is used for feature extraction, after which, the kernel principal component analysis (kPCA) is utilized to reduce the dimension of the feature set to reduce computation cost of the SVM. Since the signals are from two types of different sensors, the normalization is conducted to scale the input into the interval of [0, 1]. Five-fold cross validation is adapted to train the classifier, which prevents the classifier over-fitting. Based on the SVM model obtained offline in MATLAB, an online SVM algorithm is constructed for locomotion mode identification. Experiments are performed for different locomotion modes and experimental results show the effectiveness of the proposed algorithm with an accuracy of 96.00% ± 2.45%. To improve its accuracy, majority vote algorithm (MVA) is used for post-processing, with which the identification accuracy is better than 98.35% ± 1.65%. The proposed algorithm can be extended and employed in the field of robotic rehabilitation and assistance.

  5. PSO-SVM-Based Online Locomotion Mode Identification for Rehabilitation Robotic Exoskeletons

    Directory of Open Access Journals (Sweden)

    Yi Long

    2016-09-01

    Full Text Available Locomotion mode identification is essential for the control of a robotic rehabilitation exoskeletons. This paper proposes an online support vector machine (SVM optimized by particle swarm optimization (PSO to identify different locomotion modes to realize a smooth and automatic locomotion transition. A PSO algorithm is used to obtain the optimal parameters of SVM for a better overall performance. Signals measured by the foot pressure sensors integrated in the insoles of wearable shoes and the MEMS-based attitude and heading reference systems (AHRS attached on the shoes and shanks of leg segments are fused together as the input information of SVM. Based on the chosen window whose size is 200 ms (with sampling frequency of 40 Hz, a three-layer wavelet packet analysis (WPA is used for feature extraction, after which, the kernel principal component analysis (kPCA is utilized to reduce the dimension of the feature set to reduce computation cost of the SVM. Since the signals are from two types of different sensors, the normalization is conducted to scale the input into the interval of [0, 1]. Five-fold cross validation is adapted to train the classifier, which prevents the classifier over-fitting. Based on the SVM model obtained offline in MATLAB, an online SVM algorithm is constructed for locomotion mode identification. Experiments are performed for different locomotion modes and experimental results show the effectiveness of the proposed algorithm with an accuracy of 96.00% ± 2.45%. To improve its accuracy, majority vote algorithm (MVA is used for post-processing, with which the identification accuracy is better than 98.35% ± 1.65%. The proposed algorithm can be extended and employed in the field of robotic rehabilitation and assistance.

  6. LaSVM-based big data learning system for dynamic prediction of air pollution in Tehran.

    Science.gov (United States)

    Ghaemi, Z; Alimohammadi, A; Farnaghi, M

    2018-04-20

    Due to critical impacts of air pollution, prediction and monitoring of air quality in urban areas are important tasks. However, because of the dynamic nature and high spatio-temporal variability, prediction of the air pollutant concentrations is a complex spatio-temporal problem. Distribution of pollutant concentration is influenced by various factors such as the historical pollution data and weather conditions. Conventional methods such as the support vector machine (SVM) or artificial neural networks (ANN) show some deficiencies when huge amount of streaming data have to be analyzed for urban air pollution prediction. In order to overcome the limitations of the conventional methods and improve the performance of urban air pollution prediction in Tehran, a spatio-temporal system is designed using a LaSVM-based online algorithm. Pollutant concentration and meteorological data along with geographical parameters are continually fed to the developed online forecasting system. Performance of the system is evaluated by comparing the prediction results of the Air Quality Index (AQI) with those of a traditional SVM algorithm. Results show an outstanding increase of speed by the online algorithm while preserving the accuracy of the SVM classifier. Comparison of the hourly predictions for next coming 24 h, with those of the measured pollution data in Tehran pollution monitoring stations shows an overall accuracy of 0.71, root mean square error of 0.54 and coefficient of determination of 0.81. These results are indicators of the practical usefulness of the online algorithm for real-time spatial and temporal prediction of the urban air quality.

  7. A SVM-based quantitative fMRI method for resting-state functional network detection.

    Science.gov (United States)

    Song, Xiaomu; Chen, Nan-kuei

    2014-09-01

    Resting-state functional magnetic resonance imaging (fMRI) aims to measure baseline neuronal connectivity independent of specific functional tasks and to capture changes in the connectivity due to neurological diseases. Most existing network detection methods rely on a fixed threshold to identify functionally connected voxels under the resting state. Due to fMRI non-stationarity, the threshold cannot adapt to variation of data characteristics across sessions and subjects, and generates unreliable mapping results. In this study, a new method is presented for resting-state fMRI data analysis. Specifically, the resting-state network mapping is formulated as an outlier detection process that is implemented using one-class support vector machine (SVM). The results are refined by using a spatial-feature domain prototype selection method and two-class SVM reclassification. The final decision on each voxel is made by comparing its probabilities of functionally connected and unconnected instead of a threshold. Multiple features for resting-state analysis were extracted and examined using an SVM-based feature selection method, and the most representative features were identified. The proposed method was evaluated using synthetic and experimental fMRI data. A comparison study was also performed with independent component analysis (ICA) and correlation analysis. The experimental results show that the proposed method can provide comparable or better network detection performance than ICA and correlation analysis. The method is potentially applicable to various resting-state quantitative fMRI studies. Copyright © 2014 Elsevier Inc. All rights reserved.

  8. SVM-Based System for Prediction of Epileptic Seizures from iEEG Signal

    Science.gov (United States)

    Cherkassky, Vladimir; Lee, Jieun; Veber, Brandon; Patterson, Edward E.; Brinkmann, Benjamin H.; Worrell, Gregory A.

    2017-01-01

    Objective This paper describes a data-analytic modeling approach for prediction of epileptic seizures from intracranial electroencephalogram (iEEG) recording of brain activity. Even though it is widely accepted that statistical characteristics of iEEG signal change prior to seizures, robust seizure prediction remains a challenging problem due to subject-specific nature of data-analytic modeling. Methods Our work emphasizes understanding of clinical considerations important for iEEG-based seizure prediction, and proper translation of these clinical considerations into data-analytic modeling assumptions. Several design choices during pre-processing and post-processing are considered and investigated for their effect on seizure prediction accuracy. Results Our empirical results show that the proposed SVM-based seizure prediction system can achieve robust prediction of preictal and interictal iEEG segments from dogs with epilepsy. The sensitivity is about 90–100%, and the false-positive rate is about 0–0.3 times per day. The results also suggest good prediction is subject-specific (dog or human), in agreement with earlier studies. Conclusion Good prediction performance is possible only if the training data contain sufficiently many seizure episodes, i.e., at least 5–7 seizures. Significance The proposed system uses subject-specific modeling and unbalanced training data. This system also utilizes three different time scales during training and testing stages. PMID:27362758

  9. A MOOC on Approaches to Machine Translation

    Science.gov (United States)

    Costa-jussà, Mart R.; Formiga, Lluís; Torrillas, Oriol; Petit, Jordi; Fonollosa, José A. R.

    2015-01-01

    This paper describes the design, development, and analysis of a MOOC entitled "Approaches to Machine Translation: Rule-based, statistical and hybrid", and provides lessons learned and conclusions to be taken into account in the future. The course was developed within the Canvas platform, used by recognized European universities. It…

  10. Machine Learning Approaches in Cardiovascular Imaging.

    Science.gov (United States)

    Henglin, Mir; Stein, Gillian; Hushcha, Pavel V; Snoek, Jasper; Wiltschko, Alexander B; Cheng, Susan

    2017-10-01

    Cardiovascular imaging technologies continue to increase in their capacity to capture and store large quantities of data. Modern computational methods, developed in the field of machine learning, offer new approaches to leveraging the growing volume of imaging data available for analyses. Machine learning methods can now address data-related problems ranging from simple analytic queries of existing measurement data to the more complex challenges involved in analyzing raw images. To date, machine learning has been used in 2 broad and highly interconnected areas: automation of tasks that might otherwise be performed by a human and generation of clinically important new knowledge. Most cardiovascular imaging studies have focused on task-oriented problems, but more studies involving algorithms aimed at generating new clinical insights are emerging. Continued expansion in the size and dimensionality of cardiovascular imaging databases is driving strong interest in applying powerful deep learning methods, in particular, to analyze these data. Overall, the most effective approaches will require an investment in the resources needed to appropriately prepare such large data sets for analyses. Notwithstanding current technical and logistical challenges, machine learning and especially deep learning methods have much to offer and will substantially impact the future practice and science of cardiovascular imaging. © 2017 American Heart Association, Inc.

  11. In-Vivo Imaging of Cell Migration Using Contrast Enhanced MRI and SVM Based Post-Processing.

    Science.gov (United States)

    Weis, Christian; Hess, Andreas; Budinsky, Lubos; Fabry, Ben

    2015-01-01

    The migration of cells within a living organism can be observed with magnetic resonance imaging (MRI) in combination with iron oxide nanoparticles as an intracellular contrast agent. This method, however, suffers from low sensitivity and specificty. Here, we developed a quantitative non-invasive in-vivo cell localization method using contrast enhanced multiparametric MRI and support vector machines (SVM) based post-processing. Imaging phantoms consisting of agarose with compartments containing different concentrations of cancer cells labeled with iron oxide nanoparticles were used to train and evaluate the SVM for cell localization. From the magnitude and phase data acquired with a series of T2*-weighted gradient-echo scans at different echo-times, we extracted features that are characteristic for the presence of superparamagnetic nanoparticles, in particular hyper- and hypointensities, relaxation rates, short-range phase perturbations, and perturbation dynamics. High detection quality was achieved by SVM analysis of the multiparametric feature-space. The in-vivo applicability was validated in animal studies. The SVM detected the presence of iron oxide nanoparticles in the imaging phantoms with high specificity and sensitivity with a detection limit of 30 labeled cells per mm3, corresponding to 19 μM of iron oxide. As proof-of-concept, we applied the method to follow the migration of labeled cancer cells injected in rats. The combination of iron oxide labeled cells, multiparametric MRI and a SVM based post processing provides high spatial resolution, specificity, and sensitivity, and is therefore suitable for non-invasive in-vivo cell detection and cell migration studies over prolonged time periods.

  12. Distinguishing Asthma Phenotypes Using Machine Learning Approaches.

    Science.gov (United States)

    Howard, Rebecca; Rattray, Magnus; Prosperi, Mattia; Custovic, Adnan

    2015-07-01

    Asthma is not a single disease, but an umbrella term for a number of distinct diseases, each of which are caused by a distinct underlying pathophysiological mechanism. These discrete disease entities are often labelled as 'asthma endotypes'. The discovery of different asthma subtypes has moved from subjective approaches in which putative phenotypes are assigned by experts to data-driven ones which incorporate machine learning. This review focuses on the methodological developments of one such machine learning technique-latent class analysis-and how it has contributed to distinguishing asthma and wheezing subtypes in childhood. It also gives a clinical perspective, presenting the findings of studies from the past 5 years that used this approach. The identification of true asthma endotypes may be a crucial step towards understanding their distinct pathophysiological mechanisms, which could ultimately lead to more precise prevention strategies, identification of novel therapeutic targets and the development of effective personalized therapies.

  13. SVM-based Partial Discharge Pattern Classification for GIS

    Science.gov (United States)

    Ling, Yin; Bai, Demeng; Wang, Menglin; Gong, Xiaojin; Gu, Chao

    2018-01-01

    Partial discharges (PD) occur when there are localized dielectric breakdowns in small regions of gas insulated substations (GIS). It is of high importance to recognize the PD patterns, through which we can diagnose the defects caused by different sources so that predictive maintenance can be conducted to prevent from unplanned power outage. In this paper, we propose an approach to perform partial discharge pattern classification. It first recovers the PRPD matrices from the PRPD2D images; then statistical features are extracted from the recovered PRPD matrix and fed into SVM for classification. Experiments conducted on a dataset containing thousands of images demonstrates the high effectiveness of the method.

  14. FaaPred: a SVM-based prediction method for fungal adhesins and adhesin-like proteins.

    Directory of Open Access Journals (Sweden)

    Jayashree Ramana

    Full Text Available Adhesion constitutes one of the initial stages of infection in microbial diseases and is mediated by adhesins. Hence, identification and comprehensive knowledge of adhesins and adhesin-like proteins is essential to understand adhesin mediated pathogenesis and how to exploit its therapeutic potential. However, the knowledge about fungal adhesins is rudimentary compared to that of bacterial adhesins. In addition to host cell attachment and mating, the fungal adhesins play a significant role in homotypic and xenotypic aggregation, foraging and biofilm formation. Experimental identification of fungal adhesins is labor- as well as time-intensive. In this work, we present a Support Vector Machine (SVM based method for the prediction of fungal adhesins and adhesin-like proteins. The SVM models were trained with different compositional features, namely, amino acid, dipeptide, multiplet fractions, charge and hydrophobic compositions, as well as PSI-BLAST derived PSSM matrices. The best classifiers are based on compositional properties as well as PSSM and yield an overall accuracy of 86%. The prediction method based on best classifiers is freely accessible as a world wide web based server at http://bioinfo.icgeb.res.in/faap. This work will aid rapid and rational identification of fungal adhesins, expedite the pace of experimental characterization of novel fungal adhesins and enhance our knowledge about role of adhesins in fungal infections.

  15. Statistical and machine learning approaches for network analysis

    CERN Document Server

    Dehmer, Matthias

    2012-01-01

    Explore the multidisciplinary nature of complex networks through machine learning techniques Statistical and Machine Learning Approaches for Network Analysis provides an accessible framework for structurally analyzing graphs by bringing together known and novel approaches on graph classes and graph measures for classification. By providing different approaches based on experimental data, the book uniquely sets itself apart from the current literature by exploring the application of machine learning techniques to various types of complex networks. Comprised of chapters written by internation

  16. Robust LS-SVM-based adaptive constrained control for a class of uncertain nonlinear systems with time-varying predefined performance

    Science.gov (United States)

    Luo, Jianjun; Wei, Caisheng; Dai, Honghua; Yuan, Jianping

    2018-03-01

    This paper focuses on robust adaptive control for a class of uncertain nonlinear systems subject to input saturation and external disturbance with guaranteed predefined tracking performance. To reduce the limitations of classical predefined performance control method in the presence of unknown initial tracking errors, a novel predefined performance function with time-varying design parameters is first proposed. Then, aiming at reducing the complexity of nonlinear approximations, only two least-square-support-vector-machine-based (LS-SVM-based) approximators with two design parameters are required through norm form transformation of the original system. Further, a novel LS-SVM-based adaptive constrained control scheme is developed under the time-vary predefined performance using backstepping technique. Wherein, to avoid the tedious analysis and repeated differentiations of virtual control laws in the backstepping technique, a simple and robust finite-time-convergent differentiator is devised to only extract its first-order derivative at each step in the presence of external disturbance. In this sense, the inherent demerit of backstepping technique-;explosion of terms; brought by the recursive virtual controller design is conquered. Moreover, an auxiliary system is designed to compensate the control saturation. Finally, three groups of numerical simulations are employed to validate the effectiveness of the newly developed differentiator and the proposed adaptive constrained control scheme.

  17. Modularity Design Approach for Preventive Machine Maintenance

    Science.gov (United States)

    Ernawati, D.; Pudji, E.; Ngatilah, Y.; Handoyo, R.

    2018-01-01

    In a company, machine maintenance system will be very influential in production process activity. The company should have a scheduled engine maintenance system that does not require high costs when repairing and replacing machine parts. Modularity Design method is able to provide solutions to the engine maintenance scheduling system and can prevent fatal damage to the engine components. It can minimize the cost of repair and replacement of these machine components.The paper provides a solution to machine maintenance problems. The paper is also completed with case study of milling machines. That case studies can give us a real description about impact implementation of modularity design to prevent fatal damage to components and minimize the cost of repair and replacement of components of the machine.

  18. The cognitive approach to conscious machines

    CERN Document Server

    Haikonen, Pentti O

    2003-01-01

    Could a machine have an immaterial mind? The author argues that true conscious machines can be built, but rejects artificial intelligence and classical neural networks in favour of the emulation of the cognitive processes of the brain-the flow of inner speech, inner imagery and emotions. This results in a non-numeric meaning-processing machine with distributed information representation and system reactions. It is argued that this machine would be conscious; it would be aware of its own existence and its mental content and perceive this as immaterial. Novel views on consciousness and the mind-

  19. Intelligent Machine Learning Approaches for Aerospace Applications

    Science.gov (United States)

    Sathyan, Anoop

    Machine Learning is a type of artificial intelligence that provides machines or networks the ability to learn from data without the need to explicitly program them. There are different kinds of machine learning techniques. This thesis discusses the applications of two of these approaches: Genetic Fuzzy Logic and Convolutional Neural Networks (CNN). Fuzzy Logic System (FLS) is a powerful tool that can be used for a wide variety of applications. FLS is a universal approximator that reduces the need for complex mathematics and replaces it with expert knowledge of the system to produce an input-output mapping using If-Then rules. The expert knowledge of a system can help in obtaining the parameters for small-scale FLSs, but for larger networks we will need to use sophisticated approaches that can automatically train the network to meet the design requirements. This is where Genetic Algorithms (GA) and EVE come into the picture. Both GA and EVE can tune the FLS parameters to minimize a cost function that is designed to meet the requirements of the specific problem. EVE is an artificial intelligence developed by Psibernetix that is trained to tune large scale FLSs. The parameters of an FLS can include the membership functions and rulebase of the inherent Fuzzy Inference Systems (FISs). The main issue with using the GFS is that the number of parameters in a FIS increase exponentially with the number of inputs thus making it increasingly harder to tune them. To reduce this issue, the FLSs discussed in this thesis consist of 2-input-1-output FISs in cascade (Chapter 4) or as a layer of parallel FISs (Chapter 7). We have obtained extremely good results using GFS for different applications at a reduced computational cost compared to other algorithms that are commonly used to solve the corresponding problems. In this thesis, GFSs have been designed for controlling an inverted double pendulum, a task allocation problem of clustering targets amongst a set of UAVs, a fire

  20. A machine learning approach for the identification of key markers involved in brain development from single-cell transcriptomic data.

    Science.gov (United States)

    Hu, Yongli; Hase, Takeshi; Li, Hui Peng; Prabhakar, Shyam; Kitano, Hiroaki; Ng, See Kiong; Ghosh, Samik; Wee, Lawrence Jin Kiat

    2016-12-22

    The ability to sequence the transcriptomes of single cells using single-cell RNA-seq sequencing technologies presents a shift in the scientific paradigm where scientists, now, are able to concurrently investigate the complex biology of a heterogeneous population of cells, one at a time. However, till date, there has not been a suitable computational methodology for the analysis of such intricate deluge of data, in particular techniques which will aid the identification of the unique transcriptomic profiles difference between the different cellular subtypes. In this paper, we describe the novel methodology for the analysis of single-cell RNA-seq data, obtained from neocortical cells and neural progenitor cells, using machine learning algorithms (Support Vector machine (SVM) and Random Forest (RF)). Thirty-eight key transcripts were identified, using the SVM-based recursive feature elimination (SVM-RFE) method of feature selection, to best differentiate developing neocortical cells from neural progenitor cells in the SVM and RF classifiers built. Also, these genes possessed a higher discriminative power (enhanced prediction accuracy) as compared commonly used statistical techniques or geneset-based approaches. Further downstream network reconstruction analysis was carried out to unravel hidden general regulatory networks where novel interactions could be further validated in web-lab experimentation and be useful candidates to be targeted for the treatment of neuronal developmental diseases. This novel approach reported for is able to identify transcripts, with reported neuronal involvement, which optimally differentiate neocortical cells and neural progenitor cells. It is believed to be extensible and applicable to other single-cell RNA-seq expression profiles like that of the study of the cancer progression and treatment within a highly heterogeneous tumour.

  1. Introducing instrumental variables in the LS-SVM based identification framework

    NARCIS (Netherlands)

    Laurain, V.; Zheng, W-X.; Toth, R.

    2011-01-01

    Least-Squares Support Vector Machines (LS-SVM) represent a promising approach to identify nonlinear systems via nonparametric estimation of the nonlinearities in a computationally and stochastically attractive way. All the methods dedicated to the solution of this problem rely on the minimization of

  2. 基于信息熵的SVM入侵检测技术%Exploring SVM-based intrusion detection through information entropy theory

    Institute of Scientific and Technical Information of China (English)

    朱文杰; 王强; 翟献军

    2013-01-01

    在传统基于SVM的入侵检测中,核函数构造和特征选择采用先验知识,普遍存在准确度不高、效率低下的问题.通过信息熵理论与SVM算法相结合的方法改进为基于信息熵的SVM入侵检测算法,可以提高入侵检测的准确性,提升入侵检测的效率.基于信息熵的SVM入侵检测算法包括两个方面:一方面,根据样本包含的用户信息熵和方差,将样本特征统一,以特征是否属于置信区间来度量.将得到的样本特征置信向量作为SVM核函数的构造参数,既可保证训练样本集与最优分类面之间的对应关系,又可得到入侵检测需要的最大分类间隔;另一方面,将样本包含的用户信息量作为度量大幅度约简样本特征子集,不但降低了样本计算规模,而且提高了分类器的训练速度.实验表明,该算法在入侵检测系统中的应用优于传统的SVM算法.%In traditional SVM based intrusion detection approaches,both core function construction and feature selection use prior knowdege.Due to this,they are not only inefficient but also inaccurate.It is observed that integrating information entropy theory into SVM-based intrusion detection can enhance both the precision and the speed.Concludely speaking,SVM-based entropy intrusion detection algorithms are made up of two aspects:on one hand,setting sample confidence vector as core function's constructor of SVM algorithm can guarantee the mapping relationship between training sample and optimization classification plane.Also,the intrusion detection's maximum interval can be acquired.On the other hand,simplifying feature subset with samples's entropy as metric standard can not only shrink the computing scale but also improve the speed.Experiments prove that the SVM based entropy intrusion detection algoritm outperfomrs other tradional algorithms.

  3. Machine learning: novel bioinformatics approaches for combating antimicrobial resistance.

    Science.gov (United States)

    Macesic, Nenad; Polubriaginof, Fernanda; Tatonetti, Nicholas P

    2017-12-01

    Antimicrobial resistance (AMR) is a threat to global health and new approaches to combating AMR are needed. Use of machine learning in addressing AMR is in its infancy but has made promising steps. We reviewed the current literature on the use of machine learning for studying bacterial AMR. The advent of large-scale data sets provided by next-generation sequencing and electronic health records make applying machine learning to the study and treatment of AMR possible. To date, it has been used for antimicrobial susceptibility genotype/phenotype prediction, development of AMR clinical decision rules, novel antimicrobial agent discovery and antimicrobial therapy optimization. Application of machine learning to studying AMR is feasible but remains limited. Implementation of machine learning in clinical settings faces barriers to uptake with concerns regarding model interpretability and data quality.Future applications of machine learning to AMR are likely to be laboratory-based, such as antimicrobial susceptibility phenotype prediction.

  4. Machine Learning Approaches for Clinical Psychology and Psychiatry.

    Science.gov (United States)

    Dwyer, Dominic B; Falkai, Peter; Koutsouleris, Nikolaos

    2018-05-07

    Machine learning approaches for clinical psychology and psychiatry explicitly focus on learning statistical functions from multidimensional data sets to make generalizable predictions about individuals. The goal of this review is to provide an accessible understanding of why this approach is important for future practice given its potential to augment decisions associated with the diagnosis, prognosis, and treatment of people suffering from mental illness using clinical and biological data. To this end, the limitations of current statistical paradigms in mental health research are critiqued, and an introduction is provided to critical machine learning methods used in clinical studies. A selective literature review is then presented aiming to reinforce the usefulness of machine learning methods and provide evidence of their potential. In the context of promising initial results, the current limitations of machine learning approaches are addressed, and considerations for future clinical translation are outlined.

  5. A Statistical Parameter Analysis and SVM Based Fault Diagnosis Strategy for Dynamically Tuned Gyroscopes

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    Gyro's fault diagnosis plays a critical role in inertia navigation systems for higher reliability and precision. A new fault diagnosis strategy based on the statistical parameter analysis (SPA) and support vector machine(SVM) classification model was proposed for dynamically tuned gyroscopes (DTG). The SPA, a kind of time domain analysis approach, was introduced to compute a set of statistical parameters of vibration signal as the state features of DTG, with which the SVM model, a novel learning machine based on statistical learning theory (SLT), was applied and constructed to train and identify the working state of DTG. The experimental results verify that the proposed diagnostic strategy can simply and effectively extract the state features of DTG, and it outperforms the radial-basis function (RBF) neural network based diagnostic method and can more reliably and accurately diagnose the working state of DTG.

  6. Machine learning and computer vision approaches for phenotypic profiling.

    Science.gov (United States)

    Grys, Ben T; Lo, Dara S; Sahin, Nil; Kraus, Oren Z; Morris, Quaid; Boone, Charles; Andrews, Brenda J

    2017-01-02

    With recent advances in high-throughput, automated microscopy, there has been an increased demand for effective computational strategies to analyze large-scale, image-based data. To this end, computer vision approaches have been applied to cell segmentation and feature extraction, whereas machine-learning approaches have been developed to aid in phenotypic classification and clustering of data acquired from biological images. Here, we provide an overview of the commonly used computer vision and machine-learning methods for generating and categorizing phenotypic profiles, highlighting the general biological utility of each approach. © 2017 Grys et al.

  7. Remote leak localization approach for fusion machines

    International Nuclear Information System (INIS)

    Durocher, Au.; Bruno, V.; Chantant, M.; Gargiulo, L.; Gherman, T.; Hatchressian, J.-C.; Houry, M.; Le, R.; Mouyon, D.

    2013-01-01

    Highlights: ► Description of leaks issue. ► Selection of leak localization concepts. ► Qualification of leak localization concepts. -- Abstract: Fusion machine operation requires high-vacuum conditions and does not tolerate water or gas leak in the vacuum vessels, even if they are micrometric. Tore Supra, as a fully actively cooled tokamak, has got a large leak management experience; 34 water leaks occurred since the beginning of its operation in 1988. To handle this issue, after preliminary machine protection phases, the current process for leak localization is based on water or helium pressurization network by network. It generally allows the identification of a set of components where the leakage element is located. However, the unique background of CEA-IRFM laboratory points needs of accuracy and promptness out in the leak localization process. Moreover, in-vessel interventions have to be performed trying to minimize time and risks for the persons. They are linked to access conditions, radioactivity, tracer gas high pressure and vessel conditioning. Remote operation will be one of the ways to improve these points on future fusion machines. In this case, leak sensors would have to be light weight devices in order to be integrated on a carrier or to be located outside with a sniffing process set up. A leak localization program is on-going at CEA-IRFM Laboratory with the first goal of identifying and characterizing relevant concepts to localize helium or water leaks on ITER. In the same time, CEA has developed robotic carrier for effective in-vessel intervention in a hostile environment. Three major tests campaigns with the goal to identify leak sensors have been achieved on several CEA test-beds since 2010. Very promising results have been obtained: relevant scenario of leak localization performed, concepts tested in a high volume test-bed called TITAN, and, in several conditions of pressure and temperature (ultrahigh vacuum to atmospheric pressure and 20

  8. Remote leak localization approach for fusion machines

    Energy Technology Data Exchange (ETDEWEB)

    Durocher, Au., E-mail: aurelien.durocher@cea.fr [CEA-IRFM, F-13108 Saint Paul-Lez-Durance (France); Bruno, V.; Chantant, M.; Gargiulo, L. [CEA-IRFM, F-13108 Saint Paul-Lez-Durance (France); Gherman, T. [Floralis UJF Filiale, F-38610 Gières (France); Hatchressian, J.-C.; Houry, M.; Le, R.; Mouyon, D. [CEA-IRFM, F-13108 Saint Paul-Lez-Durance (France)

    2013-10-15

    Highlights: ► Description of leaks issue. ► Selection of leak localization concepts. ► Qualification of leak localization concepts. -- Abstract: Fusion machine operation requires high-vacuum conditions and does not tolerate water or gas leak in the vacuum vessels, even if they are micrometric. Tore Supra, as a fully actively cooled tokamak, has got a large leak management experience; 34 water leaks occurred since the beginning of its operation in 1988. To handle this issue, after preliminary machine protection phases, the current process for leak localization is based on water or helium pressurization network by network. It generally allows the identification of a set of components where the leakage element is located. However, the unique background of CEA-IRFM laboratory points needs of accuracy and promptness out in the leak localization process. Moreover, in-vessel interventions have to be performed trying to minimize time and risks for the persons. They are linked to access conditions, radioactivity, tracer gas high pressure and vessel conditioning. Remote operation will be one of the ways to improve these points on future fusion machines. In this case, leak sensors would have to be light weight devices in order to be integrated on a carrier or to be located outside with a sniffing process set up. A leak localization program is on-going at CEA-IRFM Laboratory with the first goal of identifying and characterizing relevant concepts to localize helium or water leaks on ITER. In the same time, CEA has developed robotic carrier for effective in-vessel intervention in a hostile environment. Three major tests campaigns with the goal to identify leak sensors have been achieved on several CEA test-beds since 2010. Very promising results have been obtained: relevant scenario of leak localization performed, concepts tested in a high volume test-bed called TITAN, and, in several conditions of pressure and temperature (ultrahigh vacuum to atmospheric pressure and 20

  9. DisArticle: a web server for SVM-based discrimination of articles on traditional medicine.

    Science.gov (United States)

    Kim, Sang-Kyun; Nam, SeJin; Kim, SangHyun

    2017-01-28

    Much research has been done in Northeast Asia to show the efficacy of traditional medicine. While MEDLINE contains many biomedical articles including those on traditional medicine, it does not categorize those articles by specific research area. The aim of this study was to provide a method that searches for articles only on traditional medicine in Northeast Asia, including traditional Chinese medicine, from among the articles in MEDLINE. This research established an SVM-based classifier model to identify articles on traditional medicine. The TAK + HM classifier, trained with the features of title, abstract, keywords, herbal data, and MeSH, has a precision of 0.954 and a recall of 0.902. In particular, the feature of herbal data significantly increased the performance of the classifier. By using the TAK + HM classifier, a total of about 108,000 articles were discriminated as articles on traditional medicine from among all articles in MEDLINE. We also built a web server called DisArticle ( http://informatics.kiom.re.kr/disarticle ), in which users can search for the articles and obtain statistical data. Because much evidence-based research on traditional medicine has been published in recent years, it has become necessary to search for articles on traditional medicine exclusively in literature databases. DisArticle can help users to search for and analyze the research trends in traditional medicine.

  10. An Approach for Implementing State Machines with Online Testability

    Directory of Open Access Journals (Sweden)

    P. K. Lala

    2010-01-01

    Full Text Available During the last two decades, significant amount of research has been performed to simplify the detection of transient or soft errors in VLSI-based digital systems. This paper proposes an approach for implementing state machines that uses 2-hot code for state encoding. State machines designed using this approach allow online detection of soft errors in registers and output logic. The 2-hot code considerably reduces the number of required flip-flops and leads to relatively straightforward implementation of next state and output logic. A new way of designing output logic for online fault detection has also been presented.

  11. Indirect Tire Monitoring System - Machine Learning Approach

    Science.gov (United States)

    Svensson, O.; Thelin, S.; Byttner, S.; Fan, Y.

    2017-10-01

    The heavy vehicle industry has today no requirement to provide a tire pressure monitoring system by law. This has created issues surrounding unknown tire pressure and thread depth during active service. There is also no standardization for these kind of systems which means that different manufacturers and third party solutions work after their own principles and it can be hard to know what works for a given vehicle type. The objective is to create an indirect tire monitoring system that can generalize a method that detect both incorrect tire pressure and thread depth for different type of vehicles within a fleet without the need for additional physical sensors or vehicle specific parameters. The existing sensors that are connected communicate through CAN and are interpreted by the Drivec Bridge hardware that exist in the fleet. By using supervised machine learning a classifier was created for each axle where the main focus was the front axle which had the most issues. The classifier will classify the vehicles tires condition and will be implemented in Drivecs cloud service where it will receive its data. The resulting classifier is a random forest implemented in Python. The result from the front axle with a data set consisting of 9767 samples of buses with correct tire condition and 1909 samples of buses with incorrect tire condition it has an accuracy of 90.54% (0.96%). The data sets are created from 34 unique measurements from buses between January and May 2017. This classifier has been exported and is used inside a Node.js module created for Drivecs cloud service which is the result of the whole implementation. The developed solution is called Indirect Tire Monitoring System (ITMS) and is seen as a process. This process will predict bad classes in the cloud which will lead to warnings. The warnings are defined as incidents. They contain only the information needed and the bandwidth of the incidents are also controlled so incidents are created within an

  12. System approach to machine building enterprise innovative activity management

    Directory of Open Access Journals (Sweden)

    І.V. Levytska

    2016-12-01

    Full Text Available The company, which operates in a challenging competitive environment should focus on new products and provide innovative services that enhance their innovation to maintain the company’s market position. The article deals with the peculiarities of such an activity in the company. The authors analyze the various approaches used in the management, and supply the advantages and disadvantages of each. It is determine that the most optimal approach among them is a system approach. The definition of the consepts "a system" and "a systematic approach to innovative activity management" are suggested. The article works out the system of machine building enterprise innovative activity management, the organization of machine building enterprise innovative activity; the planning of machine building enterprise innovative activity; the control in the system of machine building enterprise innovative activity management; the elements of the control subsystem. The properties, typical for the system of innovative management, are supplied. The managers, engaged in enterprise innovative activity management, must perform a number of the suggested tasks, which affect the efficiency of the enterprise as a whole. These exact tasks are performed using the systematic approach, providing the enterprise competitive operation and quick adaptation to changes in the external environment.

  13. A Comparison of Machine Learning Approaches for Corn Yield Estimation

    Science.gov (United States)

    Kim, N.; Lee, Y. W.

    2017-12-01

    Machine learning is an efficient empirical method for classification and prediction, and it is another approach to crop yield estimation. The objective of this study is to estimate corn yield in the Midwestern United States by employing the machine learning approaches such as the support vector machine (SVM), random forest (RF), and deep neural networks (DNN), and to perform the comprehensive comparison for their results. We constructed the database using satellite images from MODIS, the climate data of PRISM climate group, and GLDAS soil moisture data. In addition, to examine the seasonal sensitivities of corn yields, two period groups were set up: May to September (MJJAS) and July and August (JA). In overall, the DNN showed the highest accuracies in term of the correlation coefficient for the two period groups. The differences between our predictions and USDA yield statistics were about 10-11 %.

  14. Tumor recognition in wireless capsule endoscopy images using textural features and SVM-based feature selection.

    Science.gov (United States)

    Li, Baopu; Meng, Max Q-H

    2012-05-01

    Tumor in digestive tract is a common disease and wireless capsule endoscopy (WCE) is a relatively new technology to examine diseases for digestive tract especially for small intestine. This paper addresses the problem of automatic recognition of tumor for WCE images. Candidate color texture feature that integrates uniform local binary pattern and wavelet is proposed to characterize WCE images. The proposed features are invariant to illumination change and describe multiresolution characteristics of WCE images. Two feature selection approaches based on support vector machine, sequential forward floating selection and recursive feature elimination, are further employed to refine the proposed features for improving the detection accuracy. Extensive experiments validate that the proposed computer-aided diagnosis system achieves a promising tumor recognition accuracy of 92.4% in WCE images on our collected data.

  15. Edu-mining: A Machine Learning Approach

    Science.gov (United States)

    Srimani, P. K.; Patil, Malini M.

    2011-12-01

    Mining Educational data is an emerging interdisciplinary research area that mainly deals with the development of methods to explore the data stored in educational institutions. The educational data is referred as Edu-DATA. Queries related to Edu-DATA are of practical interest as SQL approach is insufficient and needs to be focused in a different way. The paper aims at developing a technique called Edu-MINING which converts raw data coming from educational institutions using data mining techniques into useful information. The discovered knowledge will have a great impact on the educational research and practices. Edu-MINING explores Edu-DATA, discovers new knowledge and suggests useful methods to improve the quality of education with regard to teaching-learning process. This is illustrated through a case study.

  16. A Support Vector Machine Approach for Truncated Fingerprint Image Detection from Sweeping Fingerprint Sensors

    Science.gov (United States)

    Chen, Chi-Jim; Pai, Tun-Wen; Cheng, Mox

    2015-01-01

    A sweeping fingerprint sensor converts fingerprints on a row by row basis through image reconstruction techniques. However, a built fingerprint image might appear to be truncated and distorted when the finger was swept across a fingerprint sensor at a non-linear speed. If the truncated fingerprint images were enrolled as reference targets and collected by any automated fingerprint identification system (AFIS), successful prediction rates for fingerprint matching applications would be decreased significantly. In this paper, a novel and effective methodology with low time computational complexity was developed for detecting truncated fingerprints in a real time manner. Several filtering rules were implemented to validate existences of truncated fingerprints. In addition, a machine learning method of supported vector machine (SVM), based on the principle of structural risk minimization, was applied to reject pseudo truncated fingerprints containing similar characteristics of truncated ones. The experimental result has shown that an accuracy rate of 90.7% was achieved by successfully identifying truncated fingerprint images from testing images before AFIS enrollment procedures. The proposed effective and efficient methodology can be extensively applied to all existing fingerprint matching systems as a preliminary quality control prior to construction of fingerprint templates. PMID:25835186

  17. A Support Vector Machine Approach for Truncated Fingerprint Image Detection from Sweeping Fingerprint Sensors

    Directory of Open Access Journals (Sweden)

    Chi-Jim Chen

    2015-03-01

    Full Text Available A sweeping fingerprint sensor converts fingerprints on a row by row basis through image reconstruction techniques. However, a built fingerprint image might appear to be truncated and distorted when the finger was swept across a fingerprint sensor at a non-linear speed. If the truncated fingerprint images were enrolled as reference targets and collected by any automated fingerprint identification system (AFIS, successful prediction rates for fingerprint matching applications would be decreased significantly. In this paper, a novel and effective methodology with low time computational complexity was developed for detecting truncated fingerprints in a real time manner. Several filtering rules were implemented to validate existences of truncated fingerprints. In addition, a machine learning method of supported vector machine (SVM, based on the principle of structural risk minimization, was applied to reject pseudo truncated fingerprints containing similar characteristics of truncated ones. The experimental result has shown that an accuracy rate of 90.7% was achieved by successfully identifying truncated fingerprint images from testing images before AFIS enrollment procedures. The proposed effective and efficient methodology can be extensively applied to all existing fingerprint matching systems as a preliminary quality control prior to construction of fingerprint templates.

  18. [Application of optimized parameters SVM based on photoacoustic spectroscopy method in fault diagnosis of power transformer].

    Science.gov (United States)

    Zhang, Yu-xin; Cheng, Zhi-feng; Xu, Zheng-ping; Bai, Jing

    2015-01-01

    In order to solve the problems such as complex operation, consumption for the carrier gas and long test period in traditional power transformer fault diagnosis approach based on dissolved gas analysis (DGA), this paper proposes a new method which is detecting 5 types of characteristic gas content in transformer oil such as CH4, C2H2, C2H4, C2H6 and H2 based on photoacoustic Spectroscopy and C2H2/C2H4, CH4/H2, C2H4/C2H6 three-ratios data are calculated. The support vector machine model was constructed using cross validation method under five support vector machine functions and four kernel functions, heuristic algorithms were used in parameter optimization for penalty factor c and g, which to establish the best SVM model for the highest fault diagnosis accuracy and the fast computing speed. Particles swarm optimization and genetic algorithm two types of heuristic algorithms were comparative studied in this paper for accuracy and speed in optimization. The simulation result shows that SVM model composed of C-SVC, RBF kernel functions and genetic algorithm obtain 97. 5% accuracy in test sample set and 98. 333 3% accuracy in train sample set, and genetic algorithm was about two times faster than particles swarm optimization in computing speed. The methods described in this paper has many advantages such as simple operation, non-contact measurement, no consumption for the carrier gas, long test period, high stability and sensitivity, the result shows that the methods described in this paper can instead of the traditional transformer fault diagnosis by gas chromatography and meets the actual project needs in transformer fault diagnosis.

  19. A machine learning approach to the accurate prediction of monitor units for a compact proton machine.

    Science.gov (United States)

    Sun, Baozhou; Lam, Dao; Yang, Deshan; Grantham, Kevin; Zhang, Tiezhi; Mutic, Sasa; Zhao, Tianyu

    2018-05-01

    Clinical treatment planning systems for proton therapy currently do not calculate monitor units (MUs) in passive scatter proton therapy due to the complexity of the beam delivery systems. Physical phantom measurements are commonly employed to determine the field-specific output factors (OFs) but are often subject to limited machine time, measurement uncertainties and intensive labor. In this study, a machine learning-based approach was developed to predict output (cGy/MU) and derive MUs, incorporating the dependencies on gantry angle and field size for a single-room proton therapy system. The goal of this study was to develop a secondary check tool for OF measurements and eventually eliminate patient-specific OF measurements. The OFs of 1754 fields previously measured in a water phantom with calibrated ionization chambers and electrometers for patient-specific fields with various range and modulation width combinations for 23 options were included in this study. The training data sets for machine learning models in three different methods (Random Forest, XGBoost and Cubist) included 1431 (~81%) OFs. Ten-fold cross-validation was used to prevent "overfitting" and to validate each model. The remaining 323 (~19%) OFs were used to test the trained models. The difference between the measured and predicted values from machine learning models was analyzed. Model prediction accuracy was also compared with that of the semi-empirical model developed by Kooy (Phys. Med. Biol. 50, 2005). Additionally, gantry angle dependence of OFs was measured for three groups of options categorized on the selection of the second scatters. Field size dependence of OFs was investigated for the measurements with and without patient-specific apertures. All three machine learning methods showed higher accuracy than the semi-empirical model which shows considerably large discrepancy of up to 7.7% for the treatment fields with full range and full modulation width. The Cubist-based solution

  20. A control approach for plasma density in tokamak machines

    Energy Technology Data Exchange (ETDEWEB)

    Boncagni, Luca, E-mail: luca.boncagni@enea.it [EURATOM – ENEA Fusion Association, Frascati Research Center, Division of Fusion Physics, Rome, Frascati (Italy); Pucci, Daniele; Piesco, F.; Zarfati, Emanuele [Dipartimento di Ingegneria Informatica, Automatica e Gestionale ' ' Antonio Ruberti' ' , Sapienza Università di Roma (Italy); Mazzitelli, G. [EURATOM – ENEA Fusion Association, Frascati Research Center, Division of Fusion Physics, Rome, Frascati (Italy); Monaco, S. [Dipartimento di Ingegneria Informatica, Automatica e Gestionale ' ' Antonio Ruberti' ' , Sapienza Università di Roma (Italy)

    2013-10-15

    Highlights: •We show a control approach for line plasma density in tokamak. •We show a control approach for pressure in a tokamak chamber. •We show experimental results using one valve. -- Abstract: In tokamak machines, chamber pre-fill is crucial to attain plasma breakdown, while plasma density control is instrumental for several tasks such as machine protection and achievement of desired plasma performances. This paper sets the principles of a new control strategy for attaining both chamber pre-fill and plasma density regulation. Assuming that the actuation mean is a piezoelectric valve driven by a varying voltage, the proposed control laws ensure convergence to reference values of chamber pressure during pre-fill, and of plasma density during plasma discharge. Experimental results at FTU are presented to discuss weaknesses and strengths of the proposed control strategy. The whole system has been implemented by using the MARTe framework [1].

  1. English to Sanskrit Machine Translation Using Transfer Based approach

    Science.gov (United States)

    Pathak, Ganesh R.; Godse, Sachin P.

    2010-11-01

    Translation is one of the needs of global society for communicating thoughts and ideas of one country with other country. Translation is the process of interpretation of text meaning and subsequent production of equivalent text, also called as communicating same meaning (message) in another language. In this paper we gave detail information on how to convert source language text in to target language text using Transfer Based Approach for machine translation. Here we implemented English to Sanskrit machine translator using transfer based approach. English is global language used for business and communication but large amount of population in India is not using and understand the English. Sanskrit is ancient language of India most of the languages in India are derived from Sanskrit. Sanskrit can be act as an intermediate language for multilingual translation.

  2. Relevance vector machine technique for the inverse scattering problem

    International Nuclear Information System (INIS)

    Wang Fang-Fang; Zhang Ye-Rong

    2012-01-01

    A novel method based on the relevance vector machine (RVM) for the inverse scattering problem is presented in this paper. The nonlinearity and the ill-posedness inherent in this problem are simultaneously considered. The nonlinearity is embodied in the relation between the scattered field and the target property, which can be obtained through the RVM training process. Besides, rather than utilizing regularization, the ill-posed nature of the inversion is naturally accounted for because the RVM can produce a probabilistic output. Simulation results reveal that the proposed RVM-based approach can provide comparative performances in terms of accuracy, convergence, robustness, generalization, and improved performance in terms of sparse property in comparison with the support vector machine (SVM) based approach. (general)

  3. Machine learning techniques in disease forecasting: a case study on rice blast prediction

    Directory of Open Access Journals (Sweden)

    Kapoor Amar S

    2006-11-01

    Full Text Available Abstract Background Diverse modeling approaches viz. neural networks and multiple regression have been followed to date for disease prediction in plant populations. However, due to their inability to predict value of unknown data points and longer training times, there is need for exploiting new prediction softwares for better understanding of plant-pathogen-environment relationships. Further, there is no online tool available which can help the plant researchers or farmers in timely application of control measures. This paper introduces a new prediction approach based on support vector machines for developing weather-based prediction models of plant diseases. Results Six significant weather variables were selected as predictor variables. Two series of models (cross-location and cross-year were developed and validated using a five-fold cross validation procedure. For cross-year models, the conventional multiple regression (REG approach achieved an average correlation coefficient (r of 0.50, which increased to 0.60 and percent mean absolute error (%MAE decreased from 65.42 to 52.24 when back-propagation neural network (BPNN was used. With generalized regression neural network (GRNN, the r increased to 0.70 and %MAE also improved to 46.30, which further increased to r = 0.77 and %MAE = 36.66 when support vector machine (SVM based method was used. Similarly, cross-location validation achieved r = 0.48, 0.56 and 0.66 using REG, BPNN and GRNN respectively, with their corresponding %MAE as 77.54, 66.11 and 58.26. The SVM-based method outperformed all the three approaches by further increasing r to 0.74 with improvement in %MAE to 44.12. Overall, this SVM-based prediction approach will open new vistas in the area of forecasting plant diseases of various crops. Conclusion Our case study demonstrated that SVM is better than existing machine learning techniques and conventional REG approaches in forecasting plant diseases. In this direction, we have also

  4. Automatic epileptic seizure detection in EEGs using MF-DFA, SVM based on cloud computing.

    Science.gov (United States)

    Zhang, Zhongnan; Wen, Tingxi; Huang, Wei; Wang, Meihong; Li, Chunfeng

    2017-01-01

    Epilepsy is a chronic disease with transient brain dysfunction that results from the sudden abnormal discharge of neurons in the brain. Since electroencephalogram (EEG) is a harmless and noninvasive detection method, it plays an important role in the detection of neurological diseases. However, the process of analyzing EEG to detect neurological diseases is often difficult because the brain electrical signals are random, non-stationary and nonlinear. In order to overcome such difficulty, this study aims to develop a new computer-aided scheme for automatic epileptic seizure detection in EEGs based on multi-fractal detrended fluctuation analysis (MF-DFA) and support vector machine (SVM). New scheme first extracts features from EEG by MF-DFA during the first stage. Then, the scheme applies a genetic algorithm (GA) to calculate parameters used in SVM and classify the training data according to the selected features using SVM. Finally, the trained SVM classifier is exploited to detect neurological diseases. The algorithm utilizes MLlib from library of SPARK and runs on cloud platform. Applying to a public dataset for experiment, the study results show that the new feature extraction method and scheme can detect signals with less features and the accuracy of the classification reached up to 99%. MF-DFA is a promising approach to extract features for analyzing EEG, because of its simple algorithm procedure and less parameters. The features obtained by MF-DFA can represent samples as well as traditional wavelet transform and Lyapunov exponents. GA can always find useful parameters for SVM with enough execution time. The results illustrate that the classification model can achieve comparable accuracy, which means that it is effective in epileptic seizure detection.

  5. Machine learning approaches: from theory to application in schizophrenia.

    Science.gov (United States)

    Veronese, Elisa; Castellani, Umberto; Peruzzo, Denis; Bellani, Marcella; Brambilla, Paolo

    2013-01-01

    In recent years, machine learning approaches have been successfully applied for analysis of neuroimaging data, to help in the context of disease diagnosis. We provide, in this paper, an overview of recent support vector machine-based methods developed and applied in psychiatric neuroimaging for the investigation of schizophrenia. In particular, we focus on the algorithms implemented by our group, which have been applied to classify subjects affected by schizophrenia and healthy controls, comparing them in terms of accuracy results with other recently published studies. First we give a description of the basic terminology used in pattern recognition and machine learning. Then we separately summarize and explain each study, highlighting the main features that characterize each method. Finally, as an outcome of the comparison of the results obtained applying the described different techniques, conclusions are drawn in order to understand how much automatic classification approaches can be considered a useful tool in understanding the biological underpinnings of schizophrenia. We then conclude by discussing the main implications achievable by the application of these methods into clinical practice.

  6. Machine Learning Approaches: From Theory to Application in Schizophrenia

    Directory of Open Access Journals (Sweden)

    Elisa Veronese

    2013-01-01

    Full Text Available In recent years, machine learning approaches have been successfully applied for analysis of neuroimaging data, to help in the context of disease diagnosis. We provide, in this paper, an overview of recent support vector machine-based methods developed and applied in psychiatric neuroimaging for the investigation of schizophrenia. In particular, we focus on the algorithms implemented by our group, which have been applied to classify subjects affected by schizophrenia and healthy controls, comparing them in terms of accuracy results with other recently published studies. First we give a description of the basic terminology used in pattern recognition and machine learning. Then we separately summarize and explain each study, highlighting the main features that characterize each method. Finally, as an outcome of the comparison of the results obtained applying the described different techniques, conclusions are drawn in order to understand how much automatic classification approaches can be considered a useful tool in understanding the biological underpinnings of schizophrenia. We then conclude by discussing the main implications achievable by the application of these methods into clinical practice.

  7. Amp: A modular approach to machine learning in atomistic simulations

    Science.gov (United States)

    Khorshidi, Alireza; Peterson, Andrew A.

    2016-10-01

    Electronic structure calculations, such as those employing Kohn-Sham density functional theory or ab initio wavefunction theories, have allowed for atomistic-level understandings of a wide variety of phenomena and properties of matter at small scales. However, the computational cost of electronic structure methods drastically increases with length and time scales, which makes these methods difficult for long time-scale molecular dynamics simulations or large-sized systems. Machine-learning techniques can provide accurate potentials that can match the quality of electronic structure calculations, provided sufficient training data. These potentials can then be used to rapidly simulate large and long time-scale phenomena at similar quality to the parent electronic structure approach. Machine-learning potentials usually take a bias-free mathematical form and can be readily developed for a wide variety of systems. Electronic structure calculations have favorable properties-namely that they are noiseless and targeted training data can be produced on-demand-that make them particularly well-suited for machine learning. This paper discusses our modular approach to atomistic machine learning through the development of the open-source Atomistic Machine-learning Package (Amp), which allows for representations of both the total and atom-centered potential energy surface, in both periodic and non-periodic systems. Potentials developed through the atom-centered approach are simultaneously applicable for systems with various sizes. Interpolation can be enhanced by introducing custom descriptors of the local environment. We demonstrate this in the current work for Gaussian-type, bispectrum, and Zernike-type descriptors. Amp has an intuitive and modular structure with an interface through the python scripting language yet has parallelizable fortran components for demanding tasks; it is designed to integrate closely with the widely used Atomic Simulation Environment (ASE), which

  8. Linear SVM-Based Android Malware Detection for Reliable IoT Services

    Directory of Open Access Journals (Sweden)

    Hyo-Sik Ham

    2014-01-01

    Full Text Available Current many Internet of Things (IoT services are monitored and controlled through smartphone applications. By combining IoT with smartphones, many convenient IoT services have been provided to users. However, there are adverse underlying effects in such services including invasion of privacy and information leakage. In most cases, mobile devices have become cluttered with important personal user information as various services and contents are provided through them. Accordingly, attackers are expanding the scope of their attacks beyond the existing PC and Internet environment into mobile devices. In this paper, we apply a linear support vector machine (SVM to detect Android malware and compare the malware detection performance of SVM with that of other machine learning classifiers. Through experimental validation, we show that the SVM outperforms other machine learning classifiers.

  9. Geminivirus data warehouse: a database enriched with machine learning approaches.

    Science.gov (United States)

    Silva, Jose Cleydson F; Carvalho, Thales F M; Basso, Marcos F; Deguchi, Michihito; Pereira, Welison A; Sobrinho, Roberto R; Vidigal, Pedro M P; Brustolini, Otávio J B; Silva, Fabyano F; Dal-Bianco, Maximiller; Fontes, Renildes L F; Santos, Anésia A; Zerbini, Francisco Murilo; Cerqueira, Fabio R; Fontes, Elizabeth P B

    2017-05-05

    The Geminiviridae family encompasses a group of single-stranded DNA viruses with twinned and quasi-isometric virions, which infect a wide range of dicotyledonous and monocotyledonous plants and are responsible for significant economic losses worldwide. Geminiviruses are divided into nine genera, according to their insect vector, host range, genome organization, and phylogeny reconstruction. Using rolling-circle amplification approaches along with high-throughput sequencing technologies, thousands of full-length geminivirus and satellite genome sequences were amplified and have become available in public databases. As a consequence, many important challenges have emerged, namely, how to classify, store, and analyze massive datasets as well as how to extract information or new knowledge. Data mining approaches, mainly supported by machine learning (ML) techniques, are a natural means for high-throughput data analysis in the context of genomics, transcriptomics, proteomics, and metabolomics. Here, we describe the development of a data warehouse enriched with ML approaches, designated geminivirus.org. We implemented search modules, bioinformatics tools, and ML methods to retrieve high precision information, demarcate species, and create classifiers for genera and open reading frames (ORFs) of geminivirus genomes. The use of data mining techniques such as ETL (Extract, Transform, Load) to feed our database, as well as algorithms based on machine learning for knowledge extraction, allowed us to obtain a database with quality data and suitable tools for bioinformatics analysis. The Geminivirus Data Warehouse (geminivirus.org) offers a simple and user-friendly environment for information retrieval and knowledge discovery related to geminiviruses.

  10. Space Weather in the Machine Learning Era: A Multidisciplinary Approach

    Science.gov (United States)

    Camporeale, E.; Wing, S.; Johnson, J.; Jackman, C. M.; McGranaghan, R.

    2018-01-01

    The workshop entitled Space Weather: A Multidisciplinary Approach took place at the Lorentz Center, University of Leiden, Netherlands, on 25-29 September 2017. The aim of this workshop was to bring together members of the Space Weather, Mathematics, Statistics, and Computer Science communities to address the use of advanced techniques such as Machine Learning, Information Theory, and Deep Learning, to better understand the Sun-Earth system and to improve space weather forecasting. Although individual efforts have been made toward this goal, the community consensus is that establishing interdisciplinary collaborations is the most promising strategy for fully utilizing the potential of these advanced techniques in solving Space Weather-related problems.

  11. A machine learning approach for the classification of metallic glasses

    Science.gov (United States)

    Gossett, Eric; Perim, Eric; Toher, Cormac; Lee, Dongwoo; Zhang, Haitao; Liu, Jingbei; Zhao, Shaofan; Schroers, Jan; Vlassak, Joost; Curtarolo, Stefano

    Metallic glasses possess an extensive set of mechanical properties along with plastic-like processability. As a result, they are a promising material in many industrial applications. However, the successful synthesis of novel metallic glasses requires trial and error, costing both time and resources. Therefore, we propose a high-throughput approach that combines an extensive set of experimental measurements with advanced machine learning techniques. This allows us to classify metallic glasses and predict the full phase diagrams for a given alloy system. Thus this method provides a means to identify potential glass-formers and opens up the possibility for accelerating and reducing the cost of the design of new metallic glasses.

  12. SVM-based feature extraction and classification of aflatoxin contaminated corn using fluorescence hyperspectral data

    Science.gov (United States)

    Support Vector Machine (SVM) was used in the Genetic Algorithms (GA) process to select and classify a subset of hyperspectral image bands. The method was applied to fluorescence hyperspectral data for the detection of aflatoxin contamination in Aspergillus flavus infected single corn kernels. In the...

  13. A Simple and General Approach to Determination of Self and Mutual Inductances for AC machines

    DEFF Research Database (Denmark)

    Lu, Kaiyuan; Rasmussen, Peter Omand; Ritchie, Ewen

    2011-01-01

    Modelling of AC electrical machines plays an important role in electrical engineering education related to electrical machine design and control. One of the fundamental requirements in AC machine modelling is to derive the self and mutual inductances, which could be position dependant. Theories...... developed so far for inductance determination are often associated with complicated machine magnetic field analysis, which exhibits a difficulty for most students. This paper describes a simple and general approach to the determination of self and mutual inductances of different types of AC machines. A new...... determination are given for a 3-phase, salient-pole synchronous machine, and an induction machine....

  14. Predicting DPP-IV inhibitors with machine learning approaches

    Science.gov (United States)

    Cai, Jie; Li, Chanjuan; Liu, Zhihong; Du, Jiewen; Ye, Jiming; Gu, Qiong; Xu, Jun

    2017-04-01

    Dipeptidyl peptidase IV (DPP-IV) is a promising Type 2 diabetes mellitus (T2DM) drug target. DPP-IV inhibitors prolong the action of glucagon-like peptide-1 (GLP-1) and gastric inhibitory peptide (GIP), improve glucose homeostasis without weight gain, edema, and hypoglycemia. However, the marketed DPP-IV inhibitors have adverse effects such as nasopharyngitis, headache, nausea, hypersensitivity, skin reactions and pancreatitis. Therefore, it is still expected for novel DPP-IV inhibitors with minimal adverse effects. The scaffolds of existing DPP-IV inhibitors are structurally diversified. This makes it difficult to build virtual screening models based upon the known DPP-IV inhibitor libraries using conventional QSAR approaches. In this paper, we report a new strategy to predict DPP-IV inhibitors with machine learning approaches involving naïve Bayesian (NB) and recursive partitioning (RP) methods. We built 247 machine learning models based on 1307 known DPP-IV inhibitors with optimized molecular properties and topological fingerprints as descriptors. The overall predictive accuracies of the optimized models were greater than 80%. An external test set, composed of 65 recently reported compounds, was employed to validate the optimized models. The results demonstrated that both NB and RP models have a good predictive ability based on different combinations of descriptors. Twenty "good" and twenty "bad" structural fragments for DPP-IV inhibitors can also be derived from these models for inspiring the new DPP-IV inhibitor scaffold design.

  15. Use of machine learning approaches for novel drug discovery.

    Science.gov (United States)

    Lima, Angélica Nakagawa; Philot, Eric Allison; Trossini, Gustavo Henrique Goulart; Scott, Luis Paulo Barbour; Maltarollo, Vinícius Gonçalves; Honorio, Kathia Maria

    2016-01-01

    The use of computational tools in the early stages of drug development has increased in recent decades. Machine learning (ML) approaches have been of special interest, since they can be applied in several steps of the drug discovery methodology, such as prediction of target structure, prediction of biological activity of new ligands through model construction, discovery or optimization of hits, and construction of models that predict the pharmacokinetic and toxicological (ADMET) profile of compounds. This article presents an overview on some applications of ML techniques in drug design. These techniques can be employed in ligand-based drug design (LBDD) and structure-based drug design (SBDD) studies, such as similarity searches, construction of classification and/or prediction models of biological activity, prediction of secondary structures and binding sites docking and virtual screening. Successful cases have been reported in the literature, demonstrating the efficiency of ML techniques combined with traditional approaches to study medicinal chemistry problems. Some ML techniques used in drug design are: support vector machine, random forest, decision trees and artificial neural networks. Currently, an important application of ML techniques is related to the calculation of scoring functions used in docking and virtual screening assays from a consensus, combining traditional and ML techniques in order to improve the prediction of binding sites and docking solutions.

  16. Prediction of skin sensitization potency using machine learning approaches.

    Science.gov (United States)

    Zang, Qingda; Paris, Michael; Lehmann, David M; Bell, Shannon; Kleinstreuer, Nicole; Allen, David; Matheson, Joanna; Jacobs, Abigail; Casey, Warren; Strickland, Judy

    2017-07-01

    The replacement of animal use in testing for regulatory classification of skin sensitizers is a priority for US federal agencies that use data from such testing. Machine learning models that classify substances as sensitizers or non-sensitizers without using animal data have been developed and evaluated. Because some regulatory agencies require that sensitizers be further classified into potency categories, we developed statistical models to predict skin sensitization potency for murine local lymph node assay (LLNA) and human outcomes. Input variables for our models included six physicochemical properties and data from three non-animal test methods: direct peptide reactivity assay; human cell line activation test; and KeratinoSens™ assay. Models were built to predict three potency categories using four machine learning approaches and were validated using external test sets and leave-one-out cross-validation. A one-tiered strategy modeled all three categories of response together while a two-tiered strategy modeled sensitizer/non-sensitizer responses and then classified the sensitizers as strong or weak sensitizers. The two-tiered model using the support vector machine with all assay and physicochemical data inputs provided the best performance, yielding accuracy of 88% for prediction of LLNA outcomes (120 substances) and 81% for prediction of human test outcomes (87 substances). The best one-tiered model predicted LLNA outcomes with 78% accuracy and human outcomes with 75% accuracy. By comparison, the LLNA predicts human potency categories with 69% accuracy (60 of 87 substances correctly categorized). These results suggest that computational models using non-animal methods may provide valuable information for assessing skin sensitization potency. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.

  17. Predictive modelling of eutrophication in the Pozón de la Dolores lake (Northern Spain) by using an evolutionary support vector machines approach.

    Science.gov (United States)

    García-Nieto, P J; García-Gonzalo, E; Alonso Fernández, J R; Díaz Muñiz, C

    2018-03-01

    Eutrophication is a water enrichment in nutrients (mainly phosphorus) that generally leads to symptomatic changes and deterioration of water quality and all its uses in general, when the production of algae and other aquatic vegetations are increased. In this sense, eutrophication has caused a variety of impacts, such as high levels of Chlorophyll a (Chl-a). Consequently, anticipate its presence is a matter of importance to prevent future risks. The aim of this study was to obtain a predictive model able to perform an early detection of the eutrophication in water bodies such as lakes. This study presents a novel hybrid algorithm, based on support vector machines (SVM) approach in combination with the particle swarm optimization (PSO) technique, for predicting the eutrophication from biological and physical-chemical input parameters determined experimentally through sampling and subsequent analysis in a certificate laboratory. This optimization technique involves hyperparameter setting in the SVM training procedure, which significantly influences the regression accuracy. The results of the present study are twofold. In the first place, the significance of each biological and physical-chemical variables on the eutrophication is presented through the model. Secondly, a model for forecasting eutrophication is obtained with success. Indeed, regression with optimal hyperparameters was performed and coefficients of determination equal to 0.90 for the Total phosphorus estimation and 0.92 for the Chlorophyll concentration were obtained when this hybrid PSO-SVM-based model was applied to the experimental dataset, respectively. The agreement between experimental data and the model confirmed the good performance of the latter.

  18. A support vector machine approach for detection of microcalcifications.

    Science.gov (United States)

    El-Naqa, Issam; Yang, Yongyi; Wernick, Miles N; Galatsanos, Nikolas P; Nishikawa, Robert M

    2002-12-01

    In this paper, we investigate an approach based on support vector machines (SVMs) for detection of microcalcification (MC) clusters in digital mammograms, and propose a successive enhancement learning scheme for improved performance. SVM is a machine-learning method, based on the principle of structural risk minimization, which performs well when applied to data outside the training set. We formulate MC detection as a supervised-learning problem and apply SVM to develop the detection algorithm. We use the SVM to detect at each location in the image whether an MC is present or not. We tested the proposed method using a database of 76 clinical mammograms containing 1120 MCs. We use free-response receiver operating characteristic curves to evaluate detection performance, and compare the proposed algorithm with several existing methods. In our experiments, the proposed SVM framework outperformed all the other methods tested. In particular, a sensitivity as high as 94% was achieved by the SVM method at an error rate of one false-positive cluster per image. The ability of SVM to out perform several well-known methods developed for the widely studied problem of MC detection suggests that SVM is a promising technique for object detection in a medical imaging application.

  19. A Cooperative Approach to Virtual Machine Based Fault Injection

    Energy Technology Data Exchange (ETDEWEB)

    Naughton III, Thomas J [ORNL; Engelmann, Christian [ORNL; Vallee, Geoffroy R [ORNL; Aderholdt, William Ferrol [ORNL; Scott, Stephen L [Tennessee Technological University (TTU)

    2017-01-01

    Resilience investigations often employ fault injection (FI) tools to study the effects of simulated errors on a target system. It is important to keep the target system under test (SUT) isolated from the controlling environment in order to maintain control of the experiement. Virtual machines (VMs) have been used to aid these investigations due to the strong isolation properties of system-level virtualization. A key challenge in fault injection tools is to gain proper insight and context about the SUT. In VM-based FI tools, this challenge of target con- text is increased due to the separation between host and guest (VM). We discuss an approach to VM-based FI that leverages virtual machine introspection (VMI) methods to gain insight into the target s context running within the VM. The key to this environment is the ability to provide basic information to the FI system that can be used to create a map of the target environment. We describe a proof- of-concept implementation and a demonstration of its use to introduce simulated soft errors into an iterative solver benchmark running in user-space of a guest VM.

  20. Advanced methods in NDE using machine learning approaches

    Science.gov (United States)

    Wunderlich, Christian; Tschöpe, Constanze; Duckhorn, Frank

    2018-04-01

    Machine learning (ML) methods and algorithms have been applied recently with great success in quality control and predictive maintenance. Its goal to build new and/or leverage existing algorithms to learn from training data and give accurate predictions, or to find patterns, particularly with new and unseen similar data, fits perfectly to Non-Destructive Evaluation. The advantages of ML in NDE are obvious in such tasks as pattern recognition in acoustic signals or automated processing of images from X-ray, Ultrasonics or optical methods. Fraunhofer IKTS is using machine learning algorithms in acoustic signal analysis. The approach had been applied to such a variety of tasks in quality assessment. The principal approach is based on acoustic signal processing with a primary and secondary analysis step followed by a cognitive system to create model data. Already in the second analysis steps unsupervised learning algorithms as principal component analysis are used to simplify data structures. In the cognitive part of the software further unsupervised and supervised learning algorithms will be trained. Later the sensor signals from unknown samples can be recognized and classified automatically by the algorithms trained before. Recently the IKTS team was able to transfer the software for signal processing and pattern recognition to a small printed circuit board (PCB). Still, algorithms will be trained on an ordinary PC; however, trained algorithms run on the Digital Signal Processor and the FPGA chip. The identical approach will be used for pattern recognition in image analysis of OCT pictures. Some key requirements have to be fulfilled, however. A sufficiently large set of training data, a high signal-to-noise ratio, and an optimized and exact fixation of components are required. The automated testing can be done subsequently by the machine. By integrating the test data of many components along the value chain further optimization including lifetime and durability

  1. SVM-Based Dynamic Reconfiguration CPS for Manufacturing System in Industry 4.0

    Directory of Open Access Journals (Sweden)

    Hyun-Jun Shin

    2018-01-01

    Full Text Available CPS is potential application in various fields, such as medical, healthcare, energy, transportation, and defense, as well as Industry 4.0 in Germany. Although studies on the equipment aging and prediction of problem have been done by combining CPS with Industry 4.0, such studies were based on small numbers and majority of the papers focused primarily on CPS methodology. Therefore, it is necessary to study active self-protection to enable self-management functions, such as self-healing by applying CPS in shop-floor. In this paper, we have proposed modeling of shop-floor and a dynamic reconfigurable CPS scheme that can predict the occurrence of anomalies and self-protection in the model. For this purpose, SVM was used as a machine learning technology and it was possible to restrain overloading in manufacturing process. In addition, we design CPS framework based on machine learning for Industry 4.0, simulate it, and perform. Simulation results show the simulation model autonomously detects the abnormal situation and it is dynamically reconfigured through self-healing.

  2. An SVM-Based Solution for Fault Detection in Wind Turbines

    Directory of Open Access Journals (Sweden)

    Pedro Santos

    2015-03-01

    Full Text Available Research into fault diagnosis in machines with a wide range of variable loads and speeds, such as wind turbines, is of great industrial interest. Analysis of the power signals emitted by wind turbines for the diagnosis of mechanical faults in their mechanical transmission chain is insufficient. A successful diagnosis requires the inclusion of accelerometers to evaluate vibrations. This work presents a multi-sensory system for fault diagnosis in wind turbines, combined with a data-mining solution for the classification of the operational state of the turbine. The selected sensors are accelerometers, in which vibration signals are processed using angular resampling techniques and electrical, torque and speed measurements. Support vector machines (SVMs are selected for the classification task, including two traditional and two promising new kernels. This multi-sensory system has been validated on a test-bed that simulates the real conditions of wind turbines with two fault typologies: misalignment and imbalance. Comparison of SVM performance with the results of artificial neural networks (ANNs shows that linear kernel SVM outperforms other kernels and ANNs in terms of accuracy, training and tuning times. The suitability and superior performance of linear SVM is also experimentally analyzed, to conclude that this data acquisition technique generates linearly separable datasets.

  3. Methodology for selection of attributes and operating conditions for SVM-Based fault locator's

    Directory of Open Access Journals (Sweden)

    Debbie Johan Arredondo Arteaga

    2017-01-01

    Full Text Available Context: Energy distribution companies must employ strategies to meet their timely and high quality service, and fault-locating techniques represent and agile alternative for restoring the electric service in the power distribution due to the size of distribution services (generally large and the usual interruptions in the service. However, these techniques are not robust enough and present some limitations in both computational cost and the mathematical description of the models they use. Method: This paper performs an analysis based on a Support Vector Machine for the evaluation of the proper conditions to adjust and validate a fault locator for distribution systems; so that it is possible to determine the minimum number of operating conditions that allow to achieve a good performance with a low computational effort. Results: We tested the proposed methodology in a prototypical distribution circuit, located in a rural area of Colombia. This circuit has a voltage of 34.5 KV and is subdivided in 20 zones. Additionally, the characteristics of the circuit allowed us to obtain a database of 630.000 records of single-phase faults and different operating conditions. As a result, we could determine that the locator showed a performance above 98% with 200 suitable selected operating conditions. Conclusions: It is possible to improve the performance of fault locators based on Support Vector Machine. Specifically, these improvements are achieved by properly selecting optimal operating conditions and attributes, since they directly affect the performance in terms of efficiency and the computational cost.

  4. An SVM-based solution for fault detection in wind turbines.

    Science.gov (United States)

    Santos, Pedro; Villa, Luisa F; Reñones, Aníbal; Bustillo, Andres; Maudes, Jesús

    2015-03-09

    Research into fault diagnosis in machines with a wide range of variable loads and speeds, such as wind turbines, is of great industrial interest. Analysis of the power signals emitted by wind turbines for the diagnosis of mechanical faults in their mechanical transmission chain is insufficient. A successful diagnosis requires the inclusion of accelerometers to evaluate vibrations. This work presents a multi-sensory system for fault diagnosis in wind turbines, combined with a data-mining solution for the classification of the operational state of the turbine. The selected sensors are accelerometers, in which vibration signals are processed using angular resampling techniques and electrical, torque and speed measurements. Support vector machines (SVMs) are selected for the classification task, including two traditional and two promising new kernels. This multi-sensory system has been validated on a test-bed that simulates the real conditions of wind turbines with two fault typologies: misalignment and imbalance. Comparison of SVM performance with the results of artificial neural networks (ANNs) shows that linear kernel SVM outperforms other kernels and ANNs in terms of accuracy, training and tuning times. The suitability and superior performance of linear SVM is also experimentally analyzed, to conclude that this data acquisition technique generates linearly separable datasets.

  5. Extracting meaning from audio signals - a machine learning approach

    DEFF Research Database (Denmark)

    Larsen, Jan

    2007-01-01

    * Machine learning framework for sound search * Genre classification * Music and audio separation * Wind noise suppression......* Machine learning framework for sound search * Genre classification * Music and audio separation * Wind noise suppression...

  6. COMPARISION OF FUZZY PERT APPROACHES IN MACHINE PRODUCTION PROCESS

    Directory of Open Access Journals (Sweden)

    İRFAN ERTUĞRUL

    2013-06-01

    Full Text Available In traditional PERT (Program Evaluation and Review Technique activity durations are represented as crisp numbers and assumed that they are drawn from beta distribution. However, in real life the duration of the activities are usually difficult to estimate precisely.  In order to overcome this difficulty, there are studies in the literature that combine fuzzy set theory and PERT method. In this study, two fuzzy PERT approaches proposed by different authors are employed to find the degrees of criticality of each path in the network and comparison of these two methods is also given. Furthermore, by the help of these methods the criticality of the activities in the marble machine production process of a company that manufactures machinery is determined and results are compared.

  7. Machine Learning Approaches to Increasing Value of Spaceflight Omics Databases

    Science.gov (United States)

    Gentry, Diana

    2017-01-01

    The number of spaceflight bioscience mission opportunities is too small to allow all relevant biological and environmental parameters to be experimentally identified. Simulated spaceflight experiments in ground-based facilities (GBFs), such as clinostats, are each suitable only for particular investigations -- a rotating-wall vessel may be 'simulated microgravity' for cell differentiation (hours), but not DNA repair (seconds) -- and introduce confounding stimuli, such as motor vibration and fluid shear effects. This uncertainty over which biological mechanisms respond to a given form of simulated space radiation or gravity, as well as its side effects, limits our ability to baseline spaceflight data and validate mission science. Machine learning techniques autonomously identify relevant and interdependent factors in a data set given the set of desired metrics to be evaluated: to automatically identify related studies, compare data from related studies, or determine linkages between types of data in the same study. System-of-systems (SoS) machine learning models have the ability to deal with both sparse and heterogeneous data, such as that provided by the small and diverse number of space biosciences flight missions; however, they require appropriate user-defined metrics for any given data set. Although machine learning in bioinformatics is rapidly expanding, the need to combine spaceflight/GBF mission parameters with omics data is unique. This work characterizes the basic requirements for implementing the SoS approach through the System Map (SM) technique, a composite of a dynamic Bayesian network and Gaussian mixture model, in real-world repositories such as the GeneLab Data System and Life Sciences Data Archive. The three primary steps are metadata management for experimental description using open-source ontologies, defining similarity and consistency metrics, and generating testing and validation data sets. Such approaches to spaceflight and GBF omics data may

  8. VLSI Design of SVM-Based Seizure Detection System With On-Chip Learning Capability.

    Science.gov (United States)

    Feng, Lichen; Li, Zunchao; Wang, Yuanfa

    2018-02-01

    Portable automatic seizure detection system is very convenient for epilepsy patients to carry. In order to make the system on-chip trainable with high efficiency and attain high detection accuracy, this paper presents a very large scale integration (VLSI) design based on the nonlinear support vector machine (SVM). The proposed design mainly consists of a feature extraction (FE) module and an SVM module. The FE module performs the three-level Daubechies discrete wavelet transform to fit the physiological bands of the electroencephalogram (EEG) signal and extracts the time-frequency domain features reflecting the nonstationary signal properties. The SVM module integrates the modified sequential minimal optimization algorithm with the table-driven-based Gaussian kernel to enable efficient on-chip learning. The presented design is verified on an Altera Cyclone II field-programmable gate array and tested using the two publicly available EEG datasets. Experiment results show that the designed VLSI system improves the detection accuracy and training efficiency.

  9. A Multi-Classification Method of Improved SVM-based Information Fusion for Traffic Parameters Forecasting

    Directory of Open Access Journals (Sweden)

    Hongzhuan Zhao

    2016-04-01

    Full Text Available With the enrichment of perception methods, modern transportation system has many physical objects whose states are influenced by many information factors so that it is a typical Cyber-Physical System (CPS. Thus, the traffic information is generally multi-sourced, heterogeneous and hierarchical. Existing research results show that the multisourced traffic information through accurate classification in the process of information fusion can achieve better parameters forecasting performance. For solving the problem of traffic information accurate classification, via analysing the characteristics of the multi-sourced traffic information and using redefined binary tree to overcome the shortcomings of the original Support Vector Machine (SVM classification in information fusion, a multi-classification method using improved SVM in information fusion for traffic parameters forecasting is proposed. The experiment was conducted to examine the performance of the proposed scheme, and the results reveal that the method can get more accurate and practical outcomes.

  10. A SVM-based method for sentiment analysis in Persian language

    Science.gov (United States)

    Hajmohammadi, Mohammad Sadegh; Ibrahim, Roliana

    2013-03-01

    Persian language is the official language of Iran, Tajikistan and Afghanistan. Local online users often represent their opinions and experiences on the web with written Persian. Although the information in those reviews is valuable to potential consumers and sellers, the huge amount of web reviews make it difficult to give an unbiased evaluation to a product. In this paper, standard machine learning techniques SVM and naive Bayes are incorporated into the domain of online Persian Movie reviews to automatically classify user reviews as positive or negative and performance of these two classifiers is compared with each other in this language. The effects of feature presentations on classification performance are discussed. We find that accuracy is influenced by interaction between the classification models and the feature options. The SVM classifier achieves as well as or better accuracy than naive Bayes in Persian movie. Unigrams are proved better features than bigrams and trigrams in capturing Persian sentiment orientation.

  11. DESIGN ANALYSIS OF ELECTRICAL MACHINES THROUGH INTEGRATED NUMERICAL APPROACH

    Directory of Open Access Journals (Sweden)

    ARAVIND C.V.

    2016-02-01

    Full Text Available An integrated design platform for the newer type of machines is presented in this work. The machine parameters are evaluated out using developed modelling tool. With the machine parameters, the machine is modelled using computer aided tool. The designed machine is brought to simulation tool to perform electromagnetic and electromechanical analysis. In the simulation, conditions setting are performed to setup the materials, meshes, rotational speed and the excitation circuit. Electromagnetic analysis is carried out to predict the behavior of the machine based on the movement of flux in the machines. Besides, electromechanical analysis is carried out to analyse the speed-torque characteristic, the current-torque characteristic and the phase angle-torque characteristic. After all the results are analysed, the designed machine is used to generate S block function that is compatible with MATLAB/SIMULINK tool for the dynamic operational characteristics. This allows the integration of existing drive system into the new machines designed in the modelling tool. An example of the machine design is presented to validate the usage of such a tool.

  12. SVM-PB-Pred: SVM based protein block prediction method using sequence profiles and secondary structures.

    Science.gov (United States)

    Suresh, V; Parthasarathy, S

    2014-01-01

    We developed a support vector machine based web server called SVM-PB-Pred, to predict the Protein Block for any given amino acid sequence. The input features of SVM-PB-Pred include i) sequence profiles (PSSM) and ii) actual secondary structures (SS) from DSSP method or predicted secondary structures from NPS@ and GOR4 methods. There were three combined input features PSSM+SS(DSSP), PSSM+SS(NPS@) and PSSM+SS(GOR4) used to test and train the SVM models. Similarly, four datasets RS90, DB433, LI1264 and SP1577 were used to develop the SVM models. These four SVM models developed were tested using three different benchmarking tests namely; (i) self consistency, (ii) seven fold cross validation test and (iii) independent case test. The maximum possible prediction accuracy of ~70% was observed in self consistency test for the SVM models of both LI1264 and SP1577 datasets, where PSSM+SS(DSSP) input features was used to test. The prediction accuracies were reduced to ~53% for PSSM+SS(NPS@) and ~43% for PSSM+SS(GOR4) in independent case test, for the SVM models of above two same datasets. Using our method, it is possible to predict the protein block letters for any query protein sequence with ~53% accuracy, when the SP1577 dataset and predicted secondary structure from NPS@ server were used. The SVM-PB-Pred server can be freely accessed through http://bioinfo.bdu.ac.in/~svmpbpred.

  13. Stationary Wavelet Transform and AdaBoost with SVM Based Pathological Brain Detection in MRI Scanning.

    Science.gov (United States)

    Nayak, Deepak Ranjan; Dash, Ratnakar; Majhi, Banshidhar

    2017-01-01

    This paper presents an automatic classification system for segregating pathological brain from normal brains in magnetic resonance imaging scanning. The proposed system employs contrast limited adaptive histogram equalization scheme to enhance the diseased region in brain MR images. Two-dimensional stationary wavelet transform is harnessed to extract features from the preprocessed images. The feature vector is constructed using the energy and entropy values, computed from the level- 2 SWT coefficients. Then, the relevant and uncorrelated features are selected using symmetric uncertainty ranking filter. Subsequently, the selected features are given input to the proposed AdaBoost with support vector machine classifier, where SVM is used as the base classifier of AdaBoost algorithm. To validate the proposed system, three standard MR image datasets, Dataset-66, Dataset-160, and Dataset- 255 have been utilized. The 5 runs of k-fold stratified cross validation results indicate the suggested scheme offers better performance than other existing schemes in terms of accuracy and number of features. The proposed system earns ideal classification over Dataset-66 and Dataset-160; whereas, for Dataset- 255, an accuracy of 99.45% is achieved. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  14. PRISMA database machine: A distributed, main-memory approach

    NARCIS (Netherlands)

    Schmidt, J.W.; Apers, Peter M.G.; Ceri, S.; Kersten, Martin L.; Oerlemans, Hans C.M.; Missikoff, M.

    1988-01-01

    The PRISMA project is a large-scale research effort in the design and implementation of a highly parallel machine for data and knowledge processing. The PRISMA database machine is a distributed, main-memory database management system implemented in an object-oriented language that runs on top of a

  15. Exploration of Machine Learning Approaches to Predict Pavement Performance

    Science.gov (United States)

    2018-03-23

    Machine learning (ML) techniques were used to model and predict pavement condition index (PCI) for various pavement types using a variety of input variables. The primary objective of this research was to develop and assess PCI predictive models for t...

  16. Machine learning for adaptive many-core machines a practical approach

    CERN Document Server

    Lopes, Noel

    2015-01-01

    The overwhelming data produced everyday and the increasing performance and cost requirements of applications?are transversal to a wide range of activities in society, from science to industry. In particular, the magnitude and complexity of the tasks that Machine Learning (ML) algorithms have to solve are driving the need to devise adaptive many-core machines that scale well with the volume of data, or in other words, can handle Big Data.This book gives a concise view on how to extend the applicability of well-known ML algorithms in Graphics Processing Unit (GPU) with data scalability in mind.

  17. Dual Numbers Approach in Multiaxis Machines Error Modeling

    Directory of Open Access Journals (Sweden)

    Jaroslav Hrdina

    2014-01-01

    Full Text Available Multiaxis machines error modeling is set in the context of modern differential geometry and linear algebra. We apply special classes of matrices over dual numbers and propose a generalization of such concept by means of general Weil algebras. We show that the classification of the geometric errors follows directly from the algebraic properties of the matrices over dual numbers and thus the calculus over the dual numbers is the proper tool for the methodology of multiaxis machines error modeling.

  18. Crack identification for rotating machines based on a nonlinear approach

    Science.gov (United States)

    Cavalini, A. A., Jr.; Sanches, L.; Bachschmid, N.; Steffen, V., Jr.

    2016-10-01

    In a previous contribution, a crack identification methodology based on a nonlinear approach was proposed. The technique uses external applied diagnostic forces at certain frequencies attaining combinational resonances, together with a pseudo-random optimization code, known as Differential Evolution, in order to characterize the signatures of the crack in the spectral responses of the flexible rotor. The conditions under which combinational resonances appear were determined by using the method of multiple scales. In real conditions, the breathing phenomenon arises from the stress and strain distribution on the cross-sectional area of the crack. This mechanism behavior follows the static and dynamic loads acting on the rotor. Therefore, the breathing crack can be simulated according to the Mayes' model, in which the crack transition from fully opened to fully closed is described by a cosine function. However, many contributions try to represent the crack behavior by machining a small notch on the shaft instead of the fatigue process. In this paper, the open and breathing crack models are compared regarding their dynamic behavior and the efficiency of the proposed identification technique. The additional flexibility introduced by the crack is calculated by using the linear fracture mechanics theory (LFM). The open crack model is based on LFM and the breathing crack model corresponds to the Mayes' model, which combines LFM with a given breathing mechanism. For illustration purposes, a rotor composed by a horizontal flexible shaft, two rigid discs, and two self-aligning ball bearings is used to compose a finite element model of the system. Then, numerical simulation is performed to determine the dynamic behavior of the rotor. Finally, the results of the inverse problem conveyed show that the methodology is a reliable tool that is able to estimate satisfactorily the location and depth of the crack.

  19. Detecting false positive sequence homology: a machine learning approach.

    Science.gov (United States)

    Fujimoto, M Stanley; Suvorov, Anton; Jensen, Nicholas O; Clement, Mark J; Bybee, Seth M

    2016-02-24

    Accurate detection of homologous relationships of biological sequences (DNA or amino acid) amongst organisms is an important and often difficult task that is essential to various evolutionary studies, ranging from building phylogenies to predicting functional gene annotations. There are many existing heuristic tools, most commonly based on bidirectional BLAST searches that are used to identify homologous genes and combine them into two fundamentally distinct classes: orthologs and paralogs. Due to only using heuristic filtering based on significance score cutoffs and having no cluster post-processing tools available, these methods can often produce multiple clusters constituting unrelated (non-homologous) sequences. Therefore sequencing data extracted from incomplete genome/transcriptome assemblies originated from low coverage sequencing or produced by de novo processes without a reference genome are susceptible to high false positive rates of homology detection. In this paper we develop biologically informative features that can be extracted from multiple sequence alignments of putative homologous genes (orthologs and paralogs) and further utilized in context of guided experimentation to verify false positive outcomes. We demonstrate that our machine learning method trained on both known homology clusters obtained from OrthoDB and randomly generated sequence alignments (non-homologs), successfully determines apparent false positives inferred by heuristic algorithms especially among proteomes recovered from low-coverage RNA-seq data. Almost ~42 % and ~25 % of predicted putative homologies by InParanoid and HaMStR respectively were classified as false positives on experimental data set. Our process increases the quality of output from other clustering algorithms by providing a novel post-processing method that is both fast and efficient at removing low quality clusters of putative homologous genes recovered by heuristic-based approaches.

  20. A machine learning approach to understand business processes

    NARCIS (Netherlands)

    Maruster, L.

    2003-01-01

    Business processes (industries, administration, hospitals, etc.) become nowadays more and more complex and it is difficult to have a complete understanding of them. The goal of the thesis is to show that machine learning techniques can be used successfully for understanding a process on the basis of

  1. New approach for virtual machines consolidation in heterogeneous computing systems

    Czech Academy of Sciences Publication Activity Database

    Fesl, Jan; Cehák, J.; Doležalová, Marie; Janeček, J.

    2016-01-01

    Roč. 9, č. 12 (2016), s. 321-332 ISSN 1738-9968 Institutional support: RVO:60077344 Keywords : consolidation * virtual machine * distributed Subject RIV: JD - Computer Applications, Robotics http://www.sersc.org/journals/IJHIT/vol9_no12_2016/29.pdf

  2. Asynchronous machine rotor speed estimation using a tabulated numerical approach

    Science.gov (United States)

    Nguyen, Huu Phuc; De Miras, Jérôme; Charara, Ali; Eltabach, Mario; Bonnet, Stéphane

    2017-12-01

    This paper proposes a new method to estimate the rotor speed of the asynchronous machine by looking at the estimation problem as a nonlinear optimal control problem. The behavior of the nonlinear plant model is approximated off-line as a prediction map using a numerical one-step time discretization obtained from simulations. At each time-step, the speed of the induction machine is selected satisfying the dynamic fitting problem between the plant output and the predicted output, leading the system to adopt its dynamical behavior. Thanks to the limitation of the prediction horizon to a single time-step, the execution time of the algorithm can be completely bounded. It can thus easily be implemented and embedded into a real-time system to observe the speed of the real induction motor. Simulation results show the performance and robustness of the proposed estimator.

  3. Time-series prediction and applications a machine intelligence approach

    CERN Document Server

    Konar, Amit

    2017-01-01

    This book presents machine learning and type-2 fuzzy sets for the prediction of time-series with a particular focus on business forecasting applications. It also proposes new uncertainty management techniques in an economic time-series using type-2 fuzzy sets for prediction of the time-series at a given time point from its preceding value in fluctuating business environments. It employs machine learning to determine repetitively occurring similar structural patterns in the time-series and uses stochastic automaton to predict the most probabilistic structure at a given partition of the time-series. Such predictions help in determining probabilistic moves in a stock index time-series Primarily written for graduate students and researchers in computer science, the book is equally useful for researchers/professionals in business intelligence and stock index prediction. A background of undergraduate level mathematics is presumed, although not mandatory, for most of the sections. Exercises with tips are provided at...

  4. Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting

    OpenAIRE

    Shi, Xingjian; Chen, Zhourong; Wang, Hao; Yeung, Dit-Yan; Wong, Wai-kin; Woo, Wang-chun

    2015-01-01

    The goal of precipitation nowcasting is to predict the future rainfall intensity in a local region over a relatively short period of time. Very few previous studies have examined this crucial and challenging weather forecasting problem from the machine learning perspective. In this paper, we formulate precipitation nowcasting as a spatiotemporal sequence forecasting problem in which both the input and the prediction target are spatiotemporal sequences. By extending the fully connected LSTM (F...

  5. Machine learning approach for single molecule localisation microscopy.

    Science.gov (United States)

    Colabrese, Silvia; Castello, Marco; Vicidomini, Giuseppe; Del Bue, Alessio

    2018-04-01

    Single molecule localisation (SML) microscopy is a fundamental tool for biological discoveries; it provides sub-diffraction spatial resolution images by detecting and localizing "all" the fluorescent molecules labeling the structure of interest. For this reason, the effective resolution of SML microscopy strictly depends on the algorithm used to detect and localize the single molecules from the series of microscopy frames. To adapt to the different imaging conditions that can occur in a SML experiment, all current localisation algorithms request, from the microscopy users, the choice of different parameters. This choice is not always easy and their wrong selection can lead to poor performance. Here we overcome this weakness with the use of machine learning. We propose a parameter-free pipeline for SML learning based on support vector machine (SVM). This strategy requires a short supervised training that consists in selecting by the user few fluorescent molecules (∼ 10-20) from the frames under analysis. The algorithm has been extensively tested on both synthetic and real acquisitions. Results are qualitatively and quantitatively consistent with the state of the art in SML microscopy and demonstrate that the introduction of machine learning can lead to a new class of algorithms competitive and conceived from the user point of view.

  6. Support Vector Machines for decision support in electricity markets׳ strategic bidding

    DEFF Research Database (Denmark)

    Pinto, Tiago; Sousa, Tiago M.; Praça, Isabel

    2015-01-01

    . The ALBidS system allows MASCEM market negotiating players to take the best possible advantages from the market context. This paper presents the application of a Support Vector Machines (SVM) based approach to provide decision support to electricity market players. This strategy is tested and validated...... by being included in ALBidS and then compared with the application of an Artificial Neural Network (ANN), originating promising results: an effective electricity market price forecast in a fast execution time. The proposed approach is tested and validated using real electricity markets data from MIBEL......׳ research group has developed a multi-agent system: Multi-Agent System for Competitive Electricity Markets (MASCEM), which simulates the electricity markets environment. MASCEM is integrated with Adaptive Learning Strategic Bidding System (ALBidS) that works as a decision support system for market players...

  7. Finding New Perovskite Halides via Machine learning

    Directory of Open Access Journals (Sweden)

    Ghanshyam ePilania

    2016-04-01

    Full Text Available Advanced materials with improved properties have the potential to fuel future technological advancements. However, identification and discovery of these optimal materials for a specific application is a non-trivial task, because of the vastness of the chemical search space with enormous compositional and configurational degrees of freedom. Materials informatics provides an efficient approach towards rational design of new materials, via learning from known data to make decisions on new and previously unexplored compounds in an accelerated manner. Here, we demonstrate the power and utility of such statistical learning (or machine learning via building a support vector machine (SVM based classifier that uses elemental features (or descriptors to predict the formability of a given ABX3 halide composition (where A and B represent monovalent and divalent cations, respectively, and X is F, Cl, Br or I anion in the perovskite crystal structure. The classification model is built by learning from a dataset of 181 experimentally known ABX3 compounds. After exploring a wide range of features, we identify ionic radii, tolerance factor and octahedral factor to be the most important factors for the classification, suggesting that steric and geometric packing effects govern the stability of these halides. The trained and validated models then predict, with a high degree of confidence, several novel ABX3 compositions with perovskite crystal structure.

  8. Finding New Perovskite Halides via Machine learning

    Science.gov (United States)

    Pilania, Ghanshyam; Balachandran, Prasanna V.; Kim, Chiho; Lookman, Turab

    2016-04-01

    Advanced materials with improved properties have the potential to fuel future technological advancements. However, identification and discovery of these optimal materials for a specific application is a non-trivial task, because of the vastness of the chemical search space with enormous compositional and configurational degrees of freedom. Materials informatics provides an efficient approach towards rational design of new materials, via learning from known data to make decisions on new and previously unexplored compounds in an accelerated manner. Here, we demonstrate the power and utility of such statistical learning (or machine learning) via building a support vector machine (SVM) based classifier that uses elemental features (or descriptors) to predict the formability of a given ABX3 halide composition (where A and B represent monovalent and divalent cations, respectively, and X is F, Cl, Br or I anion) in the perovskite crystal structure. The classification model is built by learning from a dataset of 181 experimentally known ABX3 compounds. After exploring a wide range of features, we identify ionic radii, tolerance factor and octahedral factor to be the most important factors for the classification, suggesting that steric and geometric packing effects govern the stability of these halides. The trained and validated models then predict, with a high degree of confidence, several novel ABX3 compositions with perovskite crystal structure.

  9. A Machine Learning Approach to Test Data Generation

    DEFF Research Database (Denmark)

    Christiansen, Henning; Dahmcke, Christina Mackeprang

    2007-01-01

    been tested, and a more thorough statistical foundation is required. We propose to use logic-statistical modelling methods for machine-learning for analyzing existing and manually marked up data, integrated with the generation of new, artificial data. More specifically, we suggest to use the PRISM...... system developed by Sato and Kameya. Based on logic programming extended with random variables and parameter learning, PRISM appears as a powerful modelling environment, which subsumes HMMs and a wide range of other methods, all embedded in a declarative language. We illustrate these principles here...

  10. Structural transformation of the machine-building enterprises: management approaches

    OpenAIRE

    Obydiennova, T.

    2014-01-01

    The article considers the «classical» approach to the issues of enterprise management: process, system, situational, concrete, functional, resource, factorial approach. Outlines the definitions of each of the approaches to the management of the enterprise, as well as advantages and disadvantages. The author suggests the schematic image of the aspects of the system approach and its application; a scheme of contents resource and functional approaches. For research on the structural transformati...

  11. Comparative Study on KNN and SVM Based Weather Classification Models for Day Ahead Short Term Solar PV Power Forecasting

    Directory of Open Access Journals (Sweden)

    Fei Wang

    2017-12-01

    Full Text Available Accurate solar photovoltaic (PV power forecasting is an essential tool for mitigating the negative effects caused by the uncertainty of PV output power in systems with high penetration levels of solar PV generation. Weather classification based modeling is an effective way to increase the accuracy of day-ahead short-term (DAST solar PV power forecasting because PV output power is strongly dependent on the specific weather conditions in a given time period. However, the accuracy of daily weather classification relies on both the applied classifiers and the training data. This paper aims to reveal how these two factors impact the classification performance and to delineate the relation between classification accuracy and sample dataset scale. Two commonly used classification methods, K-nearest neighbors (KNN and support vector machines (SVM are applied to classify the daily local weather types for DAST solar PV power forecasting using the operation data from a grid-connected PV plant in Hohhot, Inner Mongolia, China. We assessed the performance of SVM and KNN approaches, and then investigated the influences of sample scale, the number of categories, and the data distribution in different categories on the daily weather classification results. The simulation results illustrate that SVM performs well with small sample scale, while KNN is more sensitive to the length of the training dataset and can achieve higher accuracy than SVM with sufficient samples.

  12. Personalized Physical Activity Coaching: A Machine Learning Approach

    Directory of Open Access Journals (Sweden)

    Talko B. Dijkhuis

    2018-02-01

    Full Text Available Living a sedentary lifestyle is one of the major causes of numerous health problems. To encourage employees to lead a less sedentary life, the Hanze University started a health promotion program. One of the interventions in the program was the use of an activity tracker to record participants' daily step count. The daily step count served as input for a fortnightly coaching session. In this paper, we investigate the possibility of automating part of the coaching procedure on physical activity by providing personalized feedback throughout the day on a participant's progress in achieving a personal step goal. The gathered step count data was used to train eight different machine learning algorithms to make hourly estimations of the probability of achieving a personalized, daily steps threshold. In 80% of the individual cases, the Random Forest algorithm was the best performing algorithm (mean accuracy = 0.93, range = 0.88–0.99, and mean F1-score = 0.90, range = 0.87–0.94. To demonstrate the practical usefulness of these models, we developed a proof-of-concept Web application that provides personalized feedback about whether a participant is expected to reach his or her daily threshold. We argue that the use of machine learning could become an invaluable asset in the process of automated personalized coaching. The individualized algorithms allow for predicting physical activity during the day and provides the possibility to intervene in time.

  13. Effective and efficient optics inspection approach using machine learning algorithms

    International Nuclear Information System (INIS)

    Abdulla, G.; Kegelmeyer, L.; Liao, Z.; Carr, W.

    2010-01-01

    The Final Optics Damage Inspection (FODI) system automatically acquires and utilizes the Optics Inspection (OI) system to analyze images of the final optics at the National Ignition Facility (NIF). During each inspection cycle up to 1000 images acquired by FODI are examined by OI to identify and track damage sites on the optics. The process of tracking growing damage sites on the surface of an optic can be made more effective by identifying and removing signals associated with debris or reflections. The manual process to filter these false sites is daunting and time consuming. In this paper we discuss the use of machine learning tools and data mining techniques to help with this task. We describe the process to prepare a data set that can be used for training and identifying hardware reflections in the image data. In order to collect training data, the images are first automatically acquired and analyzed with existing software and then relevant features such as spatial, physical and luminosity measures are extracted for each site. A subset of these sites is 'truthed' or manually assigned a class to create training data. A supervised classification algorithm is used to test if the features can predict the class membership of new sites. A suite of self-configuring machine learning tools called 'Avatar Tools' is applied to classify all sites. To verify, we used 10-fold cross correlation and found the accuracy was above 99%. This substantially reduces the number of false alarms that would otherwise be sent for more extensive investigation.

  14. Process Approach for Modeling of Machine and Tractor Fleet Structure

    Science.gov (United States)

    Dokin, B. D.; Aletdinova, A. A.; Kravchenko, M. S.; Tsybina, Y. S.

    2018-05-01

    The existing software complexes on modelling of the machine and tractor fleet structure are mostly aimed at solving the task of optimization. However, the creators, choosing only one optimization criterion and incorporating it in their software, provide grounds on why it is the best without giving a decision maker the opportunity to choose it for their enterprise. To analyze “bottlenecks” of machine and tractor fleet modelling, the authors of this article created a process model, in which they included adjustment to the plan of using machinery based on searching through alternative technologies. As a result, the following recommendations for software complex development have been worked out: the introduction of a database of alternative technologies; the possibility for a user to change the timing of the operations even beyond the allowable limits and in that case the calculation of the incurred loss; the possibility to rule out the solution of an optimization task, and if there is a necessity in it - the possibility to choose an optimization criterion; introducing graphical display of an annual complex of works, which could be enough for the development and adjustment of a business strategy.

  15. Area Determination of Diabetic Foot Ulcer Images Using a Cascaded Two-Stage SVM-Based Classification.

    Science.gov (United States)

    Wang, Lei; Pedersen, Peder C; Agu, Emmanuel; Strong, Diane M; Tulu, Bengisu

    2017-09-01

    The standard chronic wound assessment method based on visual examination is potentially inaccurate and also represents a significant clinical workload. Hence, computer-based systems providing quantitative wound assessment may be valuable for accurately monitoring wound healing status, with the wound area the best suited for automated analysis. Here, we present a novel approach, using support vector machines (SVM) to determine the wound boundaries on foot ulcer images captured with an image capture box, which provides controlled lighting and range. After superpixel segmentation, a cascaded two-stage classifier operates as follows: in the first stage, a set of k binary SVM classifiers are trained and applied to different subsets of the entire training images dataset, and incorrectly classified instances are collected. In the second stage, another binary SVM classifier is trained on the incorrectly classified set. We extracted various color and texture descriptors from superpixels that are used as input for each stage in the classifier training. Specifically, color and bag-of-word representations of local dense scale invariant feature transformation features are descriptors for ruling out irrelevant regions, and color and wavelet-based features are descriptors for distinguishing healthy tissue from wound regions. Finally, the detected wound boundary is refined by applying the conditional random field method. We have implemented the wound classification on a Nexus 5 smartphone platform, except for training which was done offline. Results are compared with other classifiers and show that our approach provides high global performance rates (average sensitivity = 73.3%, specificity = 94.6%) and is sufficiently efficient for a smartphone-based image analysis.

  16. A General and Intuitive Approach to Understand and Compare the Torque Production Capability of AC Machines

    DEFF Research Database (Denmark)

    Wang, Dong; Lu, Kaiyuan; Rasmussen, Peter Omand

    2014-01-01

    Electromagnetic torque analysis is one of the key issues in the analysis of electric machines. It plays an important role in machine design and control. The common method described in most of the textbooks is to calculate the torque in the machine variables and then transform them to the dq......-frame, through complicated mathematical manipulations. This is a more mathematical approach rather than explaining the physics behind torque production, which even brings a lot of difficulties to specialist. This paper introduces a general and intuitive approach to obtain the dq-frame torque equation of various...... AC machines. In this method, torque equation can be obtained based on the intuitive physical understanding of the mechanism behind torque production. It is then approved to be applicable for general case, including rotor saliency and various types of magnetomotive force sources. As an application...

  17. Application of Machine Learning Approaches for Protein-protein Interactions Prediction.

    Science.gov (United States)

    Zhang, Mengying; Su, Qiang; Lu, Yi; Zhao, Manman; Niu, Bing

    2017-01-01

    Proteomics endeavors to study the structures, functions and interactions of proteins. Information of the protein-protein interactions (PPIs) helps to improve our knowledge of the functions and the 3D structures of proteins. Thus determining the PPIs is essential for the study of the proteomics. In this review, in order to study the application of machine learning in predicting PPI, some machine learning approaches such as support vector machine (SVM), artificial neural networks (ANNs) and random forest (RF) were selected, and the examples of its applications in PPIs were listed. SVM and RF are two commonly used methods. Nowadays, more researchers predict PPIs by combining more than two methods. This review presents the application of machine learning approaches in predicting PPI. Many examples of success in identification and prediction in the area of PPI prediction have been discussed, and the PPIs research is still in progress. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  18. Rule based systems for big data a machine learning approach

    CERN Document Server

    Liu, Han; Cocea, Mihaela

    2016-01-01

    The ideas introduced in this book explore the relationships among rule based systems, machine learning and big data. Rule based systems are seen as a special type of expert systems, which can be built by using expert knowledge or learning from real data. The book focuses on the development and evaluation of rule based systems in terms of accuracy, efficiency and interpretability. In particular, a unified framework for building rule based systems, which consists of the operations of rule generation, rule simplification and rule representation, is presented. Each of these operations is detailed using specific methods or techniques. In addition, this book also presents some ensemble learning frameworks for building ensemble rule based systems.

  19. Indonesian name matching using machine learning supervised approach

    Science.gov (United States)

    Alifikri, Mohamad; Arif Bijaksana, Moch.

    2018-03-01

    Most existing name matching methods are developed for English language and so they cover the characteristics of this language. Up to this moment, there is no specific one has been designed and implemented for Indonesian names. The purpose of this thesis is to develop Indonesian name matching dataset as a contribution to academic research and to propose suitable feature set by utilizing combination of context of name strings and its permute-winkler score. Machine learning classification algorithms is taken as the method for performing name matching. Based on the experiments, by using tuned Random Forest algorithm and proposed features, there is an improvement of matching performance by approximately 1.7% and it is able to reduce until 70% misclassification result of the state of the arts methods. This improving performance makes the matching system more effective and reduces the risk of misclassified matches.

  20. A machine learning approach to quantifying noise in medical images

    Science.gov (United States)

    Chowdhury, Aritra; Sevinsky, Christopher J.; Yener, Bülent; Aggour, Kareem S.; Gustafson, Steven M.

    2016-03-01

    As advances in medical imaging technology are resulting in significant growth of biomedical image data, new techniques are needed to automate the process of identifying images of low quality. Automation is needed because it is very time consuming for a domain expert such as a medical practitioner or a biologist to manually separate good images from bad ones. While there are plenty of de-noising algorithms in the literature, their focus is on designing filters which are necessary but not sufficient for determining how useful an image is to a domain expert. Thus a computational tool is needed to assign a score to each image based on its perceived quality. In this paper, we introduce a machine learning-based score and call it the Quality of Image (QoI) score. The QoI score is computed by combining the confidence values of two popular classification techniques—support vector machines (SVMs) and Naïve Bayes classifiers. We test our technique on clinical image data obtained from cancerous tissue samples. We used 747 tissue samples that are stained by four different markers (abbreviated as CK15, pck26, E_cad and Vimentin) leading to a total of 2,988 images. The results show that images can be classified as good (high QoI), bad (low QoI) or ugly (intermediate QoI) based on their QoI scores. Our automated labeling is in agreement with the domain experts with a bi-modal classification accuracy of 94%, on average. Furthermore, ugly images can be recovered and forwarded for further post-processing.

  1. A rule-based approach to model checking of UML state machines

    Science.gov (United States)

    Grobelna, Iwona; Grobelny, Michał; Stefanowicz, Łukasz

    2016-12-01

    In the paper a new approach to formal verification of control process specification expressed by means of UML state machines in version 2.x is proposed. In contrast to other approaches from the literature, we use the abstract and universal rule-based logical model suitable both for model checking (using the nuXmv model checker), but also for logical synthesis in form of rapid prototyping. Hence, a prototype implementation in hardware description language VHDL can be obtained that fully reflects the primary, already formally verified specification in form of UML state machines. Presented approach allows to increase the assurance that implemented system meets the user-defined requirements.

  2. How can machine-learning methods assist in virtual screening for hyperuricemia? A healthcare machine-learning approach.

    Science.gov (United States)

    Ichikawa, Daisuke; Saito, Toki; Ujita, Waka; Oyama, Hiroshi

    2016-12-01

    Our purpose was to develop a new machine-learning approach (a virtual health check-up) toward identification of those at high risk of hyperuricemia. Applying the system to general health check-ups is expected to reduce medical costs compared with administering an additional test. Data were collected during annual health check-ups performed in Japan between 2011 and 2013 (inclusive). We prepared training and test datasets from the health check-up data to build prediction models; these were composed of 43,524 and 17,789 persons, respectively. Gradient-boosting decision tree (GBDT), random forest (RF), and logistic regression (LR) approaches were trained using the training dataset and were then used to predict hyperuricemia in the test dataset. Undersampling was applied to build the prediction models to deal with the imbalanced class dataset. The results showed that the RF and GBDT approaches afforded the best performances in terms of sensitivity and specificity, respectively. The area under the curve (AUC) values of the models, which reflected the total discriminative ability of the classification, were 0.796 [95% confidence interval (CI): 0.766-0.825] for the GBDT, 0.784 [95% CI: 0.752-0.815] for the RF, and 0.785 [95% CI: 0.752-0.819] for the LR approaches. No significant differences were observed between pairs of each approach. Small changes occurred in the AUCs after applying undersampling to build the models. We developed a virtual health check-up that predicted the development of hyperuricemia using machine-learning methods. The GBDT, RF, and LR methods had similar predictive capability. Undersampling did not remarkably improve predictive power. Copyright © 2016 Elsevier Inc. All rights reserved.

  3. Prediction of Skin Sensitization Potency Using Machine Learning Approaches

    Science.gov (United States)

    Replacing animal tests currently used for regulatory hazard classification of skin sensitizers is one of ICCVAM’s top priorities. Accordingly, U.S. federal agency scientists are developing and evaluating computational approaches to classify substances as sensitizers or nons...

  4. Predicting Refractive Surgery Outcome: Machine Learning Approach With Big Data.

    Science.gov (United States)

    Achiron, Asaf; Gur, Zvi; Aviv, Uri; Hilely, Assaf; Mimouni, Michael; Karmona, Lily; Rokach, Lior; Kaiserman, Igor

    2017-09-01

    To develop a decision forest for prediction of laser refractive surgery outcome. Data from consecutive cases of patients who underwent LASIK or photorefractive surgeries during a 12-year period in a single center were assembled into a single dataset. Training of machine-learning classifiers and testing were performed with a statistical classifier algorithm. The decision forest was created by feature vectors extracted from 17,592 cases and 38 clinical parameters for each patient. A 10-fold cross-validation procedure was applied to estimate the predictive value of the decision forest when applied to new patients. Analysis included patients younger than 40 years who were not treated for monovision. Efficacy of 0.7 or greater and 0.8 or greater was achieved in 16,198 (92.0%) and 14,945 (84.9%) eyes, respectively. Efficacy of less than 0.4 and less than 0.5 was achieved in 322 (1.8%) and 506 (2.9%) eyes, respectively. Patients in the low efficacy group (differences compared with the high efficacy group (≥ 0.8), yet were clinically similar (mean differences between groups of 0.7 years, of 0.43 mm in pupil size, of 0.11 D in cylinder, of 0.22 logMAR in preoperative CDVA, of 0.11 mm in optical zone size, of 1.03 D in actual sphere treatment, and of 0.64 D in actual cylinder treatment). The preoperative subjective CDVA had the highest gain (most important to the model). Correlations analysis revealed significantly decreased efficacy with increased age (r = -0.67, P big data from refractive surgeries may be of interest. [J Refract Surg. 2017;33(9):592-597.]. Copyright 2017, SLACK Incorporated.

  5. A machine learning approach for predicting the relationship between energy resources and economic development

    Science.gov (United States)

    Cogoljević, Dušan; Alizamir, Meysam; Piljan, Ivan; Piljan, Tatjana; Prljić, Katarina; Zimonjić, Stefan

    2018-04-01

    The linkage between energy resources and economic development is a topic of great interest. Research in this area is also motivated by contemporary concerns about global climate change, carbon emissions fluctuating crude oil prices, and the security of energy supply. The purpose of this research is to develop and apply the machine learning approach to predict gross domestic product (GDP) based on the mix of energy resources. Our results indicate that GDP predictive accuracy can be improved slightly by applying a machine learning approach.

  6. Machine Learning Approaches for Predicting Human Skin Sensitization Hazard

    Science.gov (United States)

    One of ICCVAM’s top priorities is the development and evaluation of non-animal approaches to identify potential skin sensitizers. The complexity of biological events necessary for a substance to elicit a skin sensitization reaction suggests that no single in chemico, in vit...

  7. MODELS OF LIVE MIGRATION WITH ITERATIVE APPROACH AND MOVE OF VIRTUAL MACHINES

    Directory of Open Access Journals (Sweden)

    S. M. Aleksankov

    2015-11-01

    Full Text Available Subject of Research. The processes of live migration without shared storage with pre-copy approach and move migration are researched. Migration of virtual machines is an important opportunity of virtualization technology. It enables applications to move transparently with their runtime environments between physical machines. Live migration becomes noticeable technology for efficient load balancing and optimizing the deployment of virtual machines to physical hosts in data centres. Before the advent of live migration, only network migration (the so-called, «Move», has been used, that entails stopping the virtual machine execution while copying to another physical server, and, consequently, unavailability of the service. Method. Algorithms of live migration without shared storage with pre-copy approach and move migration of virtual machines are reviewed from the perspective of research of migration time and unavailability of services at migrating of virtual machines. Main Results. Analytical models are proposed predicting migration time of virtual machines and unavailability of services at migrating with such technologies as live migration with pre-copy approach without shared storage and move migration. In the latest works on the time assessment of unavailability of services and migration time using live migration without shared storage experimental results are described, that are applicable to draw general conclusions about the changes of time for unavailability of services and migration time, but not to predict their values. Practical Significance. The proposed models can be used for predicting the migration time and time of unavailability of services, for example, at implementation of preventive and emergency works on the physical nodes in data centres.

  8. Opinion Mining in Latvian Text Using Semantic Polarity Analysis and Machine Learning Approach

    Directory of Open Access Journals (Sweden)

    Gatis Špats

    2016-07-01

    Full Text Available In this paper we demonstrate approaches for opinion mining in Latvian text. Authors have applied, combined and extended results of several previous studies and public resources to perform opinion mining in Latvian text using two approaches, namely, semantic polarity analysis and machine learning. One of the most significant constraints that make application of opinion mining for written content classification in Latvian text challenging is the limited publicly available text corpora for classifier training. We have joined several sources and created a publically available extended lexicon. Our results are comparable to or outperform current achievements in opinion mining in Latvian. Experiments show that lexicon-based methods provide more accurate opinion mining than the application of Naive Bayes machine learning classifier on Latvian tweets. Methods used during this study could be further extended using human annotators, unsupervised machine learning and bootstrapping to create larger corpora of classified text.

  9. Workforce Optimization for Bank Operation Centers: A Machine Learning Approach

    Directory of Open Access Journals (Sweden)

    Sefik Ilkin Serengil

    2017-12-01

    Full Text Available Online Banking Systems evolved and improved in recent years with the use of mobile and online technologies, performing money transfer transactions on these channels can be done without delay and human interaction, however commercial customers still tend to transfer money on bank branches due to several concerns. Bank Operation Centers serve to reduce the operational workload of branches. Centralized management also offers personalized service by appointed expert employees in these centers. Inherently, workload volume of money transfer transactions changes dramatically in hours. Therefore, work-force should be planned instantly or early to save labor force and increase operational efficiency. This paper introduces a hybrid multi stage approach for workforce planning in bank operation centers by the application of supervised and unsu-pervised learning algorithms. Expected workload would be predicted as supervised learning whereas employees are clus-tered into different skill groups as unsupervised learning to match transactions and proper employees. Finally, workforce optimization is analyzed for proposed approach on production data.

  10. A New Approach to Spindle Radial Error Evaluation Using a Machine Vision System

    Directory of Open Access Journals (Sweden)

    Kavitha C.

    2017-03-01

    Full Text Available The spindle rotational accuracy is one of the important issues in a machine tool which affects the surface topography and dimensional accuracy of a workpiece. This paper presents a machine-vision-based approach to radial error measurement of a lathe spindle using a CMOS camera and a PC-based image processing system. In the present work, a precisely machined cylindrical master is mounted on the spindle as a datum surface and variations of its position are captured using the camera for evaluating runout of the spindle. The Circular Hough Transform (CHT is used to detect variations of the centre position of the master cylinder during spindle rotation at subpixel level from a sequence of images. Radial error values of the spindle are evaluated using the Fourier series analysis of the centre position of the master cylinder calculated with the least squares curve fitting technique. The experiments have been carried out on a lathe at different operating speeds and the spindle radial error estimation results are presented. The proposed method provides a simpler approach to on-machine estimation of the spindle radial error in machine tools.

  11. Process planning optimization on turning machine tool using a hybrid genetic algorithm with local search approach

    Directory of Open Access Journals (Sweden)

    Yuliang Su

    2015-04-01

    Full Text Available A turning machine tool is a kind of new type of machine tool that is equipped with more than one spindle and turret. The distinctive simultaneous and parallel processing abilities of turning machine tool increase the complexity of process planning. The operations would not only be sequenced and satisfy precedence constraints, but also should be scheduled with multiple objectives such as minimizing machining cost, maximizing utilization of turning machine tool, and so on. To solve this problem, a hybrid genetic algorithm was proposed to generate optimal process plans based on a mixed 0-1 integer programming model. An operation precedence graph is used to represent precedence constraints and help generate a feasible initial population of hybrid genetic algorithm. Encoding strategy based on data structure was developed to represent process plans digitally in order to form the solution space. In addition, a local search approach for optimizing the assignments of available turrets would be added to incorporate scheduling with process planning. A real-world case is used to prove that the proposed approach could avoid infeasible solutions and effectively generate a global optimal process plan.

  12. Machine learning approaches for the prediction of signal peptides and otherprotein sorting signals

    DEFF Research Database (Denmark)

    Nielsen, Henrik; Brunak, Søren; von Heijne, Gunnar

    1999-01-01

    Prediction of protein sorting signals from the sequence of amino acids has great importance in the field of proteomics today. Recently,the growth of protein databases, combined with machine learning approaches, such as neural networks and hidden Markov models, havemade it possible to achieve...

  13. PlantRNA_Sniffer: A SVM-Based Workflow to Predict Long Intergenic Non-Coding RNAs in Plants.

    Science.gov (United States)

    Vieira, Lucas Maciel; Grativol, Clicia; Thiebaut, Flavia; Carvalho, Thais G; Hardoim, Pablo R; Hemerly, Adriana; Lifschitz, Sergio; Ferreira, Paulo Cavalcanti Gomes; Walter, Maria Emilia M T

    2017-03-04

    Non-coding RNAs (ncRNAs) constitute an important set of transcripts produced in the cells of organisms. Among them, there is a large amount of a particular class of long ncRNAs that are difficult to predict, the so-called long intergenic ncRNAs (lincRNAs), which might play essential roles in gene regulation and other cellular processes. Despite the importance of these lincRNAs, there is still a lack of biological knowledge and, currently, the few computational methods considered are so specific that they cannot be successfully applied to other species different from those that they have been originally designed to. Prediction of lncRNAs have been performed with machine learning techniques. Particularly, for lincRNA prediction, supervised learning methods have been explored in recent literature. As far as we know, there are no methods nor workflows specially designed to predict lincRNAs in plants. In this context, this work proposes a workflow to predict lincRNAs on plants, considering a workflow that includes known bioinformatics tools together with machine learning techniques, here a support vector machine (SVM). We discuss two case studies that allowed to identify novel lincRNAs, in sugarcane ( Saccharum spp.) and in maize ( Zea mays ). From the results, we also could identify differentially-expressed lincRNAs in sugarcane and maize plants submitted to pathogenic and beneficial microorganisms.

  14. PlantRNA_Sniffer: A SVM-Based Workflow to Predict Long Intergenic Non-Coding RNAs in Plants

    Directory of Open Access Journals (Sweden)

    Lucas Maciel Vieira

    2017-03-01

    Full Text Available Non-coding RNAs (ncRNAs constitute an important set of transcripts produced in the cells of organisms. Among them, there is a large amount of a particular class of long ncRNAs that are difficult to predict, the so-called long intergenic ncRNAs (lincRNAs, which might play essential roles in gene regulation and other cellular processes. Despite the importance of these lincRNAs, there is still a lack of biological knowledge and, currently, the few computational methods considered are so specific that they cannot be successfully applied to other species different from those that they have been originally designed to. Prediction of lncRNAs have been performed with machine learning techniques. Particularly, for lincRNA prediction, supervised learning methods have been explored in recent literature. As far as we know, there are no methods nor workflows specially designed to predict lincRNAs in plants. In this context, this work proposes a workflow to predict lincRNAs on plants, considering a workflow that includes known bioinformatics tools together with machine learning techniques, here a support vector machine (SVM. We discuss two case studies that allowed to identify novel lincRNAs, in sugarcane (Saccharum spp. and in maize (Zea mays. From the results, we also could identify differentially-expressed lincRNAs in sugarcane and maize plants submitted to pathogenic and beneficial microorganisms.

  15. New approaches in mathematical biology: Information theory and molecular machines

    International Nuclear Information System (INIS)

    Schneider, T.

    1995-01-01

    My research uses classical information theory to study genetic systems. Information theory was founded by Claude Shannon in the 1940's and has had an enormous impact on communications engineering and computer sciences. Shannon found a way to measure information. This measure can be used to precisely characterize the sequence conservation at nucleic-acid binding sites. The resulting methods, by completely replacing the use of ''consensus sequences'', provide better models for molecular biologists. An excess of conservation led us to do experimental work on bacteriophage T7 promoters and the F plasmid IncD repeats. The wonderful fidelity of telephone communications and compact disk (CD) music can be traced directly to Shannon's channel capacity theorem. When rederived for molecular biology, this theorem explains the surprising precision of many molecular events. Through connections with the Second Law of Thermodyanmics and Maxwell's Demon, this approach also has implications for the development of technology at the molecular level. Discussions of these topics are held on the internet news group bionet.info-theo. (author). (Abstract only)

  16. Facial Emotion Recognition System – A Machine Learning Approach

    Science.gov (United States)

    Ramalingam, V. V.; Pandian, A.; Jayakumar, Lavanya

    2018-04-01

    Frown is a medium for people correlation and it could be exercised in multiple real systems. Single crucial stage for frown realizing is to exactly select hysterical aspects. This journal proposed a frown realization scheme applying transformative Particle Swarm Optimization (PSO) based aspect accumulation. This entity initially employs changed LVP, handles crisscross adjacent picture element contrast, for achieving the selective first frown portrayal. Then the PSO entity inserted with a concept of micro Genetic Algorithm (mGA) called mGA-embedded PSO designed for achieving aspect accumulation. This study, the technique subsumes no disposable memory, a little-populace insignificant flock, a latest acceleration that amends with the approach and a sub dimension-based in-depth local frown aspect examines. Assistance of provincial utilization and comprehensive inspection examine structure of alleviating of an immature concurrence complication of conventional PSO. Numerous identifiers are used to diagnose different frown expositions. Stationed on extensive study within and other-sphere pictures from the continued Cohn Kanade and MMI benchmark directory appropriately. Determination of the application exceeds most advanced level PSO variants, conventional PSO, classical GA and alternate relevant frown realization structures is described with powerful limit. Extending our accession to a motion based FER application for connecting patch-based Gabor aspects with continuous data in multi-frames.

  17. Peak Detection Method Evaluation for Ion Mobility Spectrometry by Using Machine Learning Approaches

    DEFF Research Database (Denmark)

    Hauschild, Anne-Christin; Kopczynski, Dominik; D'Addario, Marianna

    2013-01-01

    machine learning methods exist, an inevitable preprocessing step is reliable and robust peak detection without manual intervention. In this work we evaluate four state-of-the-art approaches for automated IMS-based peak detection: local maxima search, watershed transformation with IPHEx, region......-merging with VisualNow, and peak model estimation (PME).We manually generated Metabolites 2013, 3 278 a gold standard with the aid of a domain expert (manual) and compare the performance of the four peak calling methods with respect to two distinct criteria. We first utilize established machine learning methods...

  18. A Machine Approach for Field Weakening of Permanent-Magnet Motors

    Energy Technology Data Exchange (ETDEWEB)

    Hsu, J.S.

    2000-04-02

    The commonly known technology of field weakening for permanent-magnet (PM) motors is achieved by controlling the direct-axis current component through an inverter, without using mechanical variation of the air gap, a new machine approach for field weakening of PM machines by direct control of air-gap fluxes is introduced. The demagnetization situation due to field weakening is not an issue with this new method. In fact, the PMs are strengthened at field weakening. The field-weakening ratio can reach 1O:1 or higher. This technology is particularly useful for the PM generators and electric vehicle drives.

  19. An ensemble machine learning approach to predict survival in breast cancer.

    Science.gov (United States)

    Djebbari, Amira; Liu, Ziying; Phan, Sieu; Famili, Fazel

    2008-01-01

    Current breast cancer predictive signatures are not unique. Can we use this fact to our advantage to improve prediction? From the machine learning perspective, it is well known that combining multiple classifiers can improve classification performance. We propose an ensemble machine learning approach which consists of choosing feature subsets and learning predictive models from them. We then combine models based on certain model fusion criteria and we also introduce a tuning parameter to control sensitivity. Our method significantly improves classification performance with a particular emphasis on sensitivity which is critical to avoid misclassifying poor prognosis patients as good prognosis.

  20. A Machine Learning Approach to Automated Gait Analysis for the Noldus Catwalk System.

    Science.gov (United States)

    Frohlich, Holger; Claes, Kasper; De Wolf, Catherine; Van Damme, Xavier; Michel, Anne

    2018-05-01

    Gait analysis of animal disease models can provide valuable insights into in vivo compound effects and thus help in preclinical drug development. The purpose of this paper is to establish a computational gait analysis approach for the Noldus Catwalk system, in which footprints are automatically captured and stored. We present a - to our knowledge - first machine learning based approach for the Catwalk system, which comprises a step decomposition, definition and extraction of meaningful features, multivariate step sequence alignment, feature selection, and training of different classifiers (gradient boosting machine, random forest, and elastic net). Using animal-wise leave-one-out cross validation we demonstrate that with our method we can reliable separate movement patterns of a putative Parkinson's disease animal model and several control groups. Furthermore, we show that we can predict the time point after and the type of different brain lesions and can even forecast the brain region, where the intervention was applied. We provide an in-depth analysis of the features involved into our classifiers via statistical techniques for model interpretation. A machine learning method for automated analysis of data from the Noldus Catwalk system was established. Our works shows the ability of machine learning to discriminate pharmacologically relevant animal groups based on their walking behavior in a multivariate manner. Further interesting aspects of the approach include the ability to learn from past experiments, improve with more data arriving and to make predictions for single animals in future studies.

  1. SVM-based prediction of propeptide cleavage sites in spider toxins identifies toxin innovation in an Australian tarantula.

    Directory of Open Access Journals (Sweden)

    Emily S W Wong

    Full Text Available Spider neurotoxins are commonly used as pharmacological tools and are a popular source of novel compounds with therapeutic and agrochemical potential. Since venom peptides are inherently toxic, the host spider must employ strategies to avoid adverse effects prior to venom use. It is partly for this reason that most spider toxins encode a protective proregion that upon enzymatic cleavage is excised from the mature peptide. In order to identify the mature toxin sequence directly from toxin transcripts, without resorting to protein sequencing, the propeptide cleavage site in the toxin precursor must be predicted bioinformatically. We evaluated different machine learning strategies (support vector machines, hidden Markov model and decision tree and developed an algorithm (SpiderP for prediction of propeptide cleavage sites in spider toxins. Our strategy uses a support vector machine (SVM framework that combines both local and global sequence information. Our method is superior or comparable to current tools for prediction of propeptide sequences in spider toxins. Evaluation of the SVM method on an independent test set of known toxin sequences yielded 96% sensitivity and 100% specificity. Furthermore, we sequenced five novel peptides (not used to train the final predictor from the venom of the Australian tarantula Selenotypus plumipes to test the accuracy of the predictor and found 80% sensitivity and 99.6% 8-mer specificity. Finally, we used the predictor together with homology information to predict and characterize seven groups of novel toxins from the deeply sequenced venom gland transcriptome of S. plumipes, which revealed structural complexity and innovations in the evolution of the toxins. The precursor prediction tool (SpiderP is freely available on ArachnoServer (http://www.arachnoserver.org/spiderP.html, a web portal to a comprehensive relational database of spider toxins. All training data, test data, and scripts used are available from

  2. Comparative Analysis of Automatic Exudate Detection between Machine Learning and Traditional Approaches

    Science.gov (United States)

    Sopharak, Akara; Uyyanonvara, Bunyarit; Barman, Sarah; Williamson, Thomas

    To prevent blindness from diabetic retinopathy, periodic screening and early diagnosis are neccessary. Due to lack of expert ophthalmologists in rural area, automated early exudate (one of visible sign of diabetic retinopathy) detection could help to reduce the number of blindness in diabetic patients. Traditional automatic exudate detection methods are based on specific parameter configuration, while the machine learning approaches which seems more flexible may be computationally high cost. A comparative analysis of traditional and machine learning of exudates detection, namely, mathematical morphology, fuzzy c-means clustering, naive Bayesian classifier, Support Vector Machine and Nearest Neighbor classifier are presented. Detected exudates are validated with expert ophthalmologists' hand-drawn ground-truths. The sensitivity, specificity, precision, accuracy and time complexity of each method are also compared.

  3. A fuzzy regression with support vector machine approach to the estimation of horizontal global solar radiation

    International Nuclear Information System (INIS)

    Baser, Furkan; Demirhan, Haydar

    2017-01-01

    Accurate estimation of the amount of horizontal global solar radiation for a particular field is an important input for decision processes in solar radiation investments. In this article, we focus on the estimation of yearly mean daily horizontal global solar radiation by using an approach that utilizes fuzzy regression functions with support vector machine (FRF-SVM). This approach is not seriously affected by outlier observations and does not suffer from the over-fitting problem. To demonstrate the utility of the FRF-SVM approach in the estimation of horizontal global solar radiation, we conduct an empirical study over a dataset collected in Turkey and applied the FRF-SVM approach with several kernel functions. Then, we compare the estimation accuracy of the FRF-SVM approach to an adaptive neuro-fuzzy system and a coplot supported-genetic programming approach. We observe that the FRF-SVM approach with a Gaussian kernel function is not affected by both outliers and over-fitting problem and gives the most accurate estimates of horizontal global solar radiation among the applied approaches. Consequently, the use of hybrid fuzzy functions and support vector machine approaches is found beneficial in long-term forecasting of horizontal global solar radiation over a region with complex climatic and terrestrial characteristics. - Highlights: • A fuzzy regression functions with support vector machines approach is proposed. • The approach is robust against outlier observations and over-fitting problem. • Estimation accuracy of the model is superior to several existent alternatives. • A new solar radiation estimation model is proposed for the region of Turkey. • The model is useful under complex terrestrial and climatic conditions.

  4. Machine Learning Based Classification of Microsatellite Variation: An Effective Approach for Phylogeographic Characterization of Olive Populations.

    Science.gov (United States)

    Torkzaban, Bahareh; Kayvanjoo, Amir Hossein; Ardalan, Arman; Mousavi, Soraya; Mariotti, Roberto; Baldoni, Luciana; Ebrahimie, Esmaeil; Ebrahimi, Mansour; Hosseini-Mazinani, Mehdi

    2015-01-01

    Finding efficient analytical techniques is overwhelmingly turning into a bottleneck for the effectiveness of large biological data. Machine learning offers a novel and powerful tool to advance classification and modeling solutions in molecular biology. However, these methods have been less frequently used with empirical population genetics data. In this study, we developed a new combined approach of data analysis using microsatellite marker data from our previous studies of olive populations using machine learning algorithms. Herein, 267 olive accessions of various origins including 21 reference cultivars, 132 local ecotypes, and 37 wild olive specimens from the Iranian plateau, together with 77 of the most represented Mediterranean varieties were investigated using a finely selected panel of 11 microsatellite markers. We organized data in two '4-targeted' and '16-targeted' experiments. A strategy of assaying different machine based analyses (i.e. data cleaning, feature selection, and machine learning classification) was devised to identify the most informative loci and the most diagnostic alleles to represent the population and the geography of each olive accession. These analyses revealed microsatellite markers with the highest differentiating capacity and proved efficiency for our method of clustering olive accessions to reflect upon their regions of origin. A distinguished highlight of this study was the discovery of the best combination of markers for better differentiating of populations via machine learning models, which can be exploited to distinguish among other biological populations.

  5. A Hierarchical Method for Transient Stability Prediction of Power Systems Using the Confidence of a SVM-Based Ensemble Classifier

    Directory of Open Access Journals (Sweden)

    Yanzhen Zhou

    2016-09-01

    Full Text Available Machine learning techniques have been widely used in transient stability prediction of power systems. When using the post-fault dynamic responses, it is difficult to draw a definite conclusion about how long the duration of response data used should be in order to balance the accuracy and speed. Besides, previous studies have the problem of lacking consideration for the confidence level. To solve these problems, a hierarchical method for transient stability prediction based on the confidence of ensemble classifier using multiple support vector machines (SVMs is proposed. Firstly, multiple datasets are generated by bootstrap sampling, then features are randomly picked up to compress the datasets. Secondly, the confidence indices are defined and multiple SVMs are built based on these generated datasets. By synthesizing the probabilistic outputs of multiple SVMs, the prediction results and confidence of the ensemble classifier will be obtained. Finally, different ensemble classifiers with different response times are built to construct different layers of the proposed hierarchical scheme. The simulation results show that the proposed hierarchical method can balance the accuracy and rapidity of the transient stability prediction. Moreover, the hierarchical method can reduce the misjudgments of unstable instances and cooperate with the time domain simulation to insure the security and stability of power systems.

  6. A comparison of the stochastic and machine learning approaches in hydrologic time series forecasting

    Science.gov (United States)

    Kim, T.; Joo, K.; Seo, J.; Heo, J. H.

    2016-12-01

    Hydrologic time series forecasting is an essential task in water resources management and it becomes more difficult due to the complexity of runoff process. Traditional stochastic models such as ARIMA family has been used as a standard approach in time series modeling and forecasting of hydrological variables. Due to the nonlinearity in hydrologic time series data, machine learning approaches has been studied with the advantage of discovering relevant features in a nonlinear relation among variables. This study aims to compare the predictability between the traditional stochastic model and the machine learning approach. Seasonal ARIMA model was used as the traditional time series model, and Random Forest model which consists of decision tree and ensemble method using multiple predictor approach was applied as the machine learning approach. In the application, monthly inflow data from 1986 to 2015 of Chungju dam in South Korea were used for modeling and forecasting. In order to evaluate the performances of the used models, one step ahead and multi-step ahead forecasting was applied. Root mean squared error and mean absolute error of two models were compared.

  7. Classification of follicular lymphoma images: a holistic approach with symbol-based machine learning methods.

    Science.gov (United States)

    Zorman, Milan; Sánchez de la Rosa, José Luis; Dinevski, Dejan

    2011-12-01

    It is not very often to see a symbol-based machine learning approach to be used for the purpose of image classification and recognition. In this paper we will present such an approach, which we first used on the follicular lymphoma images. Lymphoma is a broad term encompassing a variety of cancers of the lymphatic system. Lymphoma is differentiated by the type of cell that multiplies and how the cancer presents itself. It is very important to get an exact diagnosis regarding lymphoma and to determine the treatments that will be most effective for the patient's condition. Our work was focused on the identification of lymphomas by finding follicles in microscopy images provided by the Laboratory of Pathology in the University Hospital of Tenerife, Spain. We divided our work in two stages: in the first stage we did image pre-processing and feature extraction, and in the second stage we used different symbolic machine learning approaches for pixel classification. Symbolic machine learning approaches are often neglected when looking for image analysis tools. They are not only known for a very appropriate knowledge representation, but also claimed to lack computational power. The results we got are very promising and show that symbolic approaches can be successful in image analysis applications.

  8. An Event-Triggered Machine Learning Approach for Accelerometer-Based Fall Detection.

    Science.gov (United States)

    Putra, I Putu Edy Suardiyana; Brusey, James; Gaura, Elena; Vesilo, Rein

    2017-12-22

    The fixed-size non-overlapping sliding window (FNSW) and fixed-size overlapping sliding window (FOSW) approaches are the most commonly used data-segmentation techniques in machine learning-based fall detection using accelerometer sensors. However, these techniques do not segment by fall stages (pre-impact, impact, and post-impact) and thus useful information is lost, which may reduce the detection rate of the classifier. Aligning the segment with the fall stage is difficult, as the segment size varies. We propose an event-triggered machine learning (EvenT-ML) approach that aligns each fall stage so that the characteristic features of the fall stages are more easily recognized. To evaluate our approach, two publicly accessible datasets were used. Classification and regression tree (CART), k -nearest neighbor ( k -NN), logistic regression (LR), and the support vector machine (SVM) were used to train the classifiers. EvenT-ML gives classifier F-scores of 98% for a chest-worn sensor and 92% for a waist-worn sensor, and significantly reduces the computational cost compared with the FNSW- and FOSW-based approaches, with reductions of up to 8-fold and 78-fold, respectively. EvenT-ML achieves a significantly better F-score than existing fall detection approaches. These results indicate that aligning feature segments with fall stages significantly increases the detection rate and reduces the computational cost.

  9. Machine learning approach for the outcome prediction of temporal lobe epilepsy surgery.

    Directory of Open Access Journals (Sweden)

    Rubén Armañanzas

    Full Text Available Epilepsy surgery is effective in reducing both the number and frequency of seizures, particularly in temporal lobe epilepsy (TLE. Nevertheless, a significant proportion of these patients continue suffering seizures after surgery. Here we used a machine learning approach to predict the outcome of epilepsy surgery based on supervised classification data mining taking into account not only the common clinical variables, but also pathological and neuropsychological evaluations. We have generated models capable of predicting whether a patient with TLE secondary to hippocampal sclerosis will fully recover from epilepsy or not. The machine learning analysis revealed that outcome could be predicted with an estimated accuracy of almost 90% using some clinical and neuropsychological features. Importantly, not all the features were needed to perform the prediction; some of them proved to be irrelevant to the prognosis. Personality style was found to be one of the key features to predict the outcome. Although we examined relatively few cases, findings were verified across all data, showing that the machine learning approach described in the present study may be a powerful method. Since neuropsychological assessment of epileptic patients is a standard protocol in the pre-surgical evaluation, we propose to include these specific psychological tests and machine learning tools to improve the selection of candidates for epilepsy surgery.

  10. Fractal and twin SVM-based handgrip recognition for healthy subjects and trans-radial amputees using myoelectric signal.

    Science.gov (United States)

    Arjunan, Sridhar Poosapadi; Kumar, Dinesh Kant; Jayadeva J

    2016-02-01

    Identifying functional handgrip patterns using surface electromygram (sEMG) signal recorded from amputee residual muscle is required for controlling the myoelectric prosthetic hand. In this study, we have computed the signal fractal dimension (FD) and maximum fractal length (MFL) during different grip patterns performed by healthy and transradial amputee subjects. The FD and MFL of the sEMG, referred to as the fractal features, were classified using twin support vector machines (TSVM) to recognize the handgrips. TSVM requires fewer support vectors, is suitable for data sets with unbalanced distributions, and can simultaneously be trained for improving both sensitivity and specificity. When compared with other methods, this technique resulted in improved grip recognition accuracy, sensitivity, and specificity, and this improvement was significant (κ=0.91).

  11. A Hierarchical Approach Using Machine Learning Methods in Solar Photovoltaic Energy Production Forecasting

    OpenAIRE

    Zhaoxuan Li; SM Mahbobur Rahman; Rolando Vega; Bing Dong

    2016-01-01

    We evaluate and compare two common methods, artificial neural networks (ANN) and support vector regression (SVR), for predicting energy productions from a solar photovoltaic (PV) system in Florida 15 min, 1 h and 24 h ahead of time. A hierarchical approach is proposed based on the machine learning algorithms tested. The production data used in this work corresponds to 15 min averaged power measurements collected from 2014. The accuracy of the model is determined using computing error statisti...

  12. A Review of Machine Learning and Data Mining Approaches for Business Applications in Social Networks

    OpenAIRE

    Evis Trandafili; Marenglen Biba

    2013-01-01

    Social networks have an outstanding marketing value and developing data mining methods for viral marketing is a hot topic in the research community. However, most social networks remain impossible to be fully analyzed and understood due to prohibiting sizes and the incapability of traditional machine learning and data mining approaches to deal with the new dimension in the learning process related to the large-scale environment where the data are produced. On one hand, the birth and evolution...

  13. Machine learning approaches to analysing textual injury surveillance data: a systematic review.

    Science.gov (United States)

    Vallmuur, Kirsten

    2015-06-01

    To synthesise recent research on the use of machine learning approaches to mining textual injury surveillance data. Systematic review. The electronic databases which were searched included PubMed, Cinahl, Medline, Google Scholar, and Proquest. The bibliography of all relevant articles was examined and associated articles were identified using a snowballing technique. For inclusion, articles were required to meet the following criteria: (a) used a health-related database, (b) focused on injury-related cases, AND used machine learning approaches to analyse textual data. The papers identified through the search were screened resulting in 16 papers selected for review. Articles were reviewed to describe the databases and methodology used, the strength and limitations of different techniques, and quality assurance approaches used. Due to heterogeneity between studies meta-analysis was not performed. Occupational injuries were the focus of half of the machine learning studies and the most common methods described were Bayesian probability or Bayesian network based methods to either predict injury categories or extract common injury scenarios. Models were evaluated through either comparison with gold standard data or content expert evaluation or statistical measures of quality. Machine learning was found to provide high precision and accuracy when predicting a small number of categories, was valuable for visualisation of injury patterns and prediction of future outcomes. However, difficulties related to generalizability, source data quality, complexity of models and integration of content and technical knowledge were discussed. The use of narrative text for injury surveillance has grown in popularity, complexity and quality over recent years. With advances in data mining techniques, increased capacity for analysis of large databases, and involvement of computer scientists in the injury prevention field, along with more comprehensive use and description of quality

  14. A DWT and SVM based method for rolling element bearing fault diagnosis and its comparison with Artificial Neural Networks

    Directory of Open Access Journals (Sweden)

    Sunil Tyagi

    2017-04-01

    Full Text Available A classification technique using Support Vector Machine (SVM classifier for detection of rolling element bearing fault is presented here.  The SVM was fed from features that were extracted from of vibration signals obtained from experimental setup consisting of rotating driveline that was mounted on rolling element bearings which were run in normal and with artificially faults induced conditions. The time-domain vibration signals were divided into 40 segments and simple features such as peaks in time domain and spectrum along with statistical features such as standard deviation, skewness, kurtosis etc. were extracted. Effectiveness of SVM classifier was compared with the performance of Artificial Neural Network (ANN classifier and it was found that the performance of SVM classifier is superior to that of ANN. The effect of pre-processing of the vibration signal by Discreet Wavelet Transform (DWT prior to feature extraction is also studied and it is shown that pre-processing of vibration signal with DWT enhances the effectiveness of both ANN and SVM classifiers. It has been demonstrated from experiment results that performance of SVM classifier is better than ANN in detection of bearing condition and pre-processing the vibration signal with DWT improves the performance of SVM classifier.

  15. a Comparison Study of Different Kernel Functions for Svm-Based Classification of Multi-Temporal Polarimetry SAR Data

    Science.gov (United States)

    Yekkehkhany, B.; Safari, A.; Homayouni, S.; Hasanlou, M.

    2014-10-01

    In this paper, a framework is developed based on Support Vector Machines (SVM) for crop classification using polarimetric features extracted from multi-temporal Synthetic Aperture Radar (SAR) imageries. The multi-temporal integration of data not only improves the overall retrieval accuracy but also provides more reliable estimates with respect to single-date data. Several kernel functions are employed and compared in this study for mapping the input space to higher Hilbert dimension space. These kernel functions include linear, polynomials and Radial Based Function (RBF). The method is applied to several UAVSAR L-band SAR images acquired over an agricultural area near Winnipeg, Manitoba, Canada. In this research, the temporal alpha features of H/A/α decomposition method are used in classification. The experimental tests show an SVM classifier with RBF kernel for three dates of data increases the Overall Accuracy (OA) to up to 3% in comparison to using linear kernel function, and up to 1% in comparison to a 3rd degree polynomial kernel function.

  16. SVM-based multisensor data fusion for phase concentration measurement in biomass-coal co-combustion

    Science.gov (United States)

    Wang, Xiaoxin; Hu, Hongli; Jia, Huiqin; Tang, Kaihao

    2018-05-01

    In this paper, the electrical method combines the electrostatic sensor and capacitance sensor to measure the phase concentration of pulverized coal/biomass/air three-phase flow through data fusion technology. In order to eliminate the effects of flow regimes and improve the accuracy of the phase concentration measurement, the mel frequency cepstrum coefficient features extracted from electrostatic signals are used to train the Continuous Gaussian Mixture Hidden Markov Model (CGHMM) for flow regime identification. Support Vector Machine (SVM) is introduced to establish the concentration information fusion model under identified flow regimes. The CGHMM models and SVM models are transplanted on digital signal processing (DSP) to realize on-line accurate measurement. The DSP flow regime identification time is 1.4 ms, and the concentration predict time is 164 μs, which can fully meet the real-time requirement. The average absolute value of the relative error of the pulverized coal is about 1.5% and that of the biomass is about 2.2%.

  17. SVM-based multimodal classification of activities of daily living in Health Smart Homes: sensors, algorithms, and first experimental results.

    Science.gov (United States)

    Fleury, Anthony; Vacher, Michel; Noury, Norbert

    2010-03-01

    By 2050, about one third of the French population will be over 65. Our laboratory's current research focuses on the monitoring of elderly people at home, to detect a loss of autonomy as early as possible. Our aim is to quantify criteria such as the international activities of daily living (ADL) or the French Autonomie Gerontologie Groupes Iso-Ressources (AGGIR) scales, by automatically classifying the different ADL performed by the subject during the day. A Health Smart Home is used for this. Our Health Smart Home includes, in a real flat, infrared presence sensors (location), door contacts (to control the use of some facilities), temperature and hygrometry sensor in the bathroom, and microphones (sound classification and speech recognition). A wearable kinematic sensor also informs postural transitions (using pattern recognition) and walk periods (frequency analysis). This data collected from the various sensors are then used to classify each temporal frame into one of the ADL that was previously acquired (seven activities: hygiene, toilet use, eating, resting, sleeping, communication, and dressing/undressing). This is done using support vector machines. We performed a 1-h experimentation with 13 young and healthy subjects to determine the models of the different activities, and then we tested the classification algorithm (cross validation) with real data.

  18. Support-Vector-Machine-Based Reduced-Order Model for Limit Cycle Oscillation Prediction of Nonlinear Aeroelastic System

    Directory of Open Access Journals (Sweden)

    Gang Chen

    2012-01-01

    Full Text Available It is not easy for the system identification-based reduced-order model (ROM and even eigenmode based reduced-order model to predict the limit cycle oscillation generated by the nonlinear unsteady aerodynamics. Most of these traditional ROMs are sensitive to the flow parameter variation. In order to deal with this problem, a support vector machine- (SVM- based ROM was investigated and the general construction framework was proposed. The two-DOF aeroelastic system for the NACA 64A010 airfoil in transonic flow was then demonstrated for the new SVM-based ROM. The simulation results show that the new ROM can capture the LCO behavior of the nonlinear aeroelastic system with good accuracy and high efficiency. The robustness and computational efficiency of the SVM-based ROM would provide a promising tool for real-time flight simulation including nonlinear aeroelastic effects.

  19. An Integrated Approach of Fuzzy Linguistic Preference Based AHP and Fuzzy COPRAS for Machine Tool Evaluation.

    Directory of Open Access Journals (Sweden)

    Huu-Tho Nguyen

    Full Text Available Globalization of business and competitiveness in manufacturing has forced companies to improve their manufacturing facilities to respond to market requirements. Machine tool evaluation involves an essential decision using imprecise and vague information, and plays a major role to improve the productivity and flexibility in manufacturing. The aim of this study is to present an integrated approach for decision-making in machine tool selection. This paper is focused on the integration of a consistent fuzzy AHP (Analytic Hierarchy Process and a fuzzy COmplex PRoportional ASsessment (COPRAS for multi-attribute decision-making in selecting the most suitable machine tool. In this method, the fuzzy linguistic reference relation is integrated into AHP to handle the imprecise and vague information, and to simplify the data collection for the pair-wise comparison matrix of the AHP which determines the weights of attributes. The output of the fuzzy AHP is imported into the fuzzy COPRAS method for ranking alternatives through the closeness coefficient. Presentation of the proposed model application is provided by a numerical example based on the collection of data by questionnaire and from the literature. The results highlight the integration of the improved fuzzy AHP and the fuzzy COPRAS as a precise tool and provide effective multi-attribute decision-making for evaluating the machine tool in the uncertain environment.

  20. Exploring prediction uncertainty of spatial data in geostatistical and machine learning Approaches

    Science.gov (United States)

    Klump, J. F.; Fouedjio, F.

    2017-12-01

    Geostatistical methods such as kriging with external drift as well as machine learning techniques such as quantile regression forest have been intensively used for modelling spatial data. In addition to providing predictions for target variables, both approaches are able to deliver a quantification of the uncertainty associated with the prediction at a target location. Geostatistical approaches are, by essence, adequate for providing such prediction uncertainties and their behaviour is well understood. However, they often require significant data pre-processing and rely on assumptions that are rarely met in practice. Machine learning algorithms such as random forest regression, on the other hand, require less data pre-processing and are non-parametric. This makes the application of machine learning algorithms to geostatistical problems an attractive proposition. The objective of this study is to compare kriging with external drift and quantile regression forest with respect to their ability to deliver reliable prediction uncertainties of spatial data. In our comparison we use both simulated and real world datasets. Apart from classical performance indicators, comparisons make use of accuracy plots, probability interval width plots, and the visual examinations of the uncertainty maps provided by the two approaches. By comparing random forest regression to kriging we found that both methods produced comparable maps of estimated values for our variables of interest. However, the measure of uncertainty provided by random forest seems to be quite different to the measure of uncertainty provided by kriging. In particular, the lack of spatial context can give misleading results in areas without ground truth data. These preliminary results raise questions about assessing the risks associated with decisions based on the predictions from geostatistical and machine learning algorithms in a spatial context, e.g. mineral exploration.

  1. Machine learning approaches to the social determinants of health in the health and retirement study.

    Science.gov (United States)

    Seligman, Benjamin; Tuljapurkar, Shripad; Rehkopf, David

    2018-04-01

    Social and economic factors are important predictors of health and of recognized importance for health systems. However, machine learning, used elsewhere in the biomedical literature, has not been extensively applied to study relationships between society and health. We investigate how machine learning may add to our understanding of social determinants of health using data from the Health and Retirement Study. A linear regression of age and gender, and a parsimonious theory-based regression additionally incorporating income, wealth, and education, were used to predict systolic blood pressure, body mass index, waist circumference, and telomere length. Prediction, fit, and interpretability were compared across four machine learning methods: linear regression, penalized regressions, random forests, and neural networks. All models had poor out-of-sample prediction. Most machine learning models performed similarly to the simpler models. However, neural networks greatly outperformed the three other methods. Neural networks also had good fit to the data ( R 2 between 0.4-0.6, versus learning models, nine variables were frequently selected or highly weighted as predictors: dental visits, current smoking, self-rated health, serial-seven subtractions, probability of receiving an inheritance, probability of leaving an inheritance of at least $10,000, number of children ever born, African-American race, and gender. Some of the machine learning methods do not improve prediction or fit beyond simpler models, however, neural networks performed well. The predictors identified across models suggest underlying social factors that are important predictors of biological indicators of chronic disease, and that the non-linear and interactive relationships between variables fundamental to the neural network approach may be important to consider.

  2. Probabilistic and machine learning-based retrieval approaches for biomedical dataset retrieval

    Science.gov (United States)

    Karisani, Payam; Qin, Zhaohui S; Agichtein, Eugene

    2018-01-01

    Abstract The bioCADDIE dataset retrieval challenge brought together different approaches to retrieval of biomedical datasets relevant to a user’s query, expressed as a text description of a needed dataset. We describe experiments in applying a data-driven, machine learning-based approach to biomedical dataset retrieval as part of this challenge. We report on a series of experiments carried out to evaluate the performance of both probabilistic and machine learning-driven techniques from information retrieval, as applied to this challenge. Our experiments with probabilistic information retrieval methods, such as query term weight optimization, automatic query expansion and simulated user relevance feedback, demonstrate that automatically boosting the weights of important keywords in a verbose query is more effective than other methods. We also show that although there is a rich space of potential representations and features available in this domain, machine learning-based re-ranking models are not able to improve on probabilistic information retrieval techniques with the currently available training data. The models and algorithms presented in this paper can serve as a viable implementation of a search engine to provide access to biomedical datasets. The retrieval performance is expected to be further improved by using additional training data that is created by expert annotation, or gathered through usage logs, clicks and other processes during natural operation of the system. Database URL: https://github.com/emory-irlab/biocaddie

  3. Identifying Green Infrastructure from Social Media and Crowdsourcing- An Image Based Machine-Learning Approach.

    Science.gov (United States)

    Rai, A.; Minsker, B. S.

    2016-12-01

    In this work we introduce a novel dataset GRID: GReen Infrastructure Detection Dataset and a framework for identifying urban green storm water infrastructure (GI) designs (wetlands/ponds, urban trees, and rain gardens/bioswales) from social media and satellite aerial images using computer vision and machine learning methods. Along with the hydrologic benefits of GI, such as reducing runoff volumes and urban heat islands, GI also provides important socio-economic benefits such as stress recovery and community cohesion. However, GI is installed by many different parties and cities typically do not know where GI is located, making study of its impacts or siting new GI difficult. We use object recognition learning methods (template matching, sliding window approach, and Random Hough Forest method) and supervised machine learning algorithms (e.g., support vector machines) as initial screening approaches to detect potential GI sites, which can then be investigated in more detail using on-site surveys. Training data were collected from GPS locations of Flickr and Instagram image postings and Amazon Mechanical Turk identification of each GI type. Sliding window method outperformed other methods and achieved an average F measure, which is combined metric for precision and recall performance measure of 0.78.

  4. Application of Machine Learning Approaches for Classifying Sitting Posture Based on Force and Acceleration Sensors

    Directory of Open Access Journals (Sweden)

    Roland Zemp

    2016-01-01

    Full Text Available Occupational musculoskeletal disorders, particularly chronic low back pain (LBP, are ubiquitous due to prolonged static sitting or nonergonomic sitting positions. Therefore, the aim of this study was to develop an instrumented chair with force and acceleration sensors to determine the accuracy of automatically identifying the user’s sitting position by applying five different machine learning methods (Support Vector Machines, Multinomial Regression, Boosting, Neural Networks, and Random Forest. Forty-one subjects were requested to sit four times in seven different prescribed sitting positions (total 1148 samples. Sixteen force sensor values and the backrest angle were used as the explanatory variables (features for the classification. The different classification methods were compared by means of a Leave-One-Out cross-validation approach. The best performance was achieved using the Random Forest classification algorithm, producing a mean classification accuracy of 90.9% for subjects with which the algorithm was not familiar. The classification accuracy varied between 81% and 98% for the seven different sitting positions. The present study showed the possibility of accurately classifying different sitting positions by means of the introduced instrumented office chair combined with machine learning analyses. The use of such novel approaches for the accurate assessment of chair usage could offer insights into the relationships between sitting position, sitting behaviour, and the occurrence of musculoskeletal disorders.

  5. Application of Machine Learning Approaches for Classifying Sitting Posture Based on Force and Acceleration Sensors.

    Science.gov (United States)

    Zemp, Roland; Tanadini, Matteo; Plüss, Stefan; Schnüriger, Karin; Singh, Navrag B; Taylor, William R; Lorenzetti, Silvio

    2016-01-01

    Occupational musculoskeletal disorders, particularly chronic low back pain (LBP), are ubiquitous due to prolonged static sitting or nonergonomic sitting positions. Therefore, the aim of this study was to develop an instrumented chair with force and acceleration sensors to determine the accuracy of automatically identifying the user's sitting position by applying five different machine learning methods (Support Vector Machines, Multinomial Regression, Boosting, Neural Networks, and Random Forest). Forty-one subjects were requested to sit four times in seven different prescribed sitting positions (total 1148 samples). Sixteen force sensor values and the backrest angle were used as the explanatory variables (features) for the classification. The different classification methods were compared by means of a Leave-One-Out cross-validation approach. The best performance was achieved using the Random Forest classification algorithm, producing a mean classification accuracy of 90.9% for subjects with which the algorithm was not familiar. The classification accuracy varied between 81% and 98% for the seven different sitting positions. The present study showed the possibility of accurately classifying different sitting positions by means of the introduced instrumented office chair combined with machine learning analyses. The use of such novel approaches for the accurate assessment of chair usage could offer insights into the relationships between sitting position, sitting behaviour, and the occurrence of musculoskeletal disorders.

  6. Big Data Meets Quantum Chemistry Approximations: The Δ-Machine Learning Approach.

    Science.gov (United States)

    Ramakrishnan, Raghunathan; Dral, Pavlo O; Rupp, Matthias; von Lilienfeld, O Anatole

    2015-05-12

    Chemically accurate and comprehensive studies of the virtual space of all possible molecules are severely limited by the computational cost of quantum chemistry. We introduce a composite strategy that adds machine learning corrections to computationally inexpensive approximate legacy quantum methods. After training, highly accurate predictions of enthalpies, free energies, entropies, and electron correlation energies are possible, for significantly larger molecular sets than used for training. For thermochemical properties of up to 16k isomers of C7H10O2 we present numerical evidence that chemical accuracy can be reached. We also predict electron correlation energy in post Hartree-Fock methods, at the computational cost of Hartree-Fock, and we establish a qualitative relationship between molecular entropy and electron correlation. The transferability of our approach is demonstrated, using semiempirical quantum chemistry and machine learning models trained on 1 and 10% of 134k organic molecules, to reproduce enthalpies of all remaining molecules at density functional theory level of accuracy.

  7. Two Approaches for the Management of Virtual Machines on Grid Infrastructures

    International Nuclear Information System (INIS)

    Tapiador, D.; Rubio-Montero, A. J.; Juedo, E.; Montero, R. S.; Llorente, I. M.

    2007-01-01

    Virtual machines are a promising technology to overcome some of the problems found in current Grid infrastructures, like heterogeneity, performance partitioning or application isolation. This work shows a comparison between two strategies to manage virtual machines in Globus Grids. The first alternative is a straightforward deployment that does not require additional middle ware to be installed. It is only based on standard Grid services and is not bound to a given virtualization technology. Although this option is fully functional, it is only suitable for single process batch jobs. The second solution makes use of the Virtual Workspace Service which allows a remote client to securely negotiate and manage a virtual resource. This approach better exploits the potential benefits offered by the virtualization technology and provides a wider application range. (Author)

  8. Limitations Of The Current State Space Modelling Approach In Multistage Machining Processes Due To Operation Variations

    Science.gov (United States)

    Abellán-Nebot, J. V.; Liu, J.; Romero, F.

    2009-11-01

    The State Space modelling approach has been recently proposed as an engineering-driven technique for part quality prediction in Multistage Machining Processes (MMP). Current State Space models incorporate fixture and datum variations in the multi-stage variation propagation, without explicitly considering common operation variations such as machine-tool thermal distortions, cutting-tool wear, cutting-tool deflections, etc. This paper shows the limitations of the current State Space model through an experimental case study where the effect of the spindle thermal expansion, cutting-tool flank wear and locator errors are introduced. The paper also discusses the extension of the current State Space model to include operation variations and its potential benefits.

  9. Two Machine Learning Approaches for Short-Term Wind Speed Time-Series Prediction.

    Science.gov (United States)

    Ak, Ronay; Fink, Olga; Zio, Enrico

    2016-08-01

    The increasing liberalization of European electricity markets, the growing proportion of intermittent renewable energy being fed into the energy grids, and also new challenges in the patterns of energy consumption (such as electric mobility) require flexible and intelligent power grids capable of providing efficient, reliable, economical, and sustainable energy production and distribution. From the supplier side, particularly, the integration of renewable energy sources (e.g., wind and solar) into the grid imposes an engineering and economic challenge because of the limited ability to control and dispatch these energy sources due to their intermittent characteristics. Time-series prediction of wind speed for wind power production is a particularly important and challenging task, wherein prediction intervals (PIs) are preferable results of the prediction, rather than point estimates, because they provide information on the confidence in the prediction. In this paper, two different machine learning approaches to assess PIs of time-series predictions are considered and compared: 1) multilayer perceptron neural networks trained with a multiobjective genetic algorithm and 2) extreme learning machines combined with the nearest neighbors approach. The proposed approaches are applied for short-term wind speed prediction from a real data set of hourly wind speed measurements for the region of Regina in Saskatchewan, Canada. Both approaches demonstrate good prediction precision and provide complementary advantages with respect to different evaluation criteria.

  10. A machine learning approach to galaxy-LSS classification - I. Imprints on halo merger trees

    Science.gov (United States)

    Hui, Jianan; Aragon, Miguel; Cui, Xinping; Flegal, James M.

    2018-04-01

    The cosmic web plays a major role in the formation and evolution of galaxies and defines, to a large extent, their properties. However, the relation between galaxies and environment is still not well understood. Here, we present a machine learning approach to study imprints of environmental effects on the mass assembly of haloes. We present a galaxy-LSS machine learning classifier based on galaxy properties sensitive to the environment. We then use the classifier to assess the relevance of each property. Correlations between galaxy properties and their cosmic environment can be used to predict galaxy membership to void/wall or filament/cluster with an accuracy of 93 per cent. Our study unveils environmental information encoded in properties of haloes not normally considered directly dependent on the cosmic environment such as merger history and complexity. Understanding the physical mechanism by which the cosmic web is imprinted in a halo can lead to significant improvements in galaxy formation models. This is accomplished by extracting features from galaxy properties and merger trees, computing feature scores for each feature and then applying support vector machine (SVM) to different feature sets. To this end, we have discovered that the shape and depth of the merger tree, formation time, and density of the galaxy are strongly associated with the cosmic environment. We describe a significant improvement in the original classification algorithm by performing LU decomposition of the distance matrix computed by the feature vectors and then using the output of the decomposition as input vectors for SVM.

  11. Classification of Breast Cancer Resistant Protein (BCRP) Inhibitors and Non-Inhibitors Using Machine Learning Approaches.

    Science.gov (United States)

    Belekar, Vilas; Lingineni, Karthik; Garg, Prabha

    2015-01-01

    The breast cancer resistant protein (BCRP) is an important transporter and its inhibitors play an important role in cancer treatment by improving the oral bioavailability as well as blood brain barrier (BBB) permeability of anticancer drugs. In this work, a computational model was developed to predict the compounds as BCRP inhibitors or non-inhibitors. Various machine learning approaches like, support vector machine (SVM), k-nearest neighbor (k-NN) and artificial neural network (ANN) were used to develop the models. The Matthews correlation coefficients (MCC) of developed models using ANN, k-NN and SVM are 0.67, 0.71 and 0.77, and prediction accuracies are 85.2%, 88.3% and 90.8% respectively. The developed models were tested with a test set of 99 compounds and further validated with external set of 98 compounds. Distribution plot analysis and various machine learning models were also developed based on druglikeness descriptors. Applicability domain is used to check the prediction reliability of the new molecules.

  12. A Function-Behavior-State Approach to Designing Human Machine Interface for Nuclear Power Plant Operators

    Science.gov (United States)

    Lin, Y.; Zhang, W. J.

    2005-02-01

    This paper presents an approach to human-machine interface design for control room operators of nuclear power plants. The first step in designing an interface for a particular application is to determine information content that needs to be displayed. The design methodology for this step is called the interface design framework (called framework ). Several frameworks have been proposed for applications at varying levels, including process plants. However, none is based on the design and manufacture of a plant system for which the interface is designed. This paper presents an interface design framework which originates from design theory and methodology for general technical systems. Specifically, the framework is based on a set of core concepts of a function-behavior-state model originally proposed by the artificial intelligence research community and widely applied in the design research community. Benefits of this new framework include the provision of a model-based fault diagnosis facility, and the seamless integration of the design (manufacture, maintenance) of plants and the design of human-machine interfaces. The missing linkage between design and operation of a plant was one of the causes of the Three Mile Island nuclear reactor incident. A simulated plant system is presented to explain how to apply this framework in designing an interface. The resulting human-machine interface is discussed; specifically, several fault diagnosis examples are elaborated to demonstrate how this interface could support operators' fault diagnosis in an unanticipated situation.

  13. Support vector machine regression (SVR/LS-SVM)--an alternative to neural networks (ANN) for analytical chemistry? Comparison of nonlinear methods on near infrared (NIR) spectroscopy data.

    Science.gov (United States)

    Balabin, Roman M; Lomakina, Ekaterina I

    2011-04-21

    In this study, we make a general comparison of the accuracy and robustness of five multivariate calibration models: partial least squares (PLS) regression or projection to latent structures, polynomial partial least squares (Poly-PLS) regression, artificial neural networks (ANNs), and two novel techniques based on support vector machines (SVMs) for multivariate data analysis: support vector regression (SVR) and least-squares support vector machines (LS-SVMs). The comparison is based on fourteen (14) different datasets: seven sets of gasoline data (density, benzene content, and fractional composition/boiling points), two sets of ethanol gasoline fuel data (density and ethanol content), one set of diesel fuel data (total sulfur content), three sets of petroleum (crude oil) macromolecules data (weight percentages of asphaltenes, resins, and paraffins), and one set of petroleum resins data (resins content). Vibrational (near-infrared, NIR) spectroscopic data are used to predict the properties and quality coefficients of gasoline, biofuel/biodiesel, diesel fuel, and other samples of interest. The four systems presented here range greatly in composition, properties, strength of intermolecular interactions (e.g., van der Waals forces, H-bonds), colloid structure, and phase behavior. Due to the high diversity of chemical systems studied, general conclusions about SVM regression methods can be made. We try to answer the following question: to what extent can SVM-based techniques replace ANN-based approaches in real-world (industrial/scientific) applications? The results show that both SVR and LS-SVM methods are comparable to ANNs in accuracy. Due to the much higher robustness of the former, the SVM-based approaches are recommended for practical (industrial) application. This has been shown to be especially true for complicated, highly nonlinear objects.

  14. Spoken language identification based on the enhanced self-adjusting extreme learning machine approach

    Science.gov (United States)

    Tiun, Sabrina; AL-Dhief, Fahad Taha; Sammour, Mahmoud A. M.

    2018-01-01

    Spoken Language Identification (LID) is the process of determining and classifying natural language from a given content and dataset. Typically, data must be processed to extract useful features to perform LID. The extracting features for LID, based on literature, is a mature process where the standard features for LID have already been developed using Mel-Frequency Cepstral Coefficients (MFCC), Shifted Delta Cepstral (SDC), the Gaussian Mixture Model (GMM) and ending with the i-vector based framework. However, the process of learning based on extract features remains to be improved (i.e. optimised) to capture all embedded knowledge on the extracted features. The Extreme Learning Machine (ELM) is an effective learning model used to perform classification and regression analysis and is extremely useful to train a single hidden layer neural network. Nevertheless, the learning process of this model is not entirely effective (i.e. optimised) due to the random selection of weights within the input hidden layer. In this study, the ELM is selected as a learning model for LID based on standard feature extraction. One of the optimisation approaches of ELM, the Self-Adjusting Extreme Learning Machine (SA-ELM) is selected as the benchmark and improved by altering the selection phase of the optimisation process. The selection process is performed incorporating both the Split-Ratio and K-Tournament methods, the improved SA-ELM is named Enhanced Self-Adjusting Extreme Learning Machine (ESA-ELM). The results are generated based on LID with the datasets created from eight different languages. The results of the study showed excellent superiority relating to the performance of the Enhanced Self-Adjusting Extreme Learning Machine LID (ESA-ELM LID) compared with the SA-ELM LID, with ESA-ELM LID achieving an accuracy of 96.25%, as compared to the accuracy of SA-ELM LID of only 95.00%. PMID:29672546

  15. Spoken language identification based on the enhanced self-adjusting extreme learning machine approach.

    Science.gov (United States)

    Albadr, Musatafa Abbas Abbood; Tiun, Sabrina; Al-Dhief, Fahad Taha; Sammour, Mahmoud A M

    2018-01-01

    Spoken Language Identification (LID) is the process of determining and classifying natural language from a given content and dataset. Typically, data must be processed to extract useful features to perform LID. The extracting features for LID, based on literature, is a mature process where the standard features for LID have already been developed using Mel-Frequency Cepstral Coefficients (MFCC), Shifted Delta Cepstral (SDC), the Gaussian Mixture Model (GMM) and ending with the i-vector based framework. However, the process of learning based on extract features remains to be improved (i.e. optimised) to capture all embedded knowledge on the extracted features. The Extreme Learning Machine (ELM) is an effective learning model used to perform classification and regression analysis and is extremely useful to train a single hidden layer neural network. Nevertheless, the learning process of this model is not entirely effective (i.e. optimised) due to the random selection of weights within the input hidden layer. In this study, the ELM is selected as a learning model for LID based on standard feature extraction. One of the optimisation approaches of ELM, the Self-Adjusting Extreme Learning Machine (SA-ELM) is selected as the benchmark and improved by altering the selection phase of the optimisation process. The selection process is performed incorporating both the Split-Ratio and K-Tournament methods, the improved SA-ELM is named Enhanced Self-Adjusting Extreme Learning Machine (ESA-ELM). The results are generated based on LID with the datasets created from eight different languages. The results of the study showed excellent superiority relating to the performance of the Enhanced Self-Adjusting Extreme Learning Machine LID (ESA-ELM LID) compared with the SA-ELM LID, with ESA-ELM LID achieving an accuracy of 96.25%, as compared to the accuracy of SA-ELM LID of only 95.00%.

  16. A machine-learning approach for computation of fractional flow reserve from coronary computed tomography.

    Science.gov (United States)

    Itu, Lucian; Rapaka, Saikiran; Passerini, Tiziano; Georgescu, Bogdan; Schwemmer, Chris; Schoebinger, Max; Flohr, Thomas; Sharma, Puneet; Comaniciu, Dorin

    2016-07-01

    Fractional flow reserve (FFR) is a functional index quantifying the severity of coronary artery lesions and is clinically obtained using an invasive, catheter-based measurement. Recently, physics-based models have shown great promise in being able to noninvasively estimate FFR from patient-specific anatomical information, e.g., obtained from computed tomography scans of the heart and the coronary arteries. However, these models have high computational demand, limiting their clinical adoption. In this paper, we present a machine-learning-based model for predicting FFR as an alternative to physics-based approaches. The model is trained on a large database of synthetically generated coronary anatomies, where the target values are computed using the physics-based model. The trained model predicts FFR at each point along the centerline of the coronary tree, and its performance was assessed by comparing the predictions against physics-based computations and against invasively measured FFR for 87 patients and 125 lesions in total. Correlation between machine-learning and physics-based predictions was excellent (0.9994, P machine-learning algorithm with a sensitivity of 81.6%, a specificity of 83.9%, and an accuracy of 83.2%. The correlation was 0.729 (P assessment of FFR. Average execution time went down from 196.3 ± 78.5 s for the CFD model to ∼2.4 ± 0.44 s for the machine-learning model on a workstation with 3.4-GHz Intel i7 8-core processor. Copyright © 2016 the American Physiological Society.

  17. Identifying seizure onset zone from electrocorticographic recordings: A machine learning approach based on phase locking value.

    Science.gov (United States)

    Elahian, Bahareh; Yeasin, Mohammed; Mudigoudar, Basanagoud; Wheless, James W; Babajani-Feremi, Abbas

    2017-10-01

    Using a novel technique based on phase locking value (PLV), we investigated the potential for features extracted from electrocorticographic (ECoG) recordings to serve as biomarkers to identify the seizure onset zone (SOZ). We computed the PLV between the phase of the amplitude of high gamma activity (80-150Hz) and the phase of lower frequency rhythms (4-30Hz) from ECoG recordings obtained from 10 patients with epilepsy (21 seizures). We extracted five features from the PLV and used a machine learning approach based on logistic regression to build a model that classifies electrodes as SOZ or non-SOZ. More than 96% of electrodes identified as the SOZ by our algorithm were within the resected area in six seizure-free patients. In four non-seizure-free patients, more than 31% of the identified SOZ electrodes by our algorithm were outside the resected area. In addition, we observed that the seizure outcome in non-seizure-free patients correlated with the number of non-resected SOZ electrodes identified by our algorithm. This machine learning approach, based on features extracted from the PLV, effectively identified electrodes within the SOZ. The approach has the potential to assist clinicians in surgical decision-making when pre-surgical intracranial recordings are utilized. Copyright © 2017 British Epilepsy Association. Published by Elsevier Ltd. All rights reserved.

  18. Machine Learning Approaches for Detecting Diabetic Retinopathy from Clinical and Public Health Records.

    Science.gov (United States)

    Ogunyemi, Omolola; Kermah, Dulcie

    2015-01-01

    Annual eye examinations are recommended for diabetic patients in order to detect diabetic retinopathy and other eye conditions that arise from diabetes. Medically underserved urban communities in the US have annual screening rates that are much lower than the national average and could benefit from informatics approaches to identify unscreened patients most at risk of developing retinopathy. Using clinical data from urban safety net clinics as well as public health data from the CDC's National Health and Nutrition Examination Survey, we examined different machine learning approaches for predicting retinopathy from clinical or public health data. All datasets utilized exhibited a class imbalance. Classifiers learned on the clinical data were modestly predictive of retinopathy with the best model having an AUC of 0.72, sensitivity of 69.2% and specificity of 55.9%. Classifiers learned on public health data were not predictive of retinopathy. Successful approaches to detecting latent retinopathy using machine learning could help safety net and other clinics identify unscreened patients who are most at risk of developing retinopathy and the use of ensemble classifiers on clinical data shows promise for this purpose.

  19. Tensor Voting A Perceptual Organization Approach to Computer Vision and Machine Learning

    CERN Document Server

    Mordohai, Philippos

    2006-01-01

    This lecture presents research on a general framework for perceptual organization that was conducted mainly at the Institute for Robotics and Intelligent Systems of the University of Southern California. It is not written as a historical recount of the work, since the sequence of the presentation is not in chronological order. It aims at presenting an approach to a wide range of problems in computer vision and machine learning that is data-driven, local and requires a minimal number of assumptions. The tensor voting framework combines these properties and provides a unified perceptual organiza

  20. Designing “Theory of Machines and Mechanisms” course on Project Based Learning approach

    DEFF Research Database (Denmark)

    Shinde, Vikas

    2013-01-01

    by the industry and the learning outcomes specified by the National Board of Accreditation (NBA), India; this course is restructured on Project Based Learning approach. A mini project is designed to suit course objectives. An objective of this paper is to discuss the rationale of this course design......Theory of Machines and Mechanisms course is one of the essential courses of Mechanical Engineering undergraduate curriculum practiced at Indian Institute. Previously, this course was taught by traditional instruction based pedagogy. In order to achieve profession specific skills demanded...... and the process followed to design a project which meets diverse objectives....

  1. Machine-learning approach for local classification of crystalline structures in multiphase systems

    Science.gov (United States)

    Dietz, C.; Kretz, T.; Thoma, M. H.

    2017-07-01

    Machine learning is one of the most popular fields in computer science and has a vast number of applications. In this work we will propose a method that will use a neural network to locally identify crystal structures in a mixed phase Yukawa system consisting of fcc, hcp, and bcc clusters and disordered particles similar to plasma crystals. We compare our approach to already used methods and show that the quality of identification increases significantly. The technique works very well for highly disturbed lattices and shows a flexible and robust way to classify crystalline structures that can be used by only providing particle positions. This leads to insights into highly disturbed crystalline structures.

  2. Support vector machine-based exergetic modelling of a DI diesel engine running on biodiesel–diesel blends containing expanded polystyrene

    International Nuclear Information System (INIS)

    Shamshirband, Shahaboddin; Tabatabaei, Meisam; Aghbashlo, Mortaza; Yee, Por Lip; Petković, Dalibor

    2016-01-01

    Highlights: • SVM-based thermodynamic modelling of a DI diesel engine working with diesel/biodiesel blends containing EPS. • Comparison of SVM-WT, SVM-FFA, SVM-RBF, SVM-QPSO, and ANN approaches for exergetic modelling of the engine. • Satisfactory performance of the SVM-WT for performance modelling of the engine over the other approaches. - Abstract: In the present study, four Support Vector Machine-based (SVM-based) approaches and the standard artificial neural network (ANN) model were designed and compared in modelling the exergetic parameters of a DI diesel engine running on diesel/biodiesel blends containing expanded polystyrene (EPS) wastes. For this aim, the SVM was coupled with discrete wavelet transform (SVM-WT), firefly algorithm (SVM-FFA), radial basis function (SVM-RBF) and quantum particle swarm optimization (SVM-QPSO). The exergetic data were computed using mass, energy, and exergy balance equations for the engine at different speeds and loads as well as various biodiesel and EPS wastes quantities. Three statistical indicators namely root means square error, coefficient of determination and Pearson coefficient were used to access the capability of the developed approaches for exergetic performance modelling of the DI diesel engine. The modelling results indicated that the SVM-WT approach was more efficient in exergetic modelling of the engine than the other three approaches. Moreover, the results obtained confirmed the effectiveness of the SVM-WT model in identifying the most exergy-efficient combustion conditions and the best fuel composition for achieving the most cost-effective and eco-friendly combustion process.

  3. Prediction of selective estrogen receptor beta agonist using open data and machine learning approach

    Directory of Open Access Journals (Sweden)

    Niu AQ

    2016-07-01

    Full Text Available Ai-qin Niu,1 Liang-jun Xie,2 Hui Wang,1 Bing Zhu,1 Sheng-qi Wang3 1Department of Gynecology, the First People’s Hospital of Shangqiu, Shangqiu, Henan, People’s Republic of China; 2Department of Image Diagnoses, the Third Hospital of Jinan, Jinan, Shandong, People’s Republic of China; 3Department of Mammary Disease, Guangdong Provincial Hospital of Chinese Medicine, the Second Clinical College of Guangzhou University of Chinese Medicine, Guangzhou, People’s Republic of China Background: Estrogen receptors (ERs are nuclear transcription factors that are involved in the regulation of many complex physiological processes in humans. ERs have been validated as important drug targets for the treatment of various diseases, including breast cancer, ovarian cancer, osteoporosis, and cardiovascular disease. ERs have two subtypes, ER-α and ER-β. Emerging data suggest that the development of subtype-selective ligands that specifically target ER-β could be a more optimal approach to elicit beneficial estrogen-like activities and reduce side effects. Methods: Herein, we focused on ER-β and developed its in silico quantitative structure-activity relationship models using machine learning (ML methods. Results: The chemical structures and ER-β bioactivity data were extracted from public chemogenomics databases. Four types of popular fingerprint generation methods including MACCS fingerprint, PubChem fingerprint, 2D atom pairs, and Chemistry Development Kit extended fingerprint were used as descriptors. Four ML methods including Naïve Bayesian classifier, k-nearest neighbor, random forest, and support vector machine were used to train the models. The range of classification accuracies was 77.10% to 88.34%, and the range of area under the ROC (receiver operating characteristic curve values was 0.8151 to 0.9475, evaluated by the 5-fold cross-validation. Comparison analysis suggests that both the random forest and the support vector machine are superior

  4. Pol II promoter prediction using characteristic 4-mer motifs: a machine learning approach

    Directory of Open Access Journals (Sweden)

    Shoyaib Mohammad

    2008-10-01

    Full Text Available Abstract Background Eukaryotic promoter prediction using computational analysis techniques is one of the most difficult jobs in computational genomics that is essential for constructing and understanding genetic regulatory networks. The increased availability of sequence data for various eukaryotic organisms in recent years has necessitated for better tools and techniques for the prediction and analysis of promoters in eukaryotic sequences. Many promoter prediction methods and tools have been developed to date but they have yet to provide acceptable predictive performance. One obvious criteria to improve on current methods is to devise a better system for selecting appropriate features of promoters that distinguish them from non-promoters. Secondly improved performance can be achieved by enhancing the predictive ability of the machine learning algorithms used. Results In this paper, a novel approach is presented in which 128 4-mer motifs in conjunction with a non-linear machine-learning algorithm utilising a Support Vector Machine (SVM are used to distinguish between promoter and non-promoter DNA sequences. By applying this approach to plant, Drosophila, human, mouse and rat sequences, the classification model has showed 7-fold cross-validation percentage accuracies of 83.81%, 94.82%, 91.25%, 90.77% and 82.35% respectively. The high sensitivity and specificity value of 0.86 and 0.90 for plant; 0.96 and 0.92 for Drosophila; 0.88 and 0.92 for human; 0.78 and 0.84 for mouse and 0.82 and 0.80 for rat demonstrate that this technique is less prone to false positive results and exhibits better performance than many other tools. Moreover, this model successfully identifies location of promoter using TATA weight matrix. Conclusion The high sensitivity and specificity indicate that 4-mer frequencies in conjunction with supervised machine-learning methods can be beneficial in the identification of RNA pol II promoters comparative to other methods. This

  5. Prediction of selective estrogen receptor beta agonist using open data and machine learning approach.

    Science.gov (United States)

    Niu, Ai-Qin; Xie, Liang-Jun; Wang, Hui; Zhu, Bing; Wang, Sheng-Qi

    2016-01-01

    Estrogen receptors (ERs) are nuclear transcription factors that are involved in the regulation of many complex physiological processes in humans. ERs have been validated as important drug targets for the treatment of various diseases, including breast cancer, ovarian cancer, osteoporosis, and cardiovascular disease. ERs have two subtypes, ER-α and ER-β. Emerging data suggest that the development of subtype-selective ligands that specifically target ER-β could be a more optimal approach to elicit beneficial estrogen-like activities and reduce side effects. Herein, we focused on ER-β and developed its in silico quantitative structure-activity relationship models using machine learning (ML) methods. The chemical structures and ER-β bioactivity data were extracted from public chemogenomics databases. Four types of popular fingerprint generation methods including MACCS fingerprint, PubChem fingerprint, 2D atom pairs, and Chemistry Development Kit extended fingerprint were used as descriptors. Four ML methods including Naïve Bayesian classifier, k-nearest neighbor, random forest, and support vector machine were used to train the models. The range of classification accuracies was 77.10% to 88.34%, and the range of area under the ROC (receiver operating characteristic) curve values was 0.8151 to 0.9475, evaluated by the 5-fold cross-validation. Comparison analysis suggests that both the random forest and the support vector machine are superior for the classification of selective ER-β agonists. Chemistry Development Kit extended fingerprints and MACCS fingerprint performed better in structural representation between active and inactive agonists. These results demonstrate that combining the fingerprint and ML approaches leads to robust ER-β agonist prediction models, which are potentially applicable to the identification of selective ER-β agonists.

  6. A new approach to the solution of the vacuum magnetic problem in fusion machines

    International Nuclear Information System (INIS)

    Zabeo, L.; Artaserse, G.; Cenedese, A.; Piccolo, F.; Sartori, F.

    2007-01-01

    The magnetic vacuum topology reconstruction using magnetic measurements is essential in controlling and understanding plasmas produced in magnetic confinement fusion devices. In a wide range of cases, the instruments used to approach the problem have been designed for a specific machine and to solve a specific plasma model. Recently, a new approach has been used for developing new magnetic software called FELIX. The adopted solution in the design allows the use of the software not only at JET but also at other machines. In order to reduce the analysis and debugging time the software has been designed with modularity and platform independence in mind. This results in a large portability and in particular it allows using the same code both offline and in real-time. One of the main aspects of the tool is its capability to solve different plasma models of current distribution. Thanks to this feature, in order to improve the plasma magnetic reconstruction in real-time, a set of different models has been run using FELIX. FELIX is presently running at JET in different real-time analysis and control systems that need vacuum magnetic topology

  7. Development of Type 2 Diabetes Mellitus Phenotyping Framework Using Expert Knowledge and Machine Learning Approach.

    Science.gov (United States)

    Kagawa, Rina; Kawazoe, Yoshimasa; Ida, Yusuke; Shinohara, Emiko; Tanaka, Katsuya; Imai, Takeshi; Ohe, Kazuhiko

    2017-07-01

    Phenotyping is an automated technique that can be used to distinguish patients based on electronic health records. To improve the quality of medical care and advance type 2 diabetes mellitus (T2DM) research, the demand for T2DM phenotyping has been increasing. Some existing phenotyping algorithms are not sufficiently accurate for screening or identifying clinical research subjects. We propose a practical phenotyping framework using both expert knowledge and a machine learning approach to develop 2 phenotyping algorithms: one is for screening; the other is for identifying research subjects. We employ expert knowledge as rules to exclude obvious control patients and machine learning to increase accuracy for complicated patients. We developed phenotyping algorithms on the basis of our framework and performed binary classification to determine whether a patient has T2DM. To facilitate development of practical phenotyping algorithms, this study introduces new evaluation metrics: area under the precision-sensitivity curve (AUPS) with a high sensitivity and AUPS with a high positive predictive value. The proposed phenotyping algorithms based on our framework show higher performance than baseline algorithms. Our proposed framework can be used to develop 2 types of phenotyping algorithms depending on the tuning approach: one for screening, the other for identifying research subjects. We develop a novel phenotyping framework that can be easily implemented on the basis of proper evaluation metrics, which are in accordance with users' objectives. The phenotyping algorithms based on our framework are useful for extraction of T2DM patients in retrospective studies.

  8. Support vector methods for survival analysis: a comparison between ranking and regression approaches.

    Science.gov (United States)

    Van Belle, Vanya; Pelckmans, Kristiaan; Van Huffel, Sabine; Suykens, Johan A K

    2011-10-01

    To compare and evaluate ranking, regression and combined machine learning approaches for the analysis of survival data. The literature describes two approaches based on support vector machines to deal with censored observations. In the first approach the key idea is to rephrase the task as a ranking problem via the concordance index, a problem which can be solved efficiently in a context of structural risk minimization and convex optimization techniques. In a second approach, one uses a regression approach, dealing with censoring by means of inequality constraints. The goal of this paper is then twofold: (i) introducing a new model combining the ranking and regression strategy, which retains the link with existing survival models such as the proportional hazards model via transformation models; and (ii) comparison of the three techniques on 6 clinical and 3 high-dimensional datasets and discussing the relevance of these techniques over classical approaches fur survival data. We compare svm-based survival models based on ranking constraints, based on regression constraints and models based on both ranking and regression constraints. The performance of the models is compared by means of three different measures: (i) the concordance index, measuring the model's discriminating ability; (ii) the logrank test statistic, indicating whether patients with a prognostic index lower than the median prognostic index have a significant different survival than patients with a prognostic index higher than the median; and (iii) the hazard ratio after normalization to restrict the prognostic index between 0 and 1. Our results indicate a significantly better performance for models including regression constraints above models only based on ranking constraints. This work gives empirical evidence that svm-based models using regression constraints perform significantly better than svm-based models based on ranking constraints. Our experiments show a comparable performance for methods

  9. Prediction of biochar yield from cattle manure pyrolysis via least squares support vector machine intelligent approach.

    Science.gov (United States)

    Cao, Hongliang; Xin, Ya; Yuan, Qiaoxia

    2016-02-01

    To predict conveniently the biochar yield from cattle manure pyrolysis, intelligent modeling approach was introduced in this research. A traditional artificial neural networks (ANN) model and a novel least squares support vector machine (LS-SVM) model were developed. For the identification and prediction evaluation of the models, a data set with 33 experimental data was used, which were obtained using a laboratory-scale fixed bed reaction system. The results demonstrated that the intelligent modeling approach is greatly convenient and effective for the prediction of the biochar yield. In particular, the novel LS-SVM model has a more satisfying predicting performance and its robustness is better than the traditional ANN model. The introduction and application of the LS-SVM modeling method gives a successful example, which is a good reference for the modeling study of cattle manure pyrolysis process, even other similar processes. Copyright © 2015 Elsevier Ltd. All rights reserved.

  10. Identifying reports of randomized controlled trials (RCTs) via a hybrid machine learning and crowdsourcing approach.

    Science.gov (United States)

    Wallace, Byron C; Noel-Storr, Anna; Marshall, Iain J; Cohen, Aaron M; Smalheiser, Neil R; Thomas, James

    2017-11-01

    Identifying all published reports of randomized controlled trials (RCTs) is an important aim, but it requires extensive manual effort to separate RCTs from non-RCTs, even using current machine learning (ML) approaches. We aimed to make this process more efficient via a hybrid approach using both crowdsourcing and ML. We trained a classifier to discriminate between citations that describe RCTs and those that do not. We then adopted a simple strategy of automatically excluding citations deemed very unlikely to be RCTs by the classifier and deferring to crowdworkers otherwise. Combining ML and crowdsourcing provides a highly sensitive RCT identification strategy (our estimates suggest 95%-99% recall) with substantially less effort (we observed a reduction of around 60%-80%) than relying on manual screening alone. Hybrid crowd-ML strategies warrant further exploration for biomedical curation/annotation tasks. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association.

  11. Prediction of Banking Systemic Risk Based on Support Vector Machine

    Directory of Open Access Journals (Sweden)

    Shouwei Li

    2013-01-01

    Full Text Available Banking systemic risk is a complex nonlinear phenomenon and has shed light on the importance of safeguarding financial stability by recent financial crisis. According to the complex nonlinear characteristics of banking systemic risk, in this paper we apply support vector machine (SVM to the prediction of banking systemic risk in an attempt to suggest a new model with better explanatory power and stability. We conduct a case study of an SVM-based prediction model for Chinese banking systemic risk and find the experiment results showing that support vector machine is an efficient method in such case.

  12. Normal mammogram detection based on local probability difference transforms and support vector machines

    International Nuclear Information System (INIS)

    Chiracharit, W.; Kumhom, P.; Chamnongthai, K.; Sun, Y.; Delp, E.J.; Babbs, C.F

    2007-01-01

    Automatic detection of normal mammograms, as a ''first look'' for breast cancer, is a new approach to computer-aided diagnosis. This approach may be limited, however, by two main causes. The first problem is the presence of poorly separable ''crossed-distributions'' in which the correct classification depends upon the value of each feature. The second problem is overlap of the feature distributions that are extracted from digitized mammograms of normal and abnormal patients. Here we introduce a new Support Vector Machine (SVM) based method utilizing with the proposed uncrossing mapping and Local Probability Difference (LPD). Crossed-distribution feature pairs are identified and mapped into a new features that can be separated by a zero-hyperplane of the new axis. The probability density functions of the features of normal and abnormal mammograms are then sampled and the local probability difference functions are estimated to enhance the features. From 1,000 ground-truth-known mammograms, 250 normal and 250 abnormal cases, including spiculated lesions, circumscribed masses or microcalcifications, are used for training a support vector machine. The classification results tested with another 250 normal and 250 abnormal sets show improved testing performances with 90% sensitivity and 89% specificity. (author)

  13. Integrating network, sequence and functional features using machine learning approaches towards identification of novel Alzheimer genes.

    Science.gov (United States)

    Jamal, Salma; Goyal, Sukriti; Shanker, Asheesh; Grover, Abhinav

    2016-10-18

    Alzheimer's disease (AD) is a complex progressive neurodegenerative disorder commonly characterized by short term memory loss. Presently no effective therapeutic treatments exist that can completely cure this disease. The cause of Alzheimer's is still unclear, however one of the other major factors involved in AD pathogenesis are the genetic factors and around 70 % risk of the disease is assumed to be due to the large number of genes involved. Although genetic association studies have revealed a number of potential AD susceptibility genes, there still exists a need for identification of unidentified AD-associated genes and therapeutic targets to have better understanding of the disease-causing mechanisms of Alzheimer's towards development of effective AD therapeutics. In the present study, we have used machine learning approach to identify candidate AD associated genes by integrating topological properties of the genes from the protein-protein interaction networks, sequence features and functional annotations. We also used molecular docking approach and screened already known anti-Alzheimer drugs against the novel predicted probable targets of AD and observed that an investigational drug, AL-108, had high affinity for majority of the possible therapeutic targets. Furthermore, we performed molecular dynamics simulations and MM/GBSA calculations on the docked complexes to validate our preliminary findings. To the best of our knowledge, this is the first comprehensive study of its kind for identification of putative Alzheimer-associated genes using machine learning approaches and we propose that such computational studies can improve our understanding on the core etiology of AD which could lead to the development of effective anti-Alzheimer drugs.

  14. Gene prediction in metagenomic fragments: A large scale machine learning approach

    Directory of Open Access Journals (Sweden)

    Morgenstern Burkhard

    2008-04-01

    Full Text Available Abstract Background Metagenomics is an approach to the characterization of microbial genomes via the direct isolation of genomic sequences from the environment without prior cultivation. The amount of metagenomic sequence data is growing fast while computational methods for metagenome analysis are still in their infancy. In contrast to genomic sequences of single species, which can usually be assembled and analyzed by many available methods, a large proportion of metagenome data remains as unassembled anonymous sequencing reads. One of the aims of all metagenomic sequencing projects is the identification of novel genes. Short length, for example, Sanger sequencing yields on average 700 bp fragments, and unknown phylogenetic origin of most fragments require approaches to gene prediction that are different from the currently available methods for genomes of single species. In particular, the large size of metagenomic samples requires fast and accurate methods with small numbers of false positive predictions. Results We introduce a novel gene prediction algorithm for metagenomic fragments based on a two-stage machine learning approach. In the first stage, we use linear discriminants for monocodon usage, dicodon usage and translation initiation sites to extract features from DNA sequences. In the second stage, an artificial neural network combines these features with open reading frame length and fragment GC-content to compute the probability that this open reading frame encodes a protein. This probability is used for the classification and scoring of gene candidates. With large scale training, our method provides fast single fragment predictions with good sensitivity and specificity on artificially fragmented genomic DNA. Additionally, this method is able to predict translation initiation sites accurately and distinguishes complete from incomplete genes with high reliability. Conclusion Large scale machine learning methods are well-suited for gene

  15. Discovery of Intermetallic Compounds from Traditional to Machine-Learning Approaches.

    Science.gov (United States)

    Oliynyk, Anton O; Mar, Arthur

    2018-01-16

    Intermetallic compounds are bestowed by diverse compositions, complex structures, and useful properties for many materials applications. How metallic elements react to form these compounds and what structures they adopt remain challenging questions that defy predictability. Traditional approaches offer some rational strategies to prepare specific classes of intermetallics, such as targeting members within a modular homologous series, manipulating building blocks to assemble new structures, and filling interstitial sites to create stuffed variants. Because these strategies rely on precedent, they cannot foresee surprising results, by definition. Exploratory synthesis, whether through systematic phase diagram investigations or serendipity, is still essential for expanding our knowledge base. Eventually, the relationships may become too complex for the pattern recognition skills to be reliably or practically performed by humans. Complementing these traditional approaches, new machine-learning approaches may be a viable alternative for materials discovery, not only among intermetallics but also more generally to other chemical compounds. In this Account, we survey our own efforts to discover new intermetallic compounds, encompassing gallides, germanides, phosphides, arsenides, and others. We apply various machine-learning methods (such as support vector machine and random forest algorithms) to confront two significant questions in solid state chemistry. First, what crystal structures are adopted by a compound given an arbitrary composition? Initial efforts have focused on binary equiatomic phases AB, ternary equiatomic phases ABC, and full Heusler phases AB 2 C. Our analysis emphasizes the use of real experimental data and places special value on confirming predictions through experiment. Chemical descriptors are carefully chosen through a rigorous procedure called cluster resolution feature selection. Predictions for crystal structures are quantified by evaluating

  16. Fingerprint-Based Machine Learning Approach to Identify Potent and Selective 5-HT2BR Ligands

    Directory of Open Access Journals (Sweden)

    Krzysztof Rataj

    2018-05-01

    Full Text Available The identification of subtype-selective GPCR (G-protein coupled receptor ligands is a challenging task. In this study, we developed a computational protocol to find compounds with 5-HT2BR versus 5-HT1BR selectivity. Our approach employs the hierarchical combination of machine learning methods, docking, and multiple scoring methods. First, we applied machine learning tools to filter a large database of druglike compounds by the new Neighbouring Substructures Fingerprint (NSFP. This two-dimensional fingerprint contains information on the connectivity of the substructural features of a compound. Preselected subsets of the database were then subjected to docking calculations. The main indicators of compounds’ selectivity were their different interactions with the secondary binding pockets of both target proteins, while binding modes within the orthosteric binding pocket were preserved. The combined methodology of ligand-based and structure-based methods was validated prospectively, resulting in the identification of hits with nanomolar affinity and ten-fold to ten thousand-fold selectivities.

  17. A Character Level Based and Word Level Based Approach for Chinese-Vietnamese Machine Translation

    Directory of Open Access Journals (Sweden)

    Phuoc Tran

    2016-01-01

    Full Text Available Chinese and Vietnamese have the same isolated language; that is, the words are not delimited by spaces. In machine translation, word segmentation is often done first when translating from Chinese or Vietnamese into different languages (typically English and vice versa. However, it is a matter for consideration that words may or may not be segmented when translating between two languages in which spaces are not used between words, such as Chinese and Vietnamese. Since Chinese-Vietnamese is a low-resource language pair, the sparse data problem is evident in the translation system of this language pair. Therefore, while translating, whether it should be segmented or not becomes more important. In this paper, we propose a new method for translating Chinese to Vietnamese based on a combination of the advantages of character level and word level translation. In addition, a hybrid approach that combines statistics and rules is used to translate on the word level. And at the character level, a statistical translation is used. The experimental results showed that our method improved the performance of machine translation over that of character or word level translation.

  18. Mixed Integer Linear Programming based machine learning approach identifies regulators of telomerase in yeast.

    Science.gov (United States)

    Poos, Alexandra M; Maicher, André; Dieckmann, Anna K; Oswald, Marcus; Eils, Roland; Kupiec, Martin; Luke, Brian; König, Rainer

    2016-06-02

    Understanding telomere length maintenance mechanisms is central in cancer biology as their dysregulation is one of the hallmarks for immortalization of cancer cells. Important for this well-balanced control is the transcriptional regulation of the telomerase genes. We integrated Mixed Integer Linear Programming models into a comparative machine learning based approach to identify regulatory interactions that best explain the discrepancy of telomerase transcript levels in yeast mutants with deleted regulators showing aberrant telomere length, when compared to mutants with normal telomere length. We uncover novel regulators of telomerase expression, several of which affect histone levels or modifications. In particular, our results point to the transcription factors Sum1, Hst1 and Srb2 as being important for the regulation of EST1 transcription, and we validated the effect of Sum1 experimentally. We compiled our machine learning method leading to a user friendly package for R which can straightforwardly be applied to similar problems integrating gene regulator binding information and expression profiles of samples of e.g. different phenotypes, diseases or treatments. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  19. A machine learning approach for automated assessment of retinal vasculature in the oxygen induced retinopathy model.

    Science.gov (United States)

    Mazzaferri, Javier; Larrivée, Bruno; Cakir, Bertan; Sapieha, Przemyslaw; Costantino, Santiago

    2018-03-02

    Preclinical studies of vascular retinal diseases rely on the assessment of developmental dystrophies in the oxygen induced retinopathy rodent model. The quantification of vessel tufts and avascular regions is typically computed manually from flat mounted retinas imaged using fluorescent probes that highlight the vascular network. Such manual measurements are time-consuming and hampered by user variability and bias, thus a rapid and objective method is needed. Here, we introduce a machine learning approach to segment and characterize vascular tufts, delineate the whole vasculature network, and identify and analyze avascular regions. Our quantitative retinal vascular assessment (QuRVA) technique uses a simple machine learning method and morphological analysis to provide reliable computations of vascular density and pathological vascular tuft regions, devoid of user intervention within seconds. We demonstrate the high degree of error and variability of manual segmentations, and designed, coded, and implemented a set of algorithms to perform this task in a fully automated manner. We benchmark and validate the results of our analysis pipeline using the consensus of several manually curated segmentations using commonly used computer tools. The source code of our implementation is released under version 3 of the GNU General Public License ( https://www.mathworks.com/matlabcentral/fileexchange/65699-javimazzaf-qurva ).

  20. An Approach for Predicting Essential Genes Using Multiple Homology Mapping and Machine Learning Algorithms.

    Science.gov (United States)

    Hua, Hong-Li; Zhang, Fa-Zhan; Labena, Abraham Alemayehu; Dong, Chuan; Jin, Yan-Ting; Guo, Feng-Biao

    Investigation of essential genes is significant to comprehend the minimal gene sets of cell and discover potential drug targets. In this study, a novel approach based on multiple homology mapping and machine learning method was introduced to predict essential genes. We focused on 25 bacteria which have characterized essential genes. The predictions yielded the highest area under receiver operating characteristic (ROC) curve (AUC) of 0.9716 through tenfold cross-validation test. Proper features were utilized to construct models to make predictions in distantly related bacteria. The accuracy of predictions was evaluated via the consistency of predictions and known essential genes of target species. The highest AUC of 0.9552 and average AUC of 0.8314 were achieved when making predictions across organisms. An independent dataset from Synechococcus elongatus , which was released recently, was obtained for further assessment of the performance of our model. The AUC score of predictions is 0.7855, which is higher than other methods. This research presents that features obtained by homology mapping uniquely can achieve quite great or even better results than those integrated features. Meanwhile, the work indicates that machine learning-based method can assign more efficient weight coefficients than using empirical formula based on biological knowledge.

  1. Application of heuristic and machine-learning approach to engine model calibration

    Science.gov (United States)

    Cheng, Jie; Ryu, Kwang R.; Newman, C. E.; Davis, George C.

    1993-03-01

    Automation of engine model calibration procedures is a very challenging task because (1) the calibration process searches for a goal state in a huge, continuous state space, (2) calibration is often a lengthy and frustrating task because of complicated mutual interference among the target parameters, and (3) the calibration problem is heuristic by nature, and often heuristic knowledge for constraining a search cannot be easily acquired from domain experts. A combined heuristic and machine learning approach has, therefore, been adopted to improve the efficiency of model calibration. We developed an intelligent calibration program called ICALIB. It has been used on a daily basis for engine model applications, and has reduced the time required for model calibrations from many hours to a few minutes on average. In this paper, we describe the heuristic control strategies employed in ICALIB such as a hill-climbing search based on a state distance estimation function, incremental problem solution refinement by using a dynamic tolerance window, and calibration target parameter ordering for guiding the search. In addition, we present the application of a machine learning program called GID3* for automatic acquisition of heuristic rules for ordering target parameters.

  2. A Character Level Based and Word Level Based Approach for Chinese-Vietnamese Machine Translation.

    Science.gov (United States)

    Tran, Phuoc; Dinh, Dien; Nguyen, Hien T

    2016-01-01

    Chinese and Vietnamese have the same isolated language; that is, the words are not delimited by spaces. In machine translation, word segmentation is often done first when translating from Chinese or Vietnamese into different languages (typically English) and vice versa. However, it is a matter for consideration that words may or may not be segmented when translating between two languages in which spaces are not used between words, such as Chinese and Vietnamese. Since Chinese-Vietnamese is a low-resource language pair, the sparse data problem is evident in the translation system of this language pair. Therefore, while translating, whether it should be segmented or not becomes more important. In this paper, we propose a new method for translating Chinese to Vietnamese based on a combination of the advantages of character level and word level translation. In addition, a hybrid approach that combines statistics and rules is used to translate on the word level. And at the character level, a statistical translation is used. The experimental results showed that our method improved the performance of machine translation over that of character or word level translation.

  3. Geologic Carbon Sequestration Leakage Detection: A Physics-Guided Machine Learning Approach

    Science.gov (United States)

    Lin, Y.; Harp, D. R.; Chen, B.; Pawar, R.

    2017-12-01

    One of the risks of large-scale geologic carbon sequestration is the potential migration of fluids out of the storage formations. Accurate and fast detection of this fluids migration is not only important but also challenging, due to the large subsurface uncertainty and complex governing physics. Traditional leakage detection and monitoring techniques rely on geophysical observations including pressure. However, the resulting accuracy of these methods is limited because of indirect information they provide requiring expert interpretation, therefore yielding in-accurate estimates of leakage rates and locations. In this work, we develop a novel machine-learning technique based on support vector regression to effectively and efficiently predict the leakage locations and leakage rates based on limited number of pressure observations. Compared to the conventional data-driven approaches, which can be usually seem as a "black box" procedure, we develop a physics-guided machine learning method to incorporate the governing physics into the learning procedure. To validate the performance of our proposed leakage detection method, we employ our method to both 2D and 3D synthetic subsurface models. Our novel CO2 leakage detection method has shown high detection accuracy in the example problems.

  4. Characterization of Adrenal Lesions on Unenhanced MRI Using Texture Analysis: A Machine-Learning Approach.

    Science.gov (United States)

    Romeo, Valeria; Maurea, Simone; Cuocolo, Renato; Petretta, Mario; Mainenti, Pier Paolo; Verde, Francesco; Coppola, Milena; Dell'Aversana, Serena; Brunetti, Arturo

    2018-01-17

    Adrenal adenomas (AA) are the most common benign adrenal lesions, often characterized based on intralesional fat content as either lipid-rich (LRA) or lipid-poor (LPA). The differentiation of AA, particularly LPA, from nonadenoma adrenal lesions (NAL) may be challenging. Texture analysis (TA) can extract quantitative parameters from MR images. Machine learning is a technique for recognizing patterns that can be applied to medical images by identifying the best combination of TA features to create a predictive model for the diagnosis of interest. To assess the diagnostic efficacy of TA-derived parameters extracted from MR images in characterizing LRA, LPA, and NAL using a machine-learning approach. Retrospective, observational study. Sixty MR examinations, including 20 LRA, 20 LPA, and 20 NAL. Unenhanced T 1 -weighted in-phase (IP) and out-of-phase (OP) as well as T 2 -weighted (T 2 -w) MR images acquired at 3T. Adrenal lesions were manually segmented, placing a spherical volume of interest on IP, OP, and T 2 -w images. Different selection methods were trained and tested using the J48 machine-learning classifiers. The feature selection method that obtained the highest diagnostic performance using the J48 classifier was identified; the diagnostic performance was also compared with that of a senior radiologist by means of McNemar's test. A total of 138 TA-derived features were extracted; among these, four features were selected, extracted from the IP (Short_Run_High_Gray_Level_Emphasis), OP (Mean_Intensity and Maximum_3D_Diameter), and T 2 -w (Standard_Deviation) images; the J48 classifier obtained a diagnostic accuracy of 80%. The expert radiologist obtained a diagnostic accuracy of 73%. McNemar's test did not show significant differences in terms of diagnostic performance between the J48 classifier and the expert radiologist. Machine learning conducted on MR TA-derived features is a potential tool to characterize adrenal lesions. 4 Technical Efficacy: Stage 2 J

  5. Global assessment of soil organic carbon stocks and spatial distribution of histosols: the Machine Learning approach

    Science.gov (United States)

    Hengl, Tomislav

    2016-04-01

    Preliminary results of predicting distribution of soil organic soils (Histosols) and soil organic carbon stock (in tonnes per ha) using global compilations of soil profiles (about 150,000 points) and covariates at 250 m spatial resolution (about 150 covariates; mainly MODIS seasonal land products, SRTM DEM derivatives, climatic images, lithological and land cover and landform maps) are presented. We focus on using a data-driven approach i.e. Machine Learning techniques that often require no knowledge about the distribution of the target variable or knowledge about the possible relationships. Other advantages of using machine learning are (DOI: 10.1371/journal.pone.0125814): All rules required to produce outputs are formalized. The whole procedure is documented (the statistical model and associated computer script), enabling reproducible research. Predicted surfaces can make use of various information sources and can be optimized relative to all available quantitative point and covariate data. There is more flexibility in terms of the spatial extent, resolution and support of requested maps. Automated mapping is also more cost-effective: once the system is operational, maintenance and production of updates are an order of magnitude faster and cheaper. Consequently, prediction maps can be updated and improved at shorter and shorter time intervals. Some disadvantages of automated soil mapping based on Machine Learning are: Models are data-driven and any serious blunders or artifacts in the input data can propagate to order-of-magnitude larger errors than in the case of expert-based systems. Fitting machine learning models is at the order of magnitude computationally more demanding. Computing effort can be even tens of thousands higher than if e.g. linear geostatistics is used. Many machine learning models are fairly complex often abstract and any interpretation of such models is not trivial and require special multidimensional / multivariable plotting and data mining

  6. A support vector machine approach to detect financial statement fraud in South Africa: A first look

    CSIR Research Space (South Africa)

    Moepya, SO

    2014-04-01

    Full Text Available Auditors face the difficult task of detecting companies that issue manipulated financial statements. In recent years, machine learning methods have provided a feasible solution to this task. This study develops support vector machine (SVM) models...

  7. Learning Algorithms for Audio and Video Processing: Independent Component Analysis and Support Vector Machine Based Approaches

    National Research Council Canada - National Science Library

    Qi, Yuan

    2000-01-01

    In this thesis, we propose two new machine learning schemes, a subband-based Independent Component Analysis scheme and a hybrid Independent Component Analysis/Support Vector Machine scheme, and apply...

  8. The Application of Machine Learning Algorithms for Text Mining based on Sentiment Analysis Approach

    Directory of Open Access Journals (Sweden)

    Reza Samizade

    2018-06-01

    Full Text Available Classification of the cyber texts and comments into two categories of positive and negative sentiment among social media users is of high importance in the research are related to text mining. In this research, we applied supervised classification methods to classify Persian texts based on sentiment in cyber space. The result of this research is in a form of a system that can decide whether a comment which is published in cyber space such as social networks is considered positive or negative. The comments that are published in Persian movie and movie review websites from 1392 to 1395 are considered as the data set for this research. A part of these data are considered as training and others are considered as testing data. Prior to implementing the algorithms, pre-processing activities such as tokenizing, removing stop words, and n-germs process were applied on the texts. Naïve Bayes, Neural Networks and support vector machine were used for text classification in this study. Out of sample tests showed that there is no evidence indicating that the accuracy of SVM approach is statistically higher than Naïve Bayes or that the accuracy of Naïve Bayes is not statistically higher than NN approach. However, the researchers can conclude that the accuracy of the classification using SVM approach is statistically higher than the accuracy of NN approach in 5% confidence level.

  9. Thermo-energetic design of machine tools a systemic approach to solve the conflict between power efficiency, accuracy and productivity demonstrated at the example of machining production

    CERN Document Server

    2015-01-01

    The approach to the solution within the CRC/TR 96 financed by the German Research Foundation DFG aims at measures that will allow manufacturing accuracy to be maintained under thermally unstable conditions with increased productivity, without an additional demand for energy for tempering. The challenge of research in the CRC/TR 96 derives from the attempt to satisfy the conflicting goals of reducing energy consumption and increasing accuracy and productivity in machining. In the current research performed in 19 subprojects within the scope of the CRC/TR 96, correction and compensation solutions that influence the thermo-elastic machine tool behaviour efficiently and are oriented along the thermo-elastic functional chain are explored and implemented. As part of this general objective, the following issues must be researched and engineered in an interdisciplinary setting and brought together into useful overall solutions:   1.  Providing the modelling fundamentals to calculate the heat fluxes and the resulti...

  10. A hybrid least squares support vector machines and GMDH approach for river flow forecasting

    Science.gov (United States)

    Samsudin, R.; Saad, P.; Shabri, A.

    2010-06-01

    This paper proposes a novel hybrid forecasting model, which combines the group method of data handling (GMDH) and the least squares support vector machine (LSSVM), known as GLSSVM. The GMDH is used to determine the useful input variables for LSSVM model and the LSSVM model which works as time series forecasting. In this study the application of GLSSVM for monthly river flow forecasting of Selangor and Bernam River are investigated. The results of the proposed GLSSVM approach are compared with the conventional artificial neural network (ANN) models, Autoregressive Integrated Moving Average (ARIMA) model, GMDH and LSSVM models using the long term observations of monthly river flow discharge. The standard statistical, the root mean square error (RMSE) and coefficient of correlation (R) are employed to evaluate the performance of various models developed. Experiment result indicates that the hybrid model was powerful tools to model discharge time series and can be applied successfully in complex hydrological modeling.

  11. A SUPPORT VECTOR MACHINE APPROACH FOR DEVELOPING TELEMEDICINE SOLUTIONS: MEDICAL DIAGNOSIS

    Directory of Open Access Journals (Sweden)

    Mihaela GHEORGHE

    2015-06-01

    Full Text Available Support vector machine represents an important tool for artificial neural networks techniques including classification and prediction. It offers a solution for a wide range of different issues in which cases the traditional optimization algorithms and methods cannot be applied directly due to different constraints, including memory restrictions, hidden relationships between variables, very high volume of computations that needs to be handled. One of these issues relates to medical diagnosis, a subset of the medical field. In this paper, the SVM learning algorithm is tested on a diabetes dataset and the results obtained for training with different kernel functions are presented and analyzed in order to determine a good approach from a telemedicine perspective.

  12. A Hierarchical Approach Using Machine Learning Methods in Solar Photovoltaic Energy Production Forecasting

    Directory of Open Access Journals (Sweden)

    Zhaoxuan Li

    2016-01-01

    Full Text Available We evaluate and compare two common methods, artificial neural networks (ANN and support vector regression (SVR, for predicting energy productions from a solar photovoltaic (PV system in Florida 15 min, 1 h and 24 h ahead of time. A hierarchical approach is proposed based on the machine learning algorithms tested. The production data used in this work corresponds to 15 min averaged power measurements collected from 2014. The accuracy of the model is determined using computing error statistics such as mean bias error (MBE, mean absolute error (MAE, root mean square error (RMSE, relative MBE (rMBE, mean percentage error (MPE and relative RMSE (rRMSE. This work provides findings on how forecasts from individual inverters will improve the total solar power generation forecast of the PV system.

  13. A Hybrid dasymetric and machine learning approach to high-resolution residential electricity consumption modeling

    Energy Technology Data Exchange (ETDEWEB)

    Morton, April M [ORNL; Nagle, Nicholas N [ORNL; Piburn, Jesse O [ORNL; Stewart, Robert N [ORNL; McManamay, Ryan A [ORNL

    2017-01-01

    As urban areas continue to grow and evolve in a world of increasing environmental awareness, the need for detailed information regarding residential energy consumption patterns has become increasingly important. Though current modeling efforts mark significant progress in the effort to better understand the spatial distribution of energy consumption, the majority of techniques are highly dependent on region-specific data sources and often require building- or dwelling-level details that are not publicly available for many regions in the United States. Furthermore, many existing methods do not account for errors in input data sources and may not accurately reflect inherent uncertainties in model outputs. We propose an alternative and more general hybrid approach to high-resolution residential electricity consumption modeling by merging a dasymetric model with a complementary machine learning algorithm. The method s flexible data requirement and statistical framework ensure that the model both is applicable to a wide range of regions and considers errors in input data sources.

  14. Prediction of outcome in internet-delivered cognitive behaviour therapy for paediatric obsessive-compulsive disorder: A machine learning approach.

    Science.gov (United States)

    Lenhard, Fabian; Sauer, Sebastian; Andersson, Erik; Månsson, Kristoffer Nt; Mataix-Cols, David; Rück, Christian; Serlachius, Eva

    2018-03-01

    There are no consistent predictors of treatment outcome in paediatric obsessive-compulsive disorder (OCD). One reason for this might be the use of suboptimal statistical methodology. Machine learning is an approach to efficiently analyse complex data. Machine learning has been widely used within other fields, but has rarely been tested in the prediction of paediatric mental health treatment outcomes. To test four different machine learning methods in the prediction of treatment response in a sample of paediatric OCD patients who had received Internet-delivered cognitive behaviour therapy (ICBT). Participants were 61 adolescents (12-17 years) who enrolled in a randomized controlled trial and received ICBT. All clinical baseline variables were used to predict strictly defined treatment response status three months after ICBT. Four machine learning algorithms were implemented. For comparison, we also employed a traditional logistic regression approach. Multivariate logistic regression could not detect any significant predictors. In contrast, all four machine learning algorithms performed well in the prediction of treatment response, with 75 to 83% accuracy. The results suggest that machine learning algorithms can successfully be applied to predict paediatric OCD treatment outcome. Validation studies and studies in other disorders are warranted. Copyright © 2017 John Wiley & Sons, Ltd.

  15. A comparative analysis of machine learning approaches for plant disease identification

    Directory of Open Access Journals (Sweden)

    Hidayat ur Rahman

    2017-08-01

    Full Text Available Background: The problems to leaf in plants are very severe and they usually shorten the lifespan of plants. Leaf diseases are mainly caused due to three types of attacks including viral, bacterial or fungal. Diseased leaves reduce the crop production and affect the agricultural economy. Since agriculture plays a vital role in the economy, thus effective mechanism is required to detect the problem in early stages. Methods: Traditional approaches used for the identification of diseased plants are based on field visits which is time consuming and tedious. In this paper a comparative analysis of machine learning approaches has been presented for the identification of healthy and non-healthy plant leaves. For experimental purpose three different types of plant leaves have been selected namely, cabbage, citrus and sorghum. In order to classify healthy and non-healthy plant leaves color based features such as pixels, statistical features such as mean, standard deviation, min, max and descriptors such as Histogram of Oriented Gradients (HOG have been used. Results: 382 images of cabbage, 539 images of citrus and 262 images of sorghum were used as the primary dataset. The 40% data was utilized for testing and 60% were used for training which consisted of both healthy and damaged leaves. The results showed that random forest classifier is the best machine method for classification of healthy and diseased plant leaves. Conclusion: From the extensive experimentation it is concluded that features such as color information, statistical distribution and histogram of gradients provides sufficient clue for the classification of healthy and non-healthy plants.

  16. Downscaling of MODIS One Kilometer Evapotranspiration Using Landsat-8 Data and Machine Learning Approaches

    Directory of Open Access Journals (Sweden)

    Yinghai Ke

    2016-03-01

    Full Text Available This study presented a MODIS 8-day 1 km evapotranspiration (ET downscaling method based on Landsat 8 data (30 m and machine learning approaches. Eleven indicators including albedo, land surface temperature (LST, and vegetation indices (VIs derived from Landsat 8 data were first upscaled to 1 km resolution. Machine learning algorithms including Support Vector Regression (SVR, Cubist, and Random Forest (RF were used to model the relationship between the Landsat indicators and MODIS 8-day 1 km ET. The models were then used to predict 30 m ET based on Landsat 8 indicators. A total of thirty-two pairs of Landsat 8 images/MODIS ET data were evaluated at four study sites including two in United States and two in South Korea. Among the three models, RF produced the lowest error, with relative Root Mean Square Error (rRMSE less than 20%. Vegetation greenness related indicators such as Normalized Difference Vegetation Index (NDVI, Enhanced Vegetation Index (EVI, Soil Adjusted Vegetation Index (SAVI, and vegetation moisture related indicators such as Normalized Difference Infrared Index—Landsat 8 OLI band 7 (NDIIb7 and Normalized Difference Water Index (NDWI were the five most important features used in RF model. Temperature-based indicators were less important than vegetation greenness and moisture-related indicators because LST could have considerable variation during each 8-day period. The predicted Landsat downscaled ET had good overall agreement with MODIS ET (average rRMSE = 22% and showed a similar temporal trend as MODIS ET. Compared to the MODIS ET product, the downscaled product demonstrated more spatial details, and had better agreement with in situ ET observations (R2 = 0.56. However, we found that the accuracy of MODIS ET was the main control factor of the accuracy of the downscaled product. Improved coarse-resolution ET estimation would result in better finer-resolution estimation. This study proved the potential of using machine learning

  17. A new approach to the solution of the vacuum magnetic problem in fusion machines

    International Nuclear Information System (INIS)

    Zabeo, L.; Piccolo, F.; Sartori, F.; Albanese, R.; Cenedese, A.

    2006-01-01

    The magnetic vacuum topology reconstruction using the magnetic measurements is essential in controlling and understanding plasmas produced by fusion machines. In a wide range of the cases, the instruments to approach the problem have been designed for a specific machine and to solve a specific plasma model. Recently a new approach has been used by developing new magnetic software called Felix. The adopted solution in the design allows the use of the software not only at JET but also at different machines by simply changing a configuration file. A database describing the tokamak in the magnetic point of view is used to provide different vacuum magnetic models (polynomial, moments, filamentary) that can be solved by Felix without any recompiling or testing. In order to reduce the analysis and debugging time the software has been designed with modularity and platform independence in mind. That results in a large portability and in particular it allows use of the same code both offline and in real-time. One of the main aspects of the tool is its capability to solve different plasma models of current distribution by changing its configuration file. In order to improve the plasma magnetic reconstruction in real time a set of models has been run using Felix. An improved polynomial based model compared with the one presently used and two models using current filaments have been tested and compared. The new system has also been improved the calculation of plasma magnetic parameters. Double null configurations smooth transitions, more accurate gap and strike-point calculations, detailed boundary reconstruction are now systematically available. Felix is presently running at JET in different real-time analysis and control systems that need vacuum magnetic topology such as control of the plasma shape, the wall protection system [F.Piccolo et al.'Upgrade of the protection system for the first wall at JET in the ITER Be and W tiles prespective' this conference], the magnetic

  18. A hybrid Taguchi-artificial neural network approach to predict surface roughness during electric discharge machining of titanium alloys

    Energy Technology Data Exchange (ETDEWEB)

    Kumar, Sanjeev; Batish, Ajay [Thapar University, Patiala (India); Singh, Rupinder [GNDEC, Ludhiana (India); Singh, T. P. [Symbiosis Institute of Technology, Pune (India)

    2014-07-15

    In the present study, electric discharge machining process was used for machining of titanium alloys. Eight process parameters were varied during the process. Experimental results showed that current and pulse-on-time significantly affected the performance characteristics. Artificial neural network coupled with Taguchi approach was applied for optimization and prediction of surface roughness. The experimental results and the predicted results showed good agreement. SEM was used to investigate the surface integrity. Analysis for migration of different chemical elements and formation of compounds on the surface was performed using EDS and XRD pattern. The results showed that high discharge energy caused surface defects such as cracks, craters, thick recast layer, micro pores, pin holes, residual stresses and debris. Also, migration of chemical elements both from electrode and dielectric media were observed during EDS analysis. Presence of carbon was seen on the machined surface. XRD results showed formation of titanium carbide compound which precipitated on the machined surface.

  19. A novel artificial bee colony approach of live virtual machine migration policy using Bayes theorem.

    Science.gov (United States)

    Xu, Gaochao; Ding, Yan; Zhao, Jia; Hu, Liang; Fu, Xiaodong

    2013-01-01

    Green cloud data center has become a research hotspot of virtualized cloud computing architecture. Since live virtual machine (VM) migration technology is widely used and studied in cloud computing, we have focused on the VM placement selection of live migration for power saving. We present a novel heuristic approach which is called PS-ABC. Its algorithm includes two parts. One is that it combines the artificial bee colony (ABC) idea with the uniform random initialization idea, the binary search idea, and Boltzmann selection policy to achieve an improved ABC-based approach with better global exploration's ability and local exploitation's ability. The other one is that it uses the Bayes theorem to further optimize the improved ABC-based process to faster get the final optimal solution. As a result, the whole approach achieves a longer-term efficient optimization for power saving. The experimental results demonstrate that PS-ABC evidently reduces the total incremental power consumption and better protects the performance of VM running and migrating compared with the existing research. It makes the result of live VM migration more high-effective and meaningful.

  20. A Novel Artificial Bee Colony Approach of Live Virtual Machine Migration Policy Using Bayes Theorem

    Directory of Open Access Journals (Sweden)

    Gaochao Xu

    2013-01-01

    Full Text Available Green cloud data center has become a research hotspot of virtualized cloud computing architecture. Since live virtual machine (VM migration technology is widely used and studied in cloud computing, we have focused on the VM placement selection of live migration for power saving. We present a novel heuristic approach which is called PS-ABC. Its algorithm includes two parts. One is that it combines the artificial bee colony (ABC idea with the uniform random initialization idea, the binary search idea, and Boltzmann selection policy to achieve an improved ABC-based approach with better global exploration’s ability and local exploitation’s ability. The other one is that it uses the Bayes theorem to further optimize the improved ABC-based process to faster get the final optimal solution. As a result, the whole approach achieves a longer-term efficient optimization for power saving. The experimental results demonstrate that PS-ABC evidently reduces the total incremental power consumption and better protects the performance of VM running and migrating compared with the existing research. It makes the result of live VM migration more high-effective and meaningful.

  1. Building a protein name dictionary from full text: a machine learning term extraction approach

    Directory of Open Access Journals (Sweden)

    Campagne Fabien

    2005-04-01

    Full Text Available Abstract Background The majority of information in the biological literature resides in full text articles, instead of abstracts. Yet, abstracts remain the focus of many publicly available literature data mining tools. Most literature mining tools rely on pre-existing lexicons of biological names, often extracted from curated gene or protein databases. This is a limitation, because such databases have low coverage of the many name variants which are used to refer to biological entities in the literature. Results We present an approach to recognize named entities in full text. The approach collects high frequency terms in an article, and uses support vector machines (SVM to identify biological entity names. It is also computationally efficient and robust to noise commonly found in full text material. We use the method to create a protein name dictionary from a set of 80,528 full text articles. Only 8.3% of the names in this dictionary match SwissProt description lines. We assess the quality of the dictionary by studying its protein name recognition performance in full text. Conclusion This dictionary term lookup method compares favourably to other published methods, supporting the significance of our direct extraction approach. The method is strong in recognizing name variants not found in SwissProt.

  2. Forecast of hourly global horizontal irradiance based on structured Kernel Support Vector Machine: A case study of Tibet area in China

    International Nuclear Information System (INIS)

    Jiang, He; Dong, Yao

    2017-01-01

    Highlights: • The structured variable selection in Kernel SVM is implemented using two ways. • The two-way interaction model is considered to enforce Heredity Principle. • SVMIC is used to select the kernel parameter in proposed approaches. • Simple and fast computations algorithms are derived. - Abstract: Various applications of forecasting effective global horizontal irradiance play increasingly vital role in grid-connected photovoltaic installations, but suffer from forecasting inaccuracy and prohibitively expensive computational cost. Although Support Vector Machine (SVM) is one of the most powerful forecasting approaches, it does not provide an interpretable model. This motivates penalized variable selection methods to be introduced to SVM to select important variables. However, in some forecasting problems, there are some underlying logic or hierarchical structure such as heredity principle among the variables. Penalized Kernel SVM approaches do not take heredity principles into consideration when enforcing sparsity. This paper investigates structural variable selection in Kernel SVM based approach which pursues heredity principle and sparsity simultaneously. To achieve heredity principle, both optimization and procedure based structural variable selection approaches are studied in the Kernel SVM. Computationally, we derive fast and simple-to-implement algorithms to perform structural variable selection and solar irradiance forecasting. Furthermore, Support Vector Machines Information Criterion is utilized to select the kernel parameters to guarantee the model consistency. Real data experiments directly reveal that our proposed KSVM-SVS based approach following heredity principle delivers superior performances in terms of forecasting accuracy comparing with other competitors.

  3. Employability and Related Context Prediction Framework for University Graduands: A Machine Learning Approach

    Directory of Open Access Journals (Sweden)

    Manushi P. Wijayapala

    2016-12-01

    Full Text Available In Sri Lanka (SL, graduands’ employability remains a national issue due to the increasing number of graduates produced by higher education institutions each year. Thus, predicting the employability of university graduands can mitigate this issue since graduands can identify what qualifications or skills they need to strengthen up in order to find a job of their desired field with a good salary, before they complete the degree. The main objective of the study is to discover the plausibility of applying machine learning approach efficiently and effectively towards predicting the employability and related context of university graduands in Sri Lanka by proposing an architectural framework which consists of four modules; employment status prediction, job salary prediction, job field prediction and job relevance prediction of graduands while also comparing performance of classification algorithms under each prediction module. Series of machine learning algorithms such as C4.5, Naïve Bayes and AODE have been experimented on the Graduand Employment Census - 2014 data. A pre-processing step is proposed to overcome challenges embedded in graduand employability data and a feature selection process is proposed in order to reduce computational complexity. Additionally, parameter tuning is also done to get the most optimized parameters. More importantly, this study utilizes several types of Sampling (Oversampling, Undersampling and Ensemble (Bagging, Boosting, RF techniques as well as a newly proposed hybrid approach to overcome the limitations caused by the class imbalance phenomena. For the validation purposes, a wide range of evaluation measures was used to analyze the effectiveness of applying classification algorithms and class imbalance mitigation techniques on the dataset. The experimented results indicated that RandomForest has recorded the highest classification performance for 3 modules, achieving the selected best predictive models under hybrid

  4. Automated classification of tropical shrub species: a hybrid of leaf shape and machine learning approach.

    Science.gov (United States)

    Murat, Miraemiliana; Chang, Siow-Wee; Abu, Arpah; Yap, Hwa Jen; Yong, Kien-Thai

    2017-01-01

    Plants play a crucial role in foodstuff, medicine, industry, and environmental protection. The skill of recognising plants is very important in some applications, including conservation of endangered species and rehabilitation of lands after mining activities. However, it is a difficult task to identify plant species because it requires specialized knowledge. Developing an automated classification system for plant species is necessary and valuable since it can help specialists as well as the public in identifying plant species easily. Shape descriptors were applied on the myDAUN dataset that contains 45 tropical shrub species collected from the University of Malaya (UM), Malaysia. Based on literature review, this is the first study in the development of tropical shrub species image dataset and classification using a hybrid of leaf shape and machine learning approach. Four types of shape descriptors were used in this study namely morphological shape descriptors (MSD), Histogram of Oriented Gradients (HOG), Hu invariant moments (Hu) and Zernike moments (ZM). Single descriptor, as well as the combination of hybrid descriptors were tested and compared. The tropical shrub species are classified using six different classifiers, which are artificial neural network (ANN), random forest (RF), support vector machine (SVM), k-nearest neighbour (k-NN), linear discriminant analysis (LDA) and directed acyclic graph multiclass least squares twin support vector machine (DAG MLSTSVM). In addition, three types of feature selection methods were tested in the myDAUN dataset, Relief, Correlation-based feature selection (CFS) and Pearson's coefficient correlation (PCC). The well-known Flavia dataset and Swedish Leaf dataset were used as the validation dataset on the proposed methods. The results showed that the hybrid of all descriptors of ANN outperformed the other classifiers with an average classification accuracy of 98.23% for the myDAUN dataset, 95.25% for the Flavia dataset and 99

  5. Medical subdomain classification of clinical notes using a machine learning-based natural language processing approach.

    Science.gov (United States)

    Weng, Wei-Hung; Wagholikar, Kavishwar B; McCray, Alexa T; Szolovits, Peter; Chueh, Henry C

    2017-12-01

    The medical subdomain of a clinical note, such as cardiology or neurology, is useful content-derived metadata for developing machine learning downstream applications. To classify the medical subdomain of a note accurately, we have constructed a machine learning-based natural language processing (NLP) pipeline and developed medical subdomain classifiers based on the content of the note. We constructed the pipeline using the clinical NLP system, clinical Text Analysis and Knowledge Extraction System (cTAKES), the Unified Medical Language System (UMLS) Metathesaurus, Semantic Network, and learning algorithms to extract features from two datasets - clinical notes from Integrating Data for Analysis, Anonymization, and Sharing (iDASH) data repository (n = 431) and Massachusetts General Hospital (MGH) (n = 91,237), and built medical subdomain classifiers with different combinations of data representation methods and supervised learning algorithms. We evaluated the performance of classifiers and their portability across the two datasets. The convolutional recurrent neural network with neural word embeddings trained-medical subdomain classifier yielded the best performance measurement on iDASH and MGH datasets with area under receiver operating characteristic curve (AUC) of 0.975 and 0.991, and F1 scores of 0.845 and 0.870, respectively. Considering better clinical interpretability, linear support vector machine-trained medical subdomain classifier using hybrid bag-of-words and clinically relevant UMLS concepts as the feature representation, with term frequency-inverse document frequency (tf-idf)-weighting, outperformed other shallow learning classifiers on iDASH and MGH datasets with AUC of 0.957 and 0.964, and F1 scores of 0.932 and 0.934 respectively. We trained classifiers on one dataset, applied to the other dataset and yielded the threshold of F1 score of 0.7 in classifiers for half of the medical subdomains we studied. Our study shows that a supervised

  6. Genome-scale identification of Legionella pneumophila effectors using a machine learning approach.

    Directory of Open Access Journals (Sweden)

    David Burstein

    2009-07-01

    Full Text Available A large number of highly pathogenic bacteria utilize secretion systems to translocate effector proteins into host cells. Using these effectors, the bacteria subvert host cell processes during infection. Legionella pneumophila translocates effectors via the Icm/Dot type-IV secretion system and to date, approximately 100 effectors have been identified by various experimental and computational techniques. Effector identification is a critical first step towards the understanding of the pathogenesis system in L. pneumophila as well as in other bacterial pathogens. Here, we formulate the task of effector identification as a classification problem: each L. pneumophila open reading frame (ORF was classified as either effector or not. We computationally defined a set of features that best distinguish effectors from non-effectors. These features cover a wide range of characteristics including taxonomical dispersion, regulatory data, genomic organization, similarity to eukaryotic proteomes and more. Machine learning algorithms utilizing these features were then applied to classify all the ORFs within the L. pneumophila genome. Using this approach we were able to predict and experimentally validate 40 new effectors, reaching a success rate of above 90%. Increasing the number of validated effectors to around 140, we were able to gain novel insights into their characteristics. Effectors were found to have low G+C content, supporting the hypothesis that a large number of effectors originate via horizontal gene transfer, probably from their protozoan host. In addition, effectors were found to cluster in specific genomic regions. Finally, we were able to provide a novel description of the C-terminal translocation signal required for effector translocation by the Icm/Dot secretion system. To conclude, we have discovered 40 novel L. pneumophila effectors, predicted over a hundred additional highly probable effectors, and shown the applicability of machine

  7. Peak detection method evaluation for ion mobility spectrometry by using machine learning approaches.

    Science.gov (United States)

    Hauschild, Anne-Christin; Kopczynski, Dominik; D'Addario, Marianna; Baumbach, Jörg Ingo; Rahmann, Sven; Baumbach, Jan

    2013-04-16

    Ion mobility spectrometry with pre-separation by multi-capillary columns (MCC/IMS) has become an established inexpensive, non-invasive bioanalytics technology for detecting volatile organic compounds (VOCs) with various metabolomics applications in medical research. To pave the way for this technology towards daily usage in medical practice, different steps still have to be taken. With respect to modern biomarker research, one of the most important tasks is the automatic classification of patient-specific data sets into different groups, healthy or not, for instance. Although sophisticated machine learning methods exist, an inevitable preprocessing step is reliable and robust peak detection without manual intervention. In this work we evaluate four state-of-the-art approaches for automated IMS-based peak detection: local maxima search, watershed transformation with IPHEx, region-merging with VisualNow, and peak model estimation (PME).We manually generated Metabolites 2013, 3 278 a gold standard with the aid of a domain expert (manual) and compare the performance of the four peak calling methods with respect to two distinct criteria. We first utilize established machine learning methods and systematically study their classification performance based on the four peak detectors' results. Second, we investigate the classification variance and robustness regarding perturbation and overfitting. Our main finding is that the power of the classification accuracy is almost equally good for all methods, the manually created gold standard as well as the four automatic peak finding methods. In addition, we note that all tools, manual and automatic, are similarly robust against perturbations. However, the classification performance is more robust against overfitting when using the PME as peak calling preprocessor. In summary, we conclude that all methods, though small differences exist, are largely reliable and enable a wide spectrum of real-world biomedical applications.

  8. A Hybrid Supervised/Unsupervised Machine Learning Approach to Solar Flare Prediction

    Science.gov (United States)

    Benvenuto, Federico; Piana, Michele; Campi, Cristina; Massone, Anna Maria

    2018-01-01

    This paper introduces a novel method for flare forecasting, combining prediction accuracy with the ability to identify the most relevant predictive variables. This result is obtained by means of a two-step approach: first, a supervised regularization method for regression, namely, LASSO is applied, where a sparsity-enhancing penalty term allows the identification of the significance with which each data feature contributes to the prediction; then, an unsupervised fuzzy clustering technique for classification, namely, Fuzzy C-Means, is applied, where the regression outcome is partitioned through the minimization of a cost function and without focusing on the optimization of a specific skill score. This approach is therefore hybrid, since it combines supervised and unsupervised learning; realizes classification in an automatic, skill-score-independent way; and provides effective prediction performances even in the case of imbalanced data sets. Its prediction power is verified against NOAA Space Weather Prediction Center data, using as a test set, data in the range between 1996 August and 2010 December and as training set, data in the range between 1988 December and 1996 June. To validate the method, we computed several skill scores typically utilized in flare prediction and compared the values provided by the hybrid approach with the ones provided by several standard (non-hybrid) machine learning methods. The results showed that the hybrid approach performs classification better than all other supervised methods and with an effectiveness comparable to the one of clustering methods; but, in addition, it provides a reliable ranking of the weights with which the data properties contribute to the forecast.

  9. Predictive Maintenance of Power Substation Equipment by Infrared Thermography Using a Machine-Learning Approach

    Directory of Open Access Journals (Sweden)

    Irfan Ullah

    2017-12-01

    Full Text Available A variety of reasons, specifically contact issues, irregular loads, cracks in insulation, defective relays, terminal junctions and other similar issues, increase the internal temperature of electrical instruments. This results in unexpected disturbances and potential damage to power equipment. Therefore, the initial prevention measures of thermal anomalies in electrical tools are essential to prevent power-equipment failure. In this article, we address this initial prevention mechanism for power substations using a computer-vision approach by taking advantage of infrared thermal images. The thermal images are taken through infrared cameras without disturbing the working operations of power substations. Thus, this article augments the non-destructive approach to defect analysis in electrical power equipment using computer vision and machine learning. We use a total of 150 thermal pictures of different electrical equipment in 10 different substations in operating conditions, using 300 different hotspots. Our approach uses multi-layered perceptron (MLP to classify the thermal conditions of components of power substations into “defect” and “non-defect” classes. A total of eleven features, which are first-order and second-order statistical features, are calculated from the thermal sample images. The performance of MLP shows initial accuracy of 79.78%. We further augment the MLP with graph cut to increase accuracy to 84%. We argue that with the successful development and deployment of this new system, the Technology Department of Chongqing can arrange the recommended actions and thus save cost in repair and outages. This can play an important role in the quick and reliable inspection to potentially prevent power substation equipment from failure, which will save the whole system from breakdown. The increased 84% accuracy with the integration of the graph cut shows the efficacy of the proposed defect analysis approach.

  10. A novel featureless approach to mass detection in digital mammograms based on support vector machines

    Energy Technology Data Exchange (ETDEWEB)

    Campanini, Renato [Department of Physics, University of Bologna, and INFN, Bologna (Italy); Dongiovanni, Danilo [Department of Physics, University of Bologna, and INFN, Bologna (Italy); Iampieri, Emiro [Department of Physics, University of Bologna, and INFN, Bologna (Italy); Lanconelli, Nico [Department of Physics, University of Bologna, and INFN, Bologna (Italy); Masotti, Matteo [Department of Physics, University of Bologna, and INFN, Bologna (Italy); Palermo, Giuseppe [Department of Physics, University of Bologna, and INFN, Bologna (Italy); Riccardi, Alessandro [Department of Physics, University of Bologna, and INFN, Bologna (Italy); Roffilli, Matteo [Department of Computer Science, University of Bologna, Bologna (Italy)

    2004-03-21

    In this work, we present a novel approach to mass detection in digital mammograms. The great variability of the appearance of masses is the main obstacle to building a mass detection method. It is indeed demanding to characterize all the varieties of masses with a reduced set of features. Hence, in our approach we have chosen not to extract any feature, for the detection of the region of interest; in contrast, we exploit all the information available on the image. A multiresolution overcomplete wavelet representation is performed, in order to codify the image with redundancy of information. The vectors of the very-large space obtained are then provided to a first support vector machine (SVM) classifier. The detection task is considered here as a two-class pattern recognition problem: crops are classified as suspect or not, by using this SVM classifier. False candidates are eliminated with a second cascaded SVM. To further reduce the number of false positives, an ensemble of experts is applied: the final suspect regions are achieved by using a voting strategy. The sensitivity of the presented system is nearly 80% with a false-positive rate of 1.1 marks per image, estimated on images coming from the USF DDSM database.

  11. CoSpa: A Co-training Approach for Spam Review Identification with Support Vector Machine

    Directory of Open Access Journals (Sweden)

    Wen Zhang

    2016-03-01

    Full Text Available Spam reviews are increasingly appearing on the Internet to promote sales or defame competitors by misleading consumers with deceptive opinions. This paper proposes a co-training approach called CoSpa (Co-training for Spam review identification to identify spam reviews by two views: one is the lexical terms derived from the textual content of the reviews and the other is the PCFG (Probabilistic Context-Free Grammars rules derived from a deep syntax analysis of the reviews. Using SVM (Support Vector Machine as the base classifier, we develop two strategies, CoSpa-C and CoSpa-U, embedded within the CoSpa approach. The CoSpa-C strategy selects unlabeled reviews classified with the largest confidence to augment the training dataset to retrain the classifier. The CoSpa-U strategy randomly selects unlabeled reviews with a uniform distribution of confidence. Experiments on the spam dataset and the deception dataset demonstrate that both the proposed CoSpa algorithms outperform the traditional SVM with lexical terms and PCFG rules in spam review identification. Moreover, the CoSpa-U strategy outperforms the CoSpa-C strategy when we use the absolute value of decision function of SVM as the confidence.

  12. How long will my mouse live? Machine learning approaches for prediction of mouse life span.

    Science.gov (United States)

    Swindell, William R; Harper, James M; Miller, Richard A

    2008-09-01

    Prediction of individual life span based on characteristics evaluated at middle-age represents a challenging objective for aging research. In this study, we used machine learning algorithms to construct models that predict life span in a stock of genetically heterogeneous mice. Life-span prediction accuracy of 22 algorithms was evaluated using a cross-validation approach, in which models were trained and tested with distinct subsets of data. Using a combination of body weight and T-cell subset measures evaluated before 2 years of age, we show that the life-span quartile to which an individual mouse belongs can be predicted with an accuracy of 35.3% (+/-0.10%). This result provides a new benchmark for the development of life-span-predictive models, but improvement can be expected through identification of new predictor variables and development of computational approaches. Future work in this direction can provide tools for aging research and will shed light on associations between phenotypic traits and longevity.

  13. Comparison of machine learned approaches for thyroid nodule characterization from shear wave elastography images

    Science.gov (United States)

    Pereira, Carina; Dighe, Manjiri; Alessio, Adam M.

    2018-02-01

    Various Computer Aided Diagnosis (CAD) systems have been developed that characterize thyroid nodules using the features extracted from the B-mode ultrasound images and Shear Wave Elastography images (SWE). These features, however, are not perfect predictors of malignancy. In other domains, deep learning techniques such as Convolutional Neural Networks (CNNs) have outperformed conventional feature extraction based machine learning approaches. In general, fully trained CNNs require substantial volumes of data, motivating several efforts to use transfer learning with pre-trained CNNs. In this context, we sought to compare the performance of conventional feature extraction, fully trained CNNs, and transfer learning based, pre-trained CNNs for the detection of thyroid malignancy from ultrasound images. We compared these approaches applied to a data set of 964 B-mode and SWE images from 165 patients. The data were divided into 80% training/validation and 20% testing data. The highest accuracies achieved on the testing data for the conventional feature extraction, fully trained CNN, and pre-trained CNN were 0.80, 0.75, and 0.83 respectively. In this application, classification using a pre-trained network yielded the best performance, potentially due to the relatively limited sample size and sub-optimal architecture for the fully trained CNN.

  14. Machine learning approaches to decipher hormone and HER2 receptor status phenotypes in breast cancer.

    Science.gov (United States)

    Adabor, Emmanuel S; Acquaah-Mensah, George K

    2017-10-16

    Breast cancer prognosis and administration of therapies are aided by knowledge of hormonal and HER2 receptor status. Breast cancer lacking estrogen receptors, progesterone receptors and HER2 receptors are difficult to treat. Regarding large data repositories such as The Cancer Genome Atlas, available wet-lab methods for establishing the presence of these receptors do not always conclusively cover all available samples. To this end, we introduce median-supplement methods to identify hormonal and HER2 receptor status phenotypes of breast cancer patients using gene expression profiles. In these approaches, supplementary instances based on median patient gene expression are introduced to balance a training set from which we build simple models to identify the receptor expression status of patients. In addition, for the purpose of benchmarking, we examine major machine learning approaches that are also applicable to the problem of finding receptor status in breast cancer. We show that our methods are robust and have high sensitivity with extremely low false-positive rates compared with the well-established methods. A successful application of these methods will permit the simultaneous study of large collections of samples of breast cancer patients as well as save time and cost while standardizing interpretation of outcomes of such studies. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  15. Classifying injury narratives of large administrative databases for surveillance-A practical approach combining machine learning ensembles and human review.

    Science.gov (United States)

    Marucci-Wellman, Helen R; Corns, Helen L; Lehto, Mark R

    2017-01-01

    Injury narratives are now available real time and include useful information for injury surveillance and prevention. However, manual classification of the cause or events leading to injury found in large batches of narratives, such as workers compensation claims databases, can be prohibitive. In this study we compare the utility of four machine learning algorithms (Naïve Bayes, Single word and Bi-gram models, Support Vector Machine and Logistic Regression) for classifying narratives into Bureau of Labor Statistics Occupational Injury and Illness event leading to injury classifications for a large workers compensation database. These algorithms are known to do well classifying narrative text and are fairly easy to implement with off-the-shelf software packages such as Python. We propose human-machine learning ensemble approaches which maximize the power and accuracy of the algorithms for machine-assigned codes and allow for strategic filtering of rare, emerging or ambiguous narratives for manual review. We compare human-machine approaches based on filtering on the prediction strength of the classifier vs. agreement between algorithms. Regularized Logistic Regression (LR) was the best performing algorithm alone. Using this algorithm and filtering out the bottom 30% of predictions for manual review resulted in high accuracy (overall sensitivity/positive predictive value of 0.89) of the final machine-human coded dataset. The best pairings of algorithms included Naïve Bayes with Support Vector Machine whereby the triple ensemble NB SW =NB BI-GRAM =SVM had very high performance (0.93 overall sensitivity/positive predictive value and high accuracy (i.e. high sensitivity and positive predictive values)) across both large and small categories leaving 41% of the narratives for manual review. Integrating LR into this ensemble mix improved performance only slightly. For large administrative datasets we propose incorporation of methods based on human-machine pairings such as

  16. Curriculum Assessment Using Artificial Neural Network and Support Vector Machine Modeling Approaches: A Case Study. IR Applications. Volume 29

    Science.gov (United States)

    Chen, Chau-Kuang

    2010-01-01

    Artificial Neural Network (ANN) and Support Vector Machine (SVM) approaches have been on the cutting edge of science and technology for pattern recognition and data classification. In the ANN model, classification accuracy can be achieved by using the feed-forward of inputs, back-propagation of errors, and the adjustment of connection weights. In…

  17. Machine learning approaches to analyze histological images of tissues from radical prostatectomies.

    Science.gov (United States)

    Gertych, Arkadiusz; Ing, Nathan; Ma, Zhaoxuan; Fuchs, Thomas J; Salman, Sadri; Mohanty, Sambit; Bhele, Sanica; Velásquez-Vacca, Adriana; Amin, Mahul B; Knudsen, Beatrice S

    2015-12-01

    Computerized evaluation of histological preparations of prostate tissues involves identification of tissue components such as stroma (ST), benign/normal epithelium (BN) and prostate cancer (PCa). Image classification approaches have been developed to identify and classify glandular regions in digital images of prostate tissues; however their success has been limited by difficulties in cellular segmentation and tissue heterogeneity. We hypothesized that utilizing image pixels to generate intensity histograms of hematoxylin (H) and eosin (E) stains deconvoluted from H&E images numerically captures the architectural difference between glands and stroma. In addition, we postulated that joint histograms of local binary patterns and local variance (LBPxVAR) can be used as sensitive textural features to differentiate benign/normal tissue from cancer. Here we utilized a machine learning approach comprising of a support vector machine (SVM) followed by a random forest (RF) classifier to digitally stratify prostate tissue into ST, BN and PCa areas. Two pathologists manually annotated 210 images of low- and high-grade tumors from slides that were selected from 20 radical prostatectomies and digitized at high-resolution. The 210 images were split into the training (n=19) and test (n=191) sets. Local intensity histograms of H and E were used to train a SVM classifier to separate ST from epithelium (BN+PCa). The performance of SVM prediction was evaluated by measuring the accuracy of delineating epithelial areas. The Jaccard J=59.5 ± 14.6 and Rand Ri=62.0 ± 7.5 indices reported a significantly better prediction when compared to a reference method (Chen et al., Clinical Proteomics 2013, 10:18) based on the averaged values from the test set. To distinguish BN from PCa we trained a RF classifier with LBPxVAR and local intensity histograms and obtained separate performance values for BN and PCa: JBN=35.2 ± 24.9, OBN=49.6 ± 32, JPCa=49.5 ± 18.5, OPCa=72.7 ± 14.8 and Ri=60.6

  18. Machine learning approaches to supporting the identification of photoreceptor-enriched genes based on expression data

    Directory of Open Access Journals (Sweden)

    Simpson David

    2006-03-01

    Full Text Available Abstract Background Retinal photoreceptors are highly specialised cells, which detect light and are central to mammalian vision. Many retinal diseases occur as a result of inherited dysfunction of the rod and cone photoreceptor cells. Development and maintenance of photoreceptors requires appropriate regulation of the many genes specifically or highly expressed in these cells. Over the last decades, different experimental approaches have been developed to identify photoreceptor enriched genes. Recent progress in RNA analysis technology has generated large amounts of gene expression data relevant to retinal development. This paper assesses a machine learning methodology for supporting the identification of photoreceptor enriched genes based on expression data. Results Based on the analysis of publicly-available gene expression data from the developing mouse retina generated by serial analysis of gene expression (SAGE, this paper presents a predictive methodology comprising several in silico models for detecting key complex features and relationships encoded in the data, which may be useful to distinguish genes in terms of their functional roles. In order to understand temporal patterns of photoreceptor gene expression during retinal development, a two-way cluster analysis was firstly performed. By clustering SAGE libraries, a hierarchical tree reflecting relationships between developmental stages was obtained. By clustering SAGE tags, a more comprehensive expression profile for photoreceptor cells was revealed. To demonstrate the usefulness of machine learning-based models in predicting functional associations from the SAGE data, three supervised classification models were compared. The results indicated that a relatively simple instance-based model (KStar model performed significantly better than relatively more complex algorithms, e.g. neural networks. To deal with the problem of functional class imbalance occurring in the dataset, two data re

  19. Rainfall Prediction of Indian Peninsula: Comparison of Time Series Based Approach and Predictor Based Approach using Machine Learning Techniques

    Science.gov (United States)

    Dash, Y.; Mishra, S. K.; Panigrahi, B. K.

    2017-12-01

    Prediction of northeast/post monsoon rainfall which occur during October, November and December (OND) over Indian peninsula is a challenging task due to the dynamic nature of uncertain chaotic climate. It is imperative to elucidate this issue by examining performance of different machine leaning (ML) approaches. The prime objective of this research is to compare between a) statistical prediction using historical rainfall observations and global atmosphere-ocean predictors like Sea Surface Temperature (SST) and Sea Level Pressure (SLP) and b) empirical prediction based on a time series analysis of past rainfall data without using any other predictors. Initially, ML techniques have been applied on SST and SLP data (1948-2014) obtained from NCEP/NCAR reanalysis monthly mean provided by the NOAA ESRL PSD. Later, this study investigated the applicability of ML methods using OND rainfall time series for 1948-2014 and forecasted up to 2018. The predicted values of aforementioned methods were verified using observed time series data collected from Indian Institute of Tropical Meteorology and the result revealed good performance of ML algorithms with minimal error scores. Thus, it is found that both statistical and empirical methods are useful for long range climatic projections.

  20. Machine Learning Approach for Prediction and Understanding of Glass-Forming Ability.

    Science.gov (United States)

    Sun, Y T; Bai, H Y; Li, M Z; Wang, W H

    2017-07-20

    The prediction of the glass-forming ability (GFA) by varying the composition of alloys is a challenging problem in glass physics, as well as a problem for industry, with enormous financial ramifications. Although different empirical guides for the prediction of GFA were established over decades, a comprehensive model or approach that is able to deal with as many variables as possible simultaneously for efficiently predicting good glass formers is still highly desirable. Here, by applying the support vector classification method, we develop models for predicting the GFA of binary metallic alloys from random compositions. The effect of different input descriptors on GFA were evaluated, and the best prediction model was selected, which shows that the information related to liquidus temperatures plays a key role in the GFA of alloys. On the basis of this model, good glass formers can be predicted with high efficiency. The prediction efficiency can be further enhanced by improving larger database and refined input descriptor selection. Our findings suggest that machine learning is very powerful and efficient and has great potential for discovering new metallic glasses with good GFA.

  1. Promises of Machine Learning Approaches in Prediction of Absorption of Compounds.

    Science.gov (United States)

    Kumar, Rajnish; Sharma, Anju; Siddiqui, Mohammed Haris; Tiwari, Rajesh Kumar

    2018-01-01

    The Machine Learning (ML) is one of the fastest developing techniques in the prediction and evaluation of important pharmacokinetic properties such as absorption, distribution, metabolism and excretion. The availability of a large number of robust validation techniques for prediction models devoted to pharmacokinetics has significantly enhanced the trust and authenticity in ML approaches. There is a series of prediction models generated and used for rapid screening of compounds on the basis of absorption in last one decade. Prediction of absorption of compounds using ML models has great potential across the pharmaceutical industry as a non-animal alternative to predict absorption. However, these prediction models still have to go far ahead to develop the confidence similar to conventional experimental methods for estimation of drug absorption. Some of the general concerns are selection of appropriate ML methods and validation techniques in addition to selecting relevant descriptors and authentic data sets for the generation of prediction models. The current review explores published models of ML for the prediction of absorption using physicochemical properties as descriptors and their important conclusions. In addition, some critical challenges in acceptance of ML models for absorption are also discussed. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  2. A machine learning approach to automated structural network analysis: application to neonatal encephalopathy.

    Directory of Open Access Journals (Sweden)

    Etay Ziv

    Full Text Available Neonatal encephalopathy represents a heterogeneous group of conditions associated with life-long developmental disabilities and neurological deficits. Clinical measures and current anatomic brain imaging remain inadequate predictors of outcome in children with neonatal encephalopathy. Some studies have suggested that brain development and, therefore, brain connectivity may be altered in the subgroup of patients who subsequently go on to develop clinically significant neurological abnormalities. Large-scale structural brain connectivity networks constructed using diffusion tractography have been posited to reflect organizational differences in white matter architecture at the mesoscale, and thus offer a unique tool for characterizing brain development in patients with neonatal encephalopathy. In this manuscript we use diffusion tractography to construct structural networks for a cohort of patients with neonatal encephalopathy. We systematically map these networks to a high-dimensional space and then apply standard machine learning algorithms to predict neurological outcome in the cohort. Using nested cross-validation we demonstrate high prediction accuracy that is both statistically significant and robust over a broad range of thresholds. Our algorithm offers a novel tool to evaluate neonates at risk for developing neurological deficit. The described approach can be applied to any brain pathology that affects structural connectivity.

  3. Machine Learning Approach for Software Reliability Growth Modeling with Infinite Testing Effort Function

    Directory of Open Access Journals (Sweden)

    Subburaj Ramasamy

    2017-01-01

    Full Text Available Reliability is one of the quantifiable software quality attributes. Software Reliability Growth Models (SRGMs are used to assess the reliability achieved at different times of testing. Traditional time-based SRGMs may not be accurate enough in all situations where test effort varies with time. To overcome this lacuna, test effort was used instead of time in SRGMs. In the past, finite test effort functions were proposed, which may not be realistic as, at infinite testing time, test effort will be infinite. Hence in this paper, we propose an infinite test effort function in conjunction with a classical Nonhomogeneous Poisson Process (NHPP model. We use Artificial Neural Network (ANN for training the proposed model with software failure data. Here it is possible to get a large set of weights for the same model to describe the past failure data equally well. We use machine learning approach to select the appropriate set of weights for the model which will describe both the past and the future data well. We compare the performance of the proposed model with existing model using practical software failure data sets. The proposed log-power TEF based SRGM describes all types of failure data equally well and also improves the accuracy of parameter estimation more than existing TEF and can be used for software release time determination as well.

  4. A Machine Learning Approach for Using the Postmortem Skin Microbiome to Estimate the Postmortem Interval.

    Directory of Open Access Journals (Sweden)

    Hunter R Johnson

    Full Text Available Research on the human microbiome, the microbiota that live in, on, and around the human person, has revolutionized our understanding of the complex interactions between microbial life and human health and disease. The microbiome may also provide a valuable tool in forensic death investigations by helping to reveal the postmortem interval (PMI of a decedent that is discovered after an unknown amount of time since death. Current methods of estimating PMI for cadavers discovered in uncontrolled, unstudied environments have substantial limitations, some of which may be overcome through the use of microbial indicators. In this project, we sampled the microbiomes of decomposing human cadavers, focusing on the skin microbiota found in the nasal and ear canals. We then developed several models of statistical regression to establish an algorithm for predicting the PMI of microbial samples. We found that the complete data set, rather than a curated list of indicator species, was preferred for training the regressor. We further found that genus and family, rather than species, are the most informative taxonomic levels. Finally, we developed a k-nearest- neighbor regressor, tuned with the entire data set from all nasal and ear samples, that predicts the PMI of unknown samples with an average error of ±55 accumulated degree days (ADD. This study outlines a machine learning approach for the use of necrobiome data in the prediction of the PMI and thereby provides a successful proof-of- concept that skin microbiota is a promising tool in forensic death investigations.

  5. A data-driven predictive approach for drug delivery using machine learning techniques.

    Directory of Open Access Journals (Sweden)

    Yuanyuan Li

    Full Text Available In drug delivery, there is often a trade-off between effective killing of the pathogen, and harmful side effects associated with the treatment. Due to the difficulty in testing every dosing scenario experimentally, a computational approach will be helpful to assist with the prediction of effective drug delivery methods. In this paper, we have developed a data-driven predictive system, using machine learning techniques, to determine, in silico, the effectiveness of drug dosing. The system framework is scalable, autonomous, robust, and has the ability to predict the effectiveness of the current drug treatment and the subsequent drug-pathogen dynamics. The system consists of a dynamic model incorporating both the drug concentration and pathogen population into distinct states. These states are then analyzed using a temporal model to describe the drug-cell interactions over time. The dynamic drug-cell interactions are learned in an adaptive fashion and used to make sequential predictions on the effectiveness of the dosing strategy. Incorporated into the system is the ability to adjust the sensitivity and specificity of the learned models based on a threshold level determined by the operator for the specific application. As a proof-of-concept, the system was validated experimentally using the pathogen Giardia lamblia and the drug metronidazole in vitro.

  6. Remote sensing-based measurement of Living Environment Deprivation: Improving classical approaches with machine learning.

    Directory of Open Access Journals (Sweden)

    Daniel Arribas-Bel

    Full Text Available This paper provides evidence on the usefulness of very high spatial resolution (VHR imagery in gathering socioeconomic information in urban settlements. We use land cover, spectral, structure and texture features extracted from a Google Earth image of Liverpool (UK to evaluate their potential to predict Living Environment Deprivation at a small statistical area level. We also contribute to the methodological literature on the estimation of socioeconomic indices with remote-sensing data by introducing elements from modern machine learning. In addition to classical approaches such as Ordinary Least Squares (OLS regression and a spatial lag model, we explore the potential of the Gradient Boost Regressor and Random Forests to improve predictive performance and accuracy. In addition to novel predicting methods, we also introduce tools for model interpretation and evaluation such as feature importance and partial dependence plots, or cross-validation. Our results show that Random Forest proved to be the best model with an R2 of around 0.54, followed by Gradient Boost Regressor with 0.5. Both the spatial lag model and the OLS fall behind with significantly lower performances of 0.43 and 0.3, respectively.

  7. A support vector machine approach to the automatic identification of fluorescence spectra emitted by biological agents

    Science.gov (United States)

    Gelfusa, M.; Murari, A.; Lungaroni, M.; Malizia, A.; Parracino, S.; Peluso, E.; Cenciarelli, O.; Carestia, M.; Pizzoferrato, R.; Vega, J.; Gaudio, P.

    2016-10-01

    Two of the major new concerns of modern societies are biosecurity and biosafety. Several biological agents (BAs) such as toxins, bacteria, viruses, fungi and parasites are able to cause damage to living systems either humans, animals or plants. Optical techniques, in particular LIght Detection And Ranging (LIDAR), based on the transmission of laser pulses and analysis of the return signals, can be successfully applied to monitoring the release of biological agents into the atmosphere. It is well known that most of biological agents tend to emit specific fluorescence spectra, which in principle allow their detection and identification, if excited by light of the appropriate wavelength. For these reasons, the detection of the UVLight Induced Fluorescence (UV-LIF) emitted by BAs is particularly promising. On the other hand, the stand-off detection of BAs poses a series of challenging issues; one of the most severe is the automatic discrimination between various agents which emit very similar fluorescence spectra. In this paper, a new data analysis method, based on a combination of advanced filtering techniques and Support Vector Machines, is described. The proposed approach covers all the aspects of the data analysis process, from filtering and denoising to automatic recognition of the agents. A systematic series of numerical tests has been performed to assess the potential and limits of the proposed methodology. The first investigations of experimental data have already given very encouraging results.

  8. A microscopy approach for in situ inspection of micro-coordinate measurement machine styli for contamination

    Science.gov (United States)

    Feng, Xiaobing; Pascal, Jonathan; Lawes, Simon

    2017-09-01

    During the process of measurement using a micro-coordinate measurement machine (µCMM) contamination gradually builds up on the surface of the stylus tip and affects the dimensional accuracy of the measurement. Regular inspection of the stylus for contamination is essential to determine the appropriate cleaning interval and prevent the dimensional error from becoming significant. However, in situ inspection of a µCMM stylus is challenging due to the size, spherical shape, material and surface properties of a typical stylus. To address this challenge, this study evaluates several non-contact measurement technologies for in situ stylus inspection and, based on those findings, proposes a cost-effective microscopy approach. The operational principle is then demonstrated by an automated prototype, coordinated directly by the CMM software MCOSMOS, with an effective threshold of detection as low as 400 nm and a large field of view and depth of field. The level of contamination on the stylus has been found to increase steadily with the number of measurement contacts made. Once excessive contamination is detected on the stylus, measurement should be stopped and a stylus cleaning procedure should be performed to avoid affecting measurement accuracy.

  9. A microscopy approach for in situ inspection of micro-coordinate measurement machine styli for contamination

    International Nuclear Information System (INIS)

    Feng, Xiaobing; Lawes, Simon; Pascal, Jonathan

    2017-01-01

    During the process of measurement using a micro-coordinate measurement machine (µCMM) contamination gradually builds up on the surface of the stylus tip and affects the dimensional accuracy of the measurement. Regular inspection of the stylus for contamination is essential to determine the appropriate cleaning interval and prevent the dimensional error from becoming significant. However, in situ inspection of a µCMM stylus is challenging due to the size, spherical shape, material and surface properties of a typical stylus. To address this challenge, this study evaluates several non-contact measurement technologies for in situ stylus inspection and, based on those findings, proposes a cost-effective microscopy approach. The operational principle is then demonstrated by an automated prototype, coordinated directly by the CMM software MCOSMOS, with an effective threshold of detection as low as 400 nm and a large field of view and depth of field. The level of contamination on the stylus has been found to increase steadily with the number of measurement contacts made. Once excessive contamination is detected on the stylus, measurement should be stopped and a stylus cleaning procedure should be performed to avoid affecting measurement accuracy. (paper)

  10. Improved Membership Probability for Moving Groups: Bayesian and Machine Learning Approaches

    Science.gov (United States)

    Lee, Jinhee; Song, Inseok

    2018-01-01

    Gravitationally unbound loose stellar associations (i.e., young nearby moving groups: moving groups hereafter) have been intensively explored because they are important in planet and disk formation studies, exoplanet imaging, and age calibration. Among the many efforts devoted to the search for moving group members, a Bayesian approach (e.g.,using the code BANYAN) has become popular recently because of the many advantages it offers. However, the resultant membership probability needs to be carefully adopted because of its sensitive dependence on input models. In this study, we have developed an improved membership calculation tool focusing on the beta-Pic moving group. We made three improvements for building models used in BANYAN II: (1) updating a list of accepted members by re-assessing memberships in terms of position, motion, and age, (2) investigating member distribution functions in XYZ, and (3) exploring field star distribution functions in XYZUVW. Our improved tool can change membership probability up to 70%. Membership probability is critical and must be better defined. For example, our code identifies only one third of the candidate members in SIMBAD that are believed to be kinematically associated with beta-Pic moving group.Additionally, we performed cluster analysis of young nearby stars using an unsupervised machine learning approach. As more moving groups and their members are identified, the complexity and ambiguity in moving group configuration has been increased. To clarify this issue, we analyzed ~4,000 X-ray bright young stellar candidates. Here, we present the preliminary results. By re-identifying moving groups with the least human intervention, we expect to understand the composition of the solar neighborhood. Moreover better defined moving group membership will help us understand star formation and evolution in relatively low density environments; especially for the low-mass stars which will be identified in the coming Gaia release.

  11. Other programmatic agencies in the metropolis: a machinic approach to urban reterritorialization processes

    Directory of Open Access Journals (Sweden)

    Igor Guatelli

    2013-06-01

    Full Text Available What if the strength of the architectural object were associated with program and spatial strategies engendered at the service of “habitability” and future sociabilities rather than with the building of monumental architectural gadgets and optical events in the landscape? Based on the Deleuzean (from the philosopher Gilles Deleuze machinic phylum as well as concepts associated with it such as “bonding” and “agency,” using the Lacanian approach (from the psychiatrist Jacques Lacan to the gadget concept and the Derridian concept (from the philosopher Jacques Derrida of “supplement,” this article discusses a shift of the most current senses and representations of contemporary urban architectural design historically associated with the notable (meaning the wish to be noticed formal and composite materialization of the artistic object at the service of programmed sociabilities towards nother conceptualization. The building of architectural supports from residual (according to Deleuze, the possibility of producing other wishes, far from the dominant capitalist logic, lies in residues in the residual flows produced by the capital itself programmatic and spatial agencies emerges as a critical path to the categorical imperative of the generalizing global logic. It is a logic based on non-territorial landscapes and centered on investments in the composite view and intentional spatial and programmatic imprisonments in familiar formulae originating from domesticated and standardized prêt-à-utiliser thinking. To think about other architectural spatial and programmatic agencies originating from residues and flows that simultaneously rise from and escape the global logic is to bet on the chance of non-programmed sociabilities taking place. Ceasing to think about architecture as a formal object in its artistic and paradigmatic dimension would mean to conceive it as an urban syntagmatic machine of [de]constructive power

  12. TargetSpy: a supervised machine learning approach for microRNA target prediction.

    Science.gov (United States)

    Sturm, Martin; Hackenberg, Michael; Langenberger, David; Frishman, Dmitrij

    2010-05-28

    Virtually all currently available microRNA target site prediction algorithms require the presence of a (conserved) seed match to the 5' end of the microRNA. Recently however, it has been shown that this requirement might be too stringent, leading to a substantial number of missed target sites. We developed TargetSpy, a novel computational approach for predicting target sites regardless of the presence of a seed match. It is based on machine learning and automatic feature selection using a wide spectrum of compositional, structural, and base pairing features covering current biological knowledge. Our model does not rely on evolutionary conservation, which allows the detection of species-specific interactions and makes TargetSpy suitable for analyzing unconserved genomic sequences.In order to allow for an unbiased comparison of TargetSpy to other methods, we classified all algorithms into three groups: I) no seed match requirement, II) seed match requirement, and III) conserved seed match requirement. TargetSpy predictions for classes II and III are generated by appropriate postfiltering. On a human dataset revealing fold-change in protein production for five selected microRNAs our method shows superior performance in all classes. In Drosophila melanogaster not only our class II and III predictions are on par with other algorithms, but notably the class I (no-seed) predictions are just marginally less accurate. We estimate that TargetSpy predicts between 26 and 112 functional target sites without a seed match per microRNA that are missed by all other currently available algorithms. Only a few algorithms can predict target sites without demanding a seed match and TargetSpy demonstrates a substantial improvement in prediction accuracy in that class. Furthermore, when conservation and the presence of a seed match are required, the performance is comparable with state-of-the-art algorithms. TargetSpy was trained on mouse and performs well in human and drosophila

  13. TargetSpy: a supervised machine learning approach for microRNA target prediction

    Directory of Open Access Journals (Sweden)

    Langenberger David

    2010-05-01

    Full Text Available Abstract Background Virtually all currently available microRNA target site prediction algorithms require the presence of a (conserved seed match to the 5' end of the microRNA. Recently however, it has been shown that this requirement might be too stringent, leading to a substantial number of missed target sites. Results We developed TargetSpy, a novel computational approach for predicting target sites regardless of the presence of a seed match. It is based on machine learning and automatic feature selection using a wide spectrum of compositional, structural, and base pairing features covering current biological knowledge. Our model does not rely on evolutionary conservation, which allows the detection of species-specific interactions and makes TargetSpy suitable for analyzing unconserved genomic sequences. In order to allow for an unbiased comparison of TargetSpy to other methods, we classified all algorithms into three groups: I no seed match requirement, II seed match requirement, and III conserved seed match requirement. TargetSpy predictions for classes II and III are generated by appropriate postfiltering. On a human dataset revealing fold-change in protein production for five selected microRNAs our method shows superior performance in all classes. In Drosophila melanogaster not only our class II and III predictions are on par with other algorithms, but notably the class I (no-seed predictions are just marginally less accurate. We estimate that TargetSpy predicts between 26 and 112 functional target sites without a seed match per microRNA that are missed by all other currently available algorithms. Conclusion Only a few algorithms can predict target sites without demanding a seed match and TargetSpy demonstrates a substantial improvement in prediction accuracy in that class. Furthermore, when conservation and the presence of a seed match are required, the performance is comparable with state-of-the-art algorithms. TargetSpy was trained on

  14. Computational prediction of multidisciplinary team decision-making for adjuvant breast cancer drug therapies: a machine learning approach.

    Science.gov (United States)

    Lin, Frank P Y; Pokorny, Adrian; Teng, Christina; Dear, Rachel; Epstein, Richard J

    2016-12-01

    Multidisciplinary team (MDT) meetings are used to optimise expert decision-making about treatment options, but such expertise is not digitally transferable between centres. To help standardise medical decision-making, we developed a machine learning model designed to predict MDT decisions about adjuvant breast cancer treatments. We analysed MDT decisions regarding adjuvant systemic therapy for 1065 breast cancer cases over eight years. Machine learning classifiers with and without bootstrap aggregation were correlated with MDT decisions (recommended, not recommended, or discussable) regarding adjuvant cytotoxic, endocrine and biologic/targeted therapies, then tested for predictability using stratified ten-fold cross-validations. The predictions so derived were duly compared with those based on published (ESMO and NCCN) cancer guidelines. Machine learning more accurately predicted adjuvant chemotherapy MDT decisions than did simple application of guidelines. No differences were found between MDT- vs. ESMO/NCCN- based decisions to prescribe either adjuvant endocrine (97%, p = 0.44/0.74) or biologic/targeted therapies (98%, p = 0.82/0.59). In contrast, significant discrepancies were evident between MDT- and guideline-based decisions to prescribe chemotherapy (87%, p machine learning models. A machine learning approach based on clinicopathologic characteristics can predict MDT decisions about adjuvant breast cancer drug therapies. The discrepancy between MDT- and guideline-based decisions regarding adjuvant chemotherapy implies that certain non-clincopathologic criteria, such as patient preference and resource availability, are factored into clinical decision-making by local experts but not captured by guidelines.

  15. Smart Cutting Tools and Smart Machining: Development Approaches, and Their Implementation and Application Perspectives

    Science.gov (United States)

    Cheng, Kai; Niu, Zhi-Chao; Wang, Robin C.; Rakowski, Richard; Bateman, Richard

    2017-09-01

    Smart machining has tremendous potential and is becoming one of new generation high value precision manufacturing technologies in line with the advance of Industry 4.0 concepts. This paper presents some innovative design concepts and, in particular, the development of four types of smart cutting tools, including a force-based smart cutting tool, a temperature-based internally-cooled cutting tool, a fast tool servo (FTS) and smart collets for ultraprecision and micro manufacturing purposes. Implementation and application perspectives of these smart cutting tools are explored and discussed particularly for smart machining against a number of industrial application requirements. They are contamination-free machining, machining of tool-wear-prone Si-based infra-red devices and medical applications, high speed micro milling and micro drilling, etc. Furthermore, implementation techniques are presented focusing on: (a) plug-and-produce design principle and the associated smart control algorithms, (b) piezoelectric film and surface acoustic wave transducers to measure cutting forces in process, (c) critical cutting temperature control in real-time machining, (d) in-process calibration through machining trials, (e) FE-based design and analysis of smart cutting tools, and (f) application exemplars on adaptive smart machining.

  16. Automated classification of tropical shrub species: a hybrid of leaf shape and machine learning approach

    Directory of Open Access Journals (Sweden)

    Miraemiliana Murat

    2017-09-01

    Full Text Available Plants play a crucial role in foodstuff, medicine, industry, and environmental protection. The skill of recognising plants is very important in some applications, including conservation of endangered species and rehabilitation of lands after mining activities. However, it is a difficult task to identify plant species because it requires specialized knowledge. Developing an automated classification system for plant species is necessary and valuable since it can help specialists as well as the public in identifying plant species easily. Shape descriptors were applied on the myDAUN dataset that contains 45 tropical shrub species collected from the University of Malaya (UM, Malaysia. Based on literature review, this is the first study in the development of tropical shrub species image dataset and classification using a hybrid of leaf shape and machine learning approach. Four types of shape descriptors were used in this study namely morphological shape descriptors (MSD, Histogram of Oriented Gradients (HOG, Hu invariant moments (Hu and Zernike moments (ZM. Single descriptor, as well as the combination of hybrid descriptors were tested and compared. The tropical shrub species are classified using six different classifiers, which are artificial neural network (ANN, random forest (RF, support vector machine (SVM, k-nearest neighbour (k-NN, linear discriminant analysis (LDA and directed acyclic graph multiclass least squares twin support vector machine (DAG MLSTSVM. In addition, three types of feature selection methods were tested in the myDAUN dataset, Relief, Correlation-based feature selection (CFS and Pearson’s coefficient correlation (PCC. The well-known Flavia dataset and Swedish Leaf dataset were used as the validation dataset on the proposed methods. The results showed that the hybrid of all descriptors of ANN outperformed the other classifiers with an average classification accuracy of 98.23% for the myDAUN dataset, 95.25% for the Flavia

  17. Bayesian analysis of rotating machines - A statistical approach to estimate and track the fundamental frequency

    DEFF Research Database (Denmark)

    Pedersen, Thorkild Find

    2003-01-01

    frequency and the related frequencies as orders of the fundamental frequency. When analyzing rotating or reciprocating machines it is important to know the running speed. Usually this requires direct access to the rotating parts in order to mount a dedicated tachometer probe. In this thesis different......Rotating and reciprocating mechanical machines emit acoustic noise and vibrations when they operate. Typically, the noise and vibrations are concentrated in narrow frequency bands related to the running speed of the machine. The frequency of the running speed is referred to as the fundamental...

  18. Efficient Machine Learning Approach for Optimizing Scientific Computing Applications on Emerging HPC Architectures

    Energy Technology Data Exchange (ETDEWEB)

    Arumugam, Kamesh [Old Dominion Univ., Norfolk, VA (United States)

    2017-05-01

    the parallel implementation challenges of such irregular applications on different HPC architectures. In particular, we use supervised learning to predict the computation structure and use it to address the control-ow and memory access irregularities in the parallel implementation of such applications on GPUs, Xeon Phis, and heterogeneous architectures composed of multi-core CPUs with GPUs or Xeon Phis. We use numerical simulation of charged particles beam dynamics simulation as a motivating example throughout the dissertation to present our new approach, though they should be equally applicable to a wide range of irregular applications. The machine learning approach presented here use predictive analytics and forecasting techniques to adaptively model and track the irregular memory access pattern at each time step of the simulation to anticipate the future memory access pattern. Access pattern forecasts can then be used to formulate optimization decisions during application execution which improves the performance of the application at a future time step based on the observations from earlier time steps. In heterogeneous architectures, forecasts can also be used to improve the memory performance and resource utilization of all the processing units to deliver a good aggregate performance. We used these optimization techniques and anticipation strategy to design a cache-aware, memory efficient parallel algorithm to address the irregularities in the parallel implementation of charged particles beam dynamics simulation on different HPC architectures. Experimental result using a diverse mix of HPC architectures shows that our approach in using anticipation strategy is effective in maximizing data reuse, ensuring workload balance, minimizing branch and memory divergence, and in improving resource utilization.

  19. Poster abstract: A machine learning approach for vehicle classification using passive infrared and ultrasonic sensors

    KAUST Repository

    Warriach, Ehsan Ullah; Claudel, Christian G.

    2013-01-01

    This article describes the implementation of four different machine learning techniques for vehicle classification in a dual ultrasonic/passive infrared traffic flow sensors. Using k-NN, Naive Bayes, SVM and KNN-SVM algorithms, we show that KNN

  20. A Two-Layer Least Squares Support Vector Machine Approach to Credit Risk Assessment

    Science.gov (United States)

    Liu, Jingli; Li, Jianping; Xu, Weixuan; Shi, Yong

    Least squares support vector machine (LS-SVM) is a revised version of support vector machine (SVM) and has been proved to be a useful tool for pattern recognition. LS-SVM had excellent generalization performance and low computational cost. In this paper, we propose a new method called two-layer least squares support vector machine which combines kernel principle component analysis (KPCA) and linear programming form of least square support vector machine. With this method sparseness and robustness is obtained while solving large dimensional and large scale database. A U.S. commercial credit card database is used to test the efficiency of our method and the result proved to be a satisfactory one.

  1. The art and science of rotating field machines design a practical approach

    CERN Document Server

    Ostović, Vlado

    2017-01-01

    This book highlights procedures utilized by the design departments of leading global manufacturers, offering readers essential insights into the electromagnetic and thermal design of rotating field (induction and synchronous) electric machines. Further, it details the physics of the key phenomena involved in the machines’ operation, conducts a thorough analysis and synthesis of polyphase windings, and presents the tools and methods used in the evaluation of winding performance. The book develops and solves the machines’ magnetic circuits, and determines their electromagnetic forces and torques. Special attention is paid to thermal problems in electrical machines, along with fluid flow computations. With a clear emphasis on the practical aspects of electric machine design and synthesis, the author applies his nearly 40 years of professional experience with electric machine manufacturers – both as an employee and consultant – to provide readers with the tools they need to determine fluid flow parameters...

  2. Defining difficult laryngoscopy findings by using multiple parameters: A machine learning approach

    Directory of Open Access Journals (Sweden)

    Moustafa Abdelaziz Moustafa

    2017-04-01

    Conclusion: “Alex Difficult Laryngoscopy Software” (ADLS is a machine learning program for prediction of difficult laryngoscopy. New cases can be entered to the training set thus improving the accuracy of the software.

  3. Methods and Research for Multi-Component Cutting Force Sensing Devices and Approaches in Machining

    Directory of Open Access Journals (Sweden)

    Qiaokang Liang

    2016-11-01

    Full Text Available Multi-component cutting force sensing systems in manufacturing processes applied to cutting tools are gradually becoming the most significant monitoring indicator. Their signals have been extensively applied to evaluate the machinability of workpiece materials, predict cutter breakage, estimate cutting tool wear, control machine tool chatter, determine stable machining parameters, and improve surface finish. Robust and effective sensing systems with capability of monitoring the cutting force in machine operations in real time are crucial for realizing the full potential of cutting capabilities of computer numerically controlled (CNC tools. The main objective of this paper is to present a brief review of the existing achievements in the field of multi-component cutting force sensing systems in modern manufacturing.

  4. Methods and Research for Multi-Component Cutting Force Sensing Devices and Approaches in Machining.

    Science.gov (United States)

    Liang, Qiaokang; Zhang, Dan; Wu, Wanneng; Zou, Kunlin

    2016-11-16

    Multi-component cutting force sensing systems in manufacturing processes applied to cutting tools are gradually becoming the most significant monitoring indicator. Their signals have been extensively applied to evaluate the machinability of workpiece materials, predict cutter breakage, estimate cutting tool wear, control machine tool chatter, determine stable machining parameters, and improve surface finish. Robust and effective sensing systems with capability of monitoring the cutting force in machine operations in real time are crucial for realizing the full potential of cutting capabilities of computer numerically controlled (CNC) tools. The main objective of this paper is to present a brief review of the existing achievements in the field of multi-component cutting force sensing systems in modern manufacturing.

  5. Medical subdomain classification of clinical notes using a machine learning-based natural language processing approach

    OpenAIRE

    Weng, Wei-Hung; Wagholikar, Kavishwar B.; McCray, Alexa T.; Szolovits, Peter; Chueh, Henry C.

    2017-01-01

    Background The medical subdomain of a clinical note, such as cardiology or neurology, is useful content-derived metadata for developing machine learning downstream applications. To classify the medical subdomain of a note accurately, we have constructed a machine learning-based natural language processing (NLP) pipeline and developed medical subdomain classifiers based on the content of the note. Methods We constructed the pipeline using the clinical ...

  6. Machine Learning

    CERN Multimedia

    CERN. Geneva

    2017-01-01

    Machine learning, which builds on ideas in computer science, statistics, and optimization, focuses on developing algorithms to identify patterns and regularities in data, and using these learned patterns to make predictions on new observations. Boosted by its industrial and commercial applications, the field of machine learning is quickly evolving and expanding. Recent advances have seen great success in the realms of computer vision, natural language processing, and broadly in data science. Many of these techniques have already been applied in particle physics, for instance for particle identification, detector monitoring, and the optimization of computer resources. Modern machine learning approaches, such as deep learning, are only just beginning to be applied to the analysis of High Energy Physics data to approach more and more complex problems. These classes will review the framework behind machine learning and discuss recent developments in the field.

  7. Mortality risk prediction in burn injury: Comparison of logistic regression with machine learning approaches.

    Science.gov (United States)

    Stylianou, Neophytos; Akbarov, Artur; Kontopantelis, Evangelos; Buchan, Iain; Dunn, Ken W

    2015-08-01

    Predicting mortality from burn injury has traditionally employed logistic regression models. Alternative machine learning methods have been introduced in some areas of clinical prediction as the necessary software and computational facilities have become accessible. Here we compare logistic regression and machine learning predictions of mortality from burn. An established logistic mortality model was compared to machine learning methods (artificial neural network, support vector machine, random forests and naïve Bayes) using a population-based (England & Wales) case-cohort registry. Predictive evaluation used: area under the receiver operating characteristic curve; sensitivity; specificity; positive predictive value and Youden's index. All methods had comparable discriminatory abilities, similar sensitivities, specificities and positive predictive values. Although some machine learning methods performed marginally better than logistic regression the differences were seldom statistically significant and clinically insubstantial. Random forests were marginally better for high positive predictive value and reasonable sensitivity. Neural networks yielded slightly better prediction overall. Logistic regression gives an optimal mix of performance and interpretability. The established logistic regression model of burn mortality performs well against more complex alternatives. Clinical prediction with a small set of strong, stable, independent predictors is unlikely to gain much from machine learning outside specialist research contexts. Copyright © 2015 Elsevier Ltd and ISBI. All rights reserved.

  8. A geometric approach for fault detection and isolation of stator short circuit failure in a single asynchronous machine

    KAUST Repository

    Khelouat, Samir

    2012-06-01

    This paper deals with the problem of detection and isolation of stator short-circuit failure in a single asynchronous machine using a geometric approach. After recalling the basis of the geometric approach for fault detection and isolation in nonlinear systems, we will study some structural properties which are fault detectability and isolation fault filter existence. We will then design filters for residual generation. We will consider two approaches: a two-filters structure and a single filter structure, both aiming at generating residuals which are sensitive to one fault and insensitive to the other faults. Some numerical tests will be presented to illustrate the efficiency of the method.

  9. A comparison of rule-based and machine learning approaches for classifying patient portal messages.

    Science.gov (United States)

    Cronin, Robert M; Fabbri, Daniel; Denny, Joshua C; Rosenbloom, S Trent; Jackson, Gretchen Purcell

    2017-09-01

    Secure messaging through patient portals is an increasingly popular way that consumers interact with healthcare providers. The increasing burden of secure messaging can affect clinic staffing and workflows. Manual management of portal messages is costly and time consuming. Automated classification of portal messages could potentially expedite message triage and delivery of care. We developed automated patient portal message classifiers with rule-based and machine learning techniques using bag of words and natural language processing (NLP) approaches. To evaluate classifier performance, we used a gold standard of 3253 portal messages manually categorized using a taxonomy of communication types (i.e., main categories of informational, medical, logistical, social, and other communications, and subcategories including prescriptions, appointments, problems, tests, follow-up, contact information, and acknowledgement). We evaluated our classifiers' accuracies in identifying individual communication types within portal messages with area under the receiver-operator curve (AUC). Portal messages often contain more than one type of communication. To predict all communication types within single messages, we used the Jaccard Index. We extracted the variables of importance for the random forest classifiers. The best performing approaches to classification for the major communication types were: logistic regression for medical communications (AUC: 0.899); basic (rule-based) for informational communications (AUC: 0.842); and random forests for social communications and logistical communications (AUCs: 0.875 and 0.925, respectively). The best performing classification approach of classifiers for individual communication subtypes was random forests for Logistical-Contact Information (AUC: 0.963). The Jaccard Indices by approach were: basic classifier, Jaccard Index: 0.674; Naïve Bayes, Jaccard Index: 0.799; random forests, Jaccard Index: 0.859; and logistic regression, Jaccard

  10. An empirical comparison of different approaches for combining multimodal neuroimaging data with Support Vector Machine

    Directory of Open Access Journals (Sweden)

    William ePettersson-Yeo

    2014-07-01

    Full Text Available In the pursuit of clinical utility, neuroimaging researchers of psychiatric and neurological illness are increasingly using analyses, such as support vector machine (SVM, that allow inference at the single-subject level. Recent studies employing single-modality data, however, suggest that classification accuracies must be improved for such utility to be realised. One possible solution is to integrate different data types to provide a single combined output classification; either by generating a single decision function based on an integrated kernel matrix, or, by creating an ensemble of multiple single modality classifiers and integrating their predictions. Here, we describe four integrative approaches: 1 an un-weighted sum of kernels, 2 multi-kernel learning, 3 prediction averaging, and 4 majority voting, and compare their ability to enhance classification accuracy relative to the best single-modality classification accuracy. We achieve this by integrating structural, functional and diffusion tensor magnetic resonance imaging data, in order to compare ultra-high risk (UHR; n=19, first episode psychosis (FEP; n=19 and healthy control subjects (HCs; n=19. Our results show that i whilst integration can enhance classification accuracy by up to 13%, the frequency of such instances may be limited, ii where classification can be enhanced, simple methods may yield greater increases relative to more computationally complex alternatives, and, iii the potential for classification enhancement is highly influenced by the specific diagnostic comparison under consideration. In conclusion, our findings suggest that for moderately sized clinical neuroimaging datasets, combining different imaging modalities in a data-driven manner is no magic bullet for increasing classification accuracy.

  11. A Machine Learning Approach to Discover Rules for Expressive Performance Actions in Jazz Guitar Music

    Science.gov (United States)

    Giraldo, Sergio I.; Ramirez, Rafael

    2016-01-01

    Expert musicians introduce expression in their performances by manipulating sound properties such as timing, energy, pitch, and timbre. Here, we present a data driven computational approach to induce expressive performance rule models for note duration, onset, energy, and ornamentation transformations in jazz guitar music. We extract high-level features from a set of 16 commercial audio recordings (and corresponding music scores) of jazz guitarist Grant Green in order to characterize the expression in the pieces. We apply machine learning techniques to the resulting features to learn expressive performance rule models. We (1) quantitatively evaluate the accuracy of the induced models, (2) analyse the relative importance of the considered musical features, (3) discuss some of the learnt expressive performance rules in the context of previous work, and (4) assess their generailty. The accuracies of the induced predictive models is significantly above base-line levels indicating that the audio performances and the musical features extracted contain sufficient information to automatically learn informative expressive performance patterns. Feature analysis shows that the most important musical features for predicting expressive transformations are note duration, pitch, metrical strength, phrase position, Narmour structure, and tempo and key of the piece. Similarities and differences between the induced expressive rules and the rules reported in the literature were found. Differences may be due to the fact that most previously studied performance data has consisted of classical music recordings. Finally, the rules' performer specificity/generality is assessed by applying the induced rules to performances of the same pieces performed by two other professional jazz guitar players. Results show a consistency in the ornamentation patterns between Grant Green and the other two musicians, which may be interpreted as a good indicator for generality of the ornamentation rules

  12. A Machine Learning Approach to Discover Rules for Expressive Performance Actions in Jazz Guitar Music

    Directory of Open Access Journals (Sweden)

    Sergio Ivan Giraldo

    2016-12-01

    Full Text Available Expert musicians introduce expression in their performances by manipulating sound properties such as timing, energy, pitch, and timbre. Here, we present a data driven computational approach to induce expressive performance rule models for note duration, onset, energy, and ornamentation transformations in jazz guitar music. We extract high-level features from a set of 16 commercial audio recordings (and corresponding music scores of jazz guitarist Grant Green in order to characterize the expression in the pieces. We apply machine learning techniques to the resulting features to learn expressive performance rule models. We (1 quantitatively evaluate the accuracy of the induced models, (2 analyse the relative importance of the considered musical features, (3 discuss some of the learnt expressive performance rules in the context of previous work, and (4 assess their generailty. The accuracies of the induced predictive models is significantly above base-line levels indicating that the audio performances and the musical features extracted contain sufficient information to automatically learn informative expressive performance patterns. Feature analysis shows that the most important musical features for predicting expressive transformations are note duration, pitch, metrical strength, phrase position, Narmour structure, and tempo and key of the piece. Similarities and differences between the induced expressive rules and the rules reported in the literature were found. Differences may be due to the fact that most previously studied performance data has consisted of classical music recordings. Finally, the rules’ performer specificity/generality is assessed by applying the induced rules to performances of the same pieces performed by two other professional jazz guitar players. Results show a consistency in the ornamentation patterns between Grant Green and the other two musicians, which may be interpreted as a good indicator for generality of the

  13. A Machine Learning Approach to Discover Rules for Expressive Performance Actions in Jazz Guitar Music.

    Science.gov (United States)

    Giraldo, Sergio I; Ramirez, Rafael

    2016-01-01

    Expert musicians introduce expression in their performances by manipulating sound properties such as timing, energy, pitch, and timbre. Here, we present a data driven computational approach to induce expressive performance rule models for note duration, onset, energy, and ornamentation transformations in jazz guitar music. We extract high-level features from a set of 16 commercial audio recordings (and corresponding music scores) of jazz guitarist Grant Green in order to characterize the expression in the pieces. We apply machine learning techniques to the resulting features to learn expressive performance rule models. We (1) quantitatively evaluate the accuracy of the induced models, (2) analyse the relative importance of the considered musical features, (3) discuss some of the learnt expressive performance rules in the context of previous work, and (4) assess their generailty. The accuracies of the induced predictive models is significantly above base-line levels indicating that the audio performances and the musical features extracted contain sufficient information to automatically learn informative expressive performance patterns. Feature analysis shows that the most important musical features for predicting expressive transformations are note duration, pitch, metrical strength, phrase position, Narmour structure, and tempo and key of the piece. Similarities and differences between the induced expressive rules and the rules reported in the literature were found. Differences may be due to the fact that most previously studied performance data has consisted of classical music recordings. Finally, the rules' performer specificity/generality is assessed by applying the induced rules to performances of the same pieces performed by two other professional jazz guitar players. Results show a consistency in the ornamentation patterns between Grant Green and the other two musicians, which may be interpreted as a good indicator for generality of the ornamentation rules.

  14. Designing Green Stormwater Infrastructure for Hydrologic and Human Benefits: An Image Based Machine Learning Approach

    Science.gov (United States)

    Rai, A.; Minsker, B. S.

    2014-12-01

    Urbanization over the last century has degraded our natural water resources by increasing storm-water runoff, reducing nutrient retention, and creating poor ecosystem health downstream. The loss of tree canopy and expansion of impervious area and storm sewer systems have significantly decreased infiltration and evapotranspiration, increased stream-flow velocities, and increased flood risk. These problems have brought increasing attention to catchment-wide implementation of green infrastructure (e.g., decentralized green storm water management practices such as bioswales, rain gardens, permeable pavements, tree box filters, cisterns, urban wetlands, urban forests, stream buffers, and green roofs) to replace or supplement conventional storm water management practices and create more sustainable urban water systems. Current green infrastructure (GI) practice aims at mitigating the negative effects of urbanization by restoring pre-development hydrology and ultimately addressing water quality issues at an urban catchment scale. The benefits of green infrastructure extend well beyond local storm water management, as urban green spaces are also major contributors to human health. Considerable research in the psychological sciences have shown significant human health benefits from appropriately designed green spaces, yet impacts on human wellbeing have not yet been formally considered in GI design frameworks. This research is developing a novel computational green infrastructure (GI) design framework that integrates hydrologic requirements with criteria for human wellbeing. A supervised machine learning model is created to identify specific patterns in urban green spaces that promote human wellbeing; the model is linked to RHESSYS model to evaluate GI designs in terms of both hydrologic and human health benefits. An application of the models to Dead Run Watershed in Baltimore showed that image mining methods were able to capture key elements of human preferences that could

  15. A Machine Learning Approach to Identifying Placebo Responders in Late-Life Depression Trials.

    Science.gov (United States)

    Zilcha-Mano, Sigal; Roose, Steven P; Brown, Patrick J; Rutherford, Bret R

    2018-01-11

    Despite efforts to identify characteristics associated with medication-placebo differences in antidepressant trials, few consistent findings have emerged to guide participant selection in drug development settings and differential therapeutics in clinical practice. Limitations in the methodologies used, particularly searching for a single moderator while treating all other variables as noise, may partially explain the failure to generate consistent results. The present study tested whether interactions between pretreatment patient characteristics, rather than a single-variable solution, may better predict who is most likely to benefit from placebo versus medication. Data were analyzed from 174 patients aged 75 years and older with unipolar depression who were randomly assigned to citalopram or placebo. Model-based recursive partitioning analysis was conducted to identify the most robust significant moderators of placebo versus citalopram response. The greatest signal detection between medication and placebo in favor of medication was among patients with fewer years of education (≤12) who suffered from a longer duration of depression since their first episode (>3.47 years) (B = 2.53, t(32) = 3.01, p = 0.004). Compared with medication, placebo had the greatest response for those who were more educated (>12 years), to the point where placebo almost outperformed medication (B = -0.57, t(96) = -1.90, p = 0.06). Machine learning approaches capable of evaluating the contributions of multiple predictor variables may be a promising methodology for identifying placebo versus medication responders. Duration of depression and education should be considered in the efforts to modulate placebo magnitude in drug development settings and in clinical practice. Copyright © 2018 American Association for Geriatric Psychiatry. Published by Elsevier Inc. All rights reserved.

  16. Optimization of classification and regression analysis of four monoclonal antibodies from Raman spectra using collaborative machine learning approach.

    Science.gov (United States)

    Le, Laetitia Minh Maï; Kégl, Balázs; Gramfort, Alexandre; Marini, Camille; Nguyen, David; Cherti, Mehdi; Tfaili, Sana; Tfayli, Ali; Baillet-Guffroy, Arlette; Prognon, Patrice; Chaminade, Pierre; Caudron, Eric

    2018-07-01

    The use of monoclonal antibodies (mAbs) constitutes one of the most important strategies to treat patients suffering from cancers such as hematological malignancies and solid tumors. These antibodies are prescribed by the physician and prepared by hospital pharmacists. An analytical control enables the quality of the preparations to be ensured. The aim of this study was to explore the development of a rapid analytical method for quality control. The method used four mAbs (Infliximab, Bevacizumab, Rituximab and Ramucirumab) at various concentrations and was based on recording Raman data and coupling them to a traditional chemometric and machine learning approach for data analysis. Compared to conventional linear approach, prediction errors are reduced with a data-driven approach using statistical machine learning methods. In the latter, preprocessing and predictive models are jointly optimized. An additional original aspect of the work involved on submitting the problem to a collaborative data challenge platform called Rapid Analytics and Model Prototyping (RAMP). This allowed using solutions from about 300 data scientists in collaborative work. Using machine learning, the prediction of the four mAbs samples was considerably improved. The best predictive model showed a combined error of 2.4% versus 14.6% using linear approach. The concentration and classification errors were 5.8% and 0.7%, only three spectra were misclassified over the 429 spectra of the test set. This large improvement obtained with machine learning techniques was uniform for all molecules but maximal for Bevacizumab with an 88.3% reduction on combined errors (2.1% versus 17.9%). Copyright © 2018 Elsevier B.V. All rights reserved.

  17. The Scenario Approach to the Development of Strategy of Prevention of Raider Seizure for Machine-Building Enterprise

    Directory of Open Access Journals (Sweden)

    Momot Tetiana V.

    2017-12-01

    Full Text Available The article proposes the methodical approach to the choice and substantiation of efficiency of managerial decisions on ensuring economic safety at counteraction of raiding, based on an intellectual instrumental analysis. The ranking of alternatives of managerial decisions on the basis of the received weighted estimates and their fuzzy composition is used. A graphical interpretation of the membership functions of the calculated fuzzy expected utilities of management alternatives for the machine-building enterprises has been constructed and is presented.

  18. Investigation of High-Speed Cryogenic Machining Based on Finite Element Approach

    Directory of Open Access Journals (Sweden)

    Pooyan Vahidi Pashaki

    Full Text Available Abstract The simulation of cryogenic machining process because of using a three-dimensional model and high process duration time in the finite element method, have been studied rarely. In this study, to overcome this limitation, a 2.5D finite element model using the commercial finite element software ABAQUS has been developed for the cryogenic machining process and by considering more realistic assumptions, the chip formation procedure investigated. In the proposed method, the liquid nitrogen has been used as a coolant. At the modeling of friction during the interaction of tools - chip, the Coulomb law has been used. In order to simulate the behavior of plasticity and failure criterion, Johnson-Cook model was used, and unlike previous investigations, thermal and mechanical properties of materials as a function of temperature were applied to the software. After examining accuracy of the model with present experimental data, the effect of parameters such as rake angle and the cutting speed as well as dry machining of aluminum alloy by the use of coupled dynamic temperature solution has been studied. Results indicated that at the cutting velocity of 10 m/s, cryogenic cooling has caused into decreasing 60 percent of tools temperature in comparison with the dry cooling. Furthermore, a chip which has been made by cryogenic machining were connected and without fracture in contrast to dry machining.

  19. Virtual screening by a new Clustering-based Weighted Similarity Extreme Learning Machine approach.

    Science.gov (United States)

    Pasupa, Kitsuchart; Kudisthalert, Wasu

    2018-01-01

    Machine learning techniques are becoming popular in virtual screening tasks. One of the powerful machine learning algorithms is Extreme Learning Machine (ELM) which has been applied to many applications and has recently been applied to virtual screening. We propose the Weighted Similarity ELM (WS-ELM) which is based on a single layer feed-forward neural network in a conjunction of 16 different similarity coefficients as activation function in the hidden layer. It is known that the performance of conventional ELM is not robust due to random weight selection in the hidden layer. Thus, we propose a Clustering-based WS-ELM (CWS-ELM) that deterministically assigns weights by utilising clustering algorithms i.e. k-means clustering and support vector clustering. The experiments were conducted on one of the most challenging datasets-Maximum Unbiased Validation Dataset-which contains 17 activity classes carefully selected from PubChem. The proposed algorithms were then compared with other machine learning techniques such as support vector machine, random forest, and similarity searching. The results show that CWS-ELM in conjunction with support vector clustering yields the best performance when utilised together with Sokal/Sneath(1) coefficient. Furthermore, ECFP_6 fingerprint presents the best results in our framework compared to the other types of fingerprints, namely ECFP_4, FCFP_4, and FCFP_6.

  20. The reflection of evolving bearing faults in the stator current's extended park vector approach for induction machines

    Science.gov (United States)

    Corne, Bram; Vervisch, Bram; Derammelaere, Stijn; Knockaert, Jos; Desmet, Jan

    2018-07-01

    Stator current analysis has the potential of becoming the most cost-effective condition monitoring technology regarding electric rotating machinery. Since both electrical and mechanical faults are detected by inexpensive and robust current-sensors, measuring current is advantageous on other techniques such as vibration, acoustic or temperature analysis. However, this technology is struggling to breach into the market of condition monitoring as the electrical interpretation of mechanical machine-problems is highly complicated. Recently, the authors built a test-rig which facilitates the emulation of several representative mechanical faults on an 11 kW induction machine with high accuracy and reproducibility. Operating this test-rig, the stator current of the induction machine under test can be analyzed while mechanical faults are emulated. Furthermore, while emulating, the fault-severity can be manipulated adaptively under controllable environmental conditions. This creates the opportunity of examining the relation between the magnitude of the well-known current fault components and the corresponding fault-severity. This paper presents the emulation of evolving bearing faults and their reflection in the Extended Park Vector Approach for the 11 kW induction machine under test. The results confirm the strong relation between the bearing faults and the stator current fault components in both identification and fault-severity. Conclusively, stator current analysis increases reliability in the application as a complete, robust, on-line condition monitoring technology.

  1. Numerical approach for optimum electromagnetic parameters of electrical machines used in vehicle traction applications

    International Nuclear Information System (INIS)

    Fodorean, D.; Giurgea, S.; Djerdir, A.; Miraoui, A.

    2009-01-01

    A large speed variation is an essential request in the automobile industry. In order to compete with diesel engines, the flux weakening technique has to be employed on the electrical machines. In this way, appropriate electromagnetic and geometrical parameters can give the desired speed. Using the inverse problem method coupled with numerical analysis by finite element method (FEM), the authors propose an optimum parameters configuration that maximizes the speed domain operation. Several types of electrical machines are under study: induction, synchronous permanent magnet, variable reluctance and transverse flux machines, respectively. With a proper non-linear model, by using analytical and numerical calculation, the authors propose an optimum solution for the speed variation of the studied drives, which will be standing for a final comparison.

  2. A FLEXIBLE APPROACH TO THE BENEFICIAL USE OF MACHINING POWER IN MILLING PROCESS

    Directory of Open Access Journals (Sweden)

    Faruk MENDİ

    2000-01-01

    Full Text Available In this study, a computer program has been developed to calculate the cutting speed (Vc and depth of cut (a under certain cutting condition which will be determined by the user for making maximum use of the various milling machines used presently. In the scope of the study, data related to the cutting speed (Vc and depth of cut (a of St 37 Steel and powers of the machine have been obtained. The aim is to make maximum use of the machines by choosing cutting speed (Vc and depth of cut (a with the help data. The effect of changes in cutting speed (Vc and depth of cut (a on productivity has been studied.

  3. Self-commissioning of permanent magnet synchronous machine drives using hybrid approach

    DEFF Research Database (Denmark)

    Basar, Mehmet Sertug

    2016-01-01

    Self-commissioning of permanent-magnet (PM) synchronous machines (PMSMs) is of prime importance in an industrial drive system because control performance and system stability depend heavily on the accurate machine parameter information. This article focuses on a combination of offline and online...... parameter estimation for a non-salient pole PMSM which eliminates the need for any prior knowledge on machine parameters. Stator resistance and inductance are first identified at standstill utilising fundamental and high-frequency excitation signals, respectively. A novel method has been developed...... and employed for inductance estimation. Then, stator resistance, inductance and PM flux are updated online using a recursive least-squares (RLS) algorithm. The proposed controllers are designed using MATLAB/Simulink® and implemented on d-Space® real-time system incorporating a commercially available PMSM drive....

  4. Prediction of breast cancer risk using a machine learning approach embedded with a locality preserving projection algorithm

    Science.gov (United States)

    Heidari, Morteza; Zargari Khuzani, Abolfazl; Hollingsworth, Alan B.; Danala, Gopichandh; Mirniaharikandehei, Seyedehnafiseh; Qiu, Yuchen; Liu, Hong; Zheng, Bin

    2018-02-01

    In order to automatically identify a set of effective mammographic image features and build an optimal breast cancer risk stratification model, this study aims to investigate advantages of applying a machine learning approach embedded with a locally preserving projection (LPP) based feature combination and regeneration algorithm to predict short-term breast cancer risk. A dataset involving negative mammograms acquired from 500 women was assembled. This dataset was divided into two age-matched classes of 250 high risk cases in which cancer was detected in the next subsequent mammography screening and 250 low risk cases, which remained negative. First, a computer-aided image processing scheme was applied to segment fibro-glandular tissue depicted on mammograms and initially compute 44 features related to the bilateral asymmetry of mammographic tissue density distribution between left and right breasts. Next, a multi-feature fusion based machine learning classifier was built to predict the risk of cancer detection in the next mammography screening. A leave-one-case-out (LOCO) cross-validation method was applied to train and test the machine learning classifier embedded with a LLP algorithm, which generated a new operational vector with 4 features using a maximal variance approach in each LOCO process. Results showed a 9.7% increase in risk prediction accuracy when using this LPP-embedded machine learning approach. An increased trend of adjusted odds ratios was also detected in which odds ratios increased from 1.0 to 11.2. This study demonstrated that applying the LPP algorithm effectively reduced feature dimensionality, and yielded higher and potentially more robust performance in predicting short-term breast cancer risk.

  5. Vibration Monitoring of Gas Turbine Engines: Machine-Learning Approaches and Their Challenges

    Directory of Open Access Journals (Sweden)

    Ioannis Matthaiou

    2017-09-01

    Full Text Available In this study, condition monitoring strategies are examined for gas turbine engines using vibration data. The focus is on data-driven approaches, for this reason a novelty detection framework is considered for the development of reliable data-driven models that can describe the underlying relationships of the processes taking place during an engine’s operation. From a data analysis perspective, the high dimensionality of features extracted and the data complexity are two problems that need to be dealt with throughout analyses of this type. The latter refers to the fact that the healthy engine state data can be non-stationary. To address this, the implementation of the wavelet transform is examined to get a set of features from vibration signals that describe the non-stationary parts. The problem of high dimensionality of the features is addressed by “compressing” them using the kernel principal component analysis so that more meaningful, lower-dimensional features can be used to train the pattern recognition algorithms. For feature discrimination, a novelty detection scheme that is based on the one-class support vector machine (OCSVM algorithm is chosen for investigation. The main advantage, when compared to other pattern recognition algorithms, is that the learning problem is being cast as a quadratic program. The developed condition monitoring strategy can be applied for detecting excessive vibration levels that can lead to engine component failure. Here, we demonstrate its performance on vibration data from an experimental gas turbine engine operating on different conditions. Engine vibration data that are designated as belonging to the engine’s “normal” condition correspond to fuels and air-to-fuel ratio combinations, in which the engine experienced low levels of vibration. Results demonstrate that such novelty detection schemes can achieve a satisfactory validation accuracy through appropriate selection of two parameters of the

  6. An empirical comparison of different approaches for combining multimodal neuroimaging data with support vector machine.

    Science.gov (United States)

    Pettersson-Yeo, William; Benetti, Stefania; Marquand, Andre F; Joules, Richard; Catani, Marco; Williams, Steve C R; Allen, Paul; McGuire, Philip; Mechelli, Andrea

    2014-01-01

    In the pursuit of clinical utility, neuroimaging researchers of psychiatric and neurological illness are increasingly using analyses, such as support vector machine, that allow inference at the single-subject level. Recent studies employing single-modality data, however, suggest that classification accuracies must be improved for such utility to be realized. One possible solution is to integrate different data types to provide a single combined output classification; either by generating a single decision function based on an integrated kernel matrix, or, by creating an ensemble of multiple single modality classifiers and integrating their predictions. Here, we describe four integrative approaches: (1) an un-weighted sum of kernels, (2) multi-kernel learning, (3) prediction averaging, and (4) majority voting, and compare their ability to enhance classification accuracy relative to the best single-modality classification accuracy. We achieve this by integrating structural, functional, and diffusion tensor magnetic resonance imaging data, in order to compare ultra-high risk (n = 19), first episode psychosis (n = 19) and healthy control subjects (n = 23). Our results show that (i) whilst integration can enhance classification accuracy by up to 13%, the frequency of such instances may be limited, (ii) where classification can be enhanced, simple methods may yield greater increases relative to more computationally complex alternatives, and, (iii) the potential for classification enhancement is highly influenced by the specific diagnostic comparison under consideration. In conclusion, our findings suggest that for moderately sized clinical neuroimaging datasets, combining different imaging modalities in a data-driven manner is no "magic bullet" for increasing classification accuracy. However, it remains possible that this conclusion is dependent on the use of neuroimaging modalities that had little, or no, complementary information to offer one another, and that the

  7. Efficient approach to simulate EM loads on massive structures in ITER machine

    Energy Technology Data Exchange (ETDEWEB)

    Alekseev, A. [ITER Organization, Route de Vinon sur Verdon, 13115 St. Paul-Lez-Durance (France); Andreeva, Z.; Belov, A.; Belyakov, V.; Filatov, O. [D.V. Efremov Scientific Research Institute, 196641 St. Petersburg (Russian Federation); Gribov, Yu.; Ioki, K. [ITER Organization, Route de Vinon sur Verdon, 13115 St. Paul-Lez-Durance (France); Kukhtin, V.; Labusov, A.; Lamzin, E.; Lyublin, B.; Malkov, A.; Mazul, I. [D.V. Efremov Scientific Research Institute, 196641 St. Petersburg (Russian Federation); Rozov, V.; Sugihara, M. [ITER Organization, Route de Vinon sur Verdon, 13115 St. Paul-Lez-Durance (France); Sychevsky, S., E-mail: sytch@sintez.niiefa.spb.su [D.V. Efremov Scientific Research Institute, 196641 St. Petersburg (Russian Federation)

    2013-10-15

    Highlights: ► A modelling technique to predict EM loads in ITER conducting structures is presented. ► The technique provides low computational cost and parallel computations. ► Detailed models were built for the system “vacuum vessel, cryostat, thermal shields”. ► EM loads on massive in-vessel structures were simulated with the use of local models. ► A flexible combination of models enables desired accuracy of load distributions. -- Abstract: Operation of the ITER machine is associated with high electromagnetic (EM) loads. An essential contributor to EM loads is eddy currents induced in passive conductive structures. Reasoning from the ITER construction, a modelling technique has been developed and applied in computations to efficiently predict anticipated loads. The technique allows us to avoid building a global 3D finite-element (FE) model that requires meshing of the conducting structures and their vacuum environment into 3D solid elements that leads to high computational cost. The key features of the proposed technique are: (i) the use of an existing shell model for the system “vacuum vessel (VV), cryostat, and thermal shields (TS)” implementing the magnetic shell approach. A solution is obtained in terms of a single-component, in this case, vector electric potential taken within the conducting shells of the “VV + cryostat + TS” system. (ii) EM loads on in-vessel conducting structures are simulated with the use of local FE models. The local models use either the 3D solid body or shell approximations. Reasoning from the simulation efficiency, the local boundary conditions are put with respect to the total field or an external field. The use of an integral-differential formulation and special procedures ensures smooth and accurate simulated distributions of fields from current sources of any geometry. The local FE models have been developed and applied for EM analyses of a variety of the ITER components including the diagnostic systems

  8. An information-theoretic machine learning approach to expression QTL analysis.

    Directory of Open Access Journals (Sweden)

    Tao Huang

    Full Text Available Expression Quantitative Trait Locus (eQTL analysis is a powerful tool to study the biological mechanisms linking the genotype with gene expression. Such analyses can identify genomic locations where genotypic variants influence the expression of genes, both in close proximity to the variant (cis-eQTL, and on other chromosomes (trans-eQTL. Many traditional eQTL methods are based on a linear regression model. In this study, we propose a novel method by which to identify eQTL associations with information theory and machine learning approaches. Mutual Information (MI is used to describe the association between genetic marker and gene expression. MI can detect both linear and non-linear associations. What's more, it can capture the heterogeneity of the population. Advanced feature selection methods, Maximum Relevance Minimum Redundancy (mRMR and Incremental Feature Selection (IFS, were applied to optimize the selection of the affected genes by the genetic marker. When we applied our method to a study of apoE-deficient mice, it was found that the cis-acting eQTLs are stronger than trans-acting eQTLs but there are more trans-acting eQTLs than cis-acting eQTLs. We compared our results (mRMR.eQTL with R/qtl, and MatrixEQTL (modelLINEAR and modelANOVA. In female mice, 67.9% of mRMR.eQTL results can be confirmed by at least two other methods while only 14.4% of R/qtl result can be confirmed by at least two other methods. In male mice, 74.1% of mRMR.eQTL results can be confirmed by at least two other methods while only 18.2% of R/qtl result can be confirmed by at least two other methods. Our methods provide a new way to identify the association between genetic markers and gene expression. Our software is available from supporting information.

  9. Support Vector Machines with Manifold Learning and Probabilistic Space Projection for Tourist Expenditure Analysis

    Directory of Open Access Journals (Sweden)

    Xin Xu

    2009-03-01

    Full Text Available The significant economic contributions of the tourism industry in recent years impose an unprecedented force for data mining and machine learning methods to analyze tourism data. The intrinsic problems of raw data in tourism are largely related to the complexity, noise and nonlinearity in the data that may introduce many challenges for the existing data mining techniques such as rough sets and neural networks. In this paper, a novel method using SVM- based classification with two nonlinear feature projection techniques is proposed for tourism data analysis. The first feature projection method is based on ISOMAP (Isometric Feature Mapping, which is a class of manifold learning approaches for dimension reduction. By making use of ISOMAP, part of the noisy data can be identified and the classification accuracy of SVMs can be improved by appropriately discarding the noisy training data. The second feature projection method is a probabilistic space mapping technique for scale transformation. Experimental results on expenditure data of business travelers show that the proposed method can improve prediction performance both in terms of testing accuracy and statistical coincidence. In addition, both of the feature projection methods are helpful to reduce the training time of SVMs.

  10. Fault Diagnosis of a Reconfigurable Crawling–Rolling Robot Based on Support Vector Machines

    Directory of Open Access Journals (Sweden)

    Karthikeyan Elangovan

    2017-10-01

    Full Text Available As robots begin to perform jobs autonomously, with minimal or no human intervention, a new challenge arises: robots also need to autonomously detect errors and recover from faults. In this paper, we present a Support Vector Machine (SVM-based fault diagnosis system for a bio-inspired reconfigurable robot named Scorpio. The diagnosis system needs to detect and classify faults while Scorpio uses its crawling and rolling locomotion modes. Specifically, we classify between faulty and non-faulty conditions by analyzing onboard Inertial Measurement Unit (IMU sensor data. The data capture nine different locomotion gaits, which include rolling and crawling modes, at three different speeds. Statistical methods are applied to extract features and to reduce the dimensionality of original IMU sensor data features. These statistical features were given as inputs for training and testing. Additionally, the c-Support Vector Classification (c-SVC and nu-SVC models of SVM, and their fault classification accuracies, were compared. The results show that the proposed SVM approach can be used to autonomously diagnose locomotion gait faults while the reconfigurable robot is in operation.

  11. Detecting epileptic seizure with different feature extracting strategies using robust machine learning classification techniques by applying advance parameter optimization approach.

    Science.gov (United States)

    Hussain, Lal

    2018-06-01

    Epilepsy is a neurological disorder produced due to abnormal excitability of neurons in the brain. The research reveals that brain activity is monitored through electroencephalogram (EEG) of patients suffered from seizure to detect the epileptic seizure. The performance of EEG detection based epilepsy require feature extracting strategies. In this research, we have extracted varying features extracting strategies based on time and frequency domain characteristics, nonlinear, wavelet based entropy and few statistical features. A deeper study was undertaken using novel machine learning classifiers by considering multiple factors. The support vector machine kernels are evaluated based on multiclass kernel and box constraint level. Likewise, for K-nearest neighbors (KNN), we computed the different distance metrics, Neighbor weights and Neighbors. Similarly, the decision trees we tuned the paramours based on maximum splits and split criteria and ensemble classifiers are evaluated based on different ensemble methods and learning rate. For training/testing tenfold Cross validation was employed and performance was evaluated in form of TPR, NPR, PPV, accuracy and AUC. In this research, a deeper analysis approach was performed using diverse features extracting strategies using robust machine learning classifiers with more advanced optimal options. Support Vector Machine linear kernel and KNN with City block distance metric give the overall highest accuracy of 99.5% which was higher than using the default parameters for these classifiers. Moreover, highest separation (AUC = 0.9991, 0.9990) were obtained at different kernel scales using SVM. Additionally, the K-nearest neighbors with inverse squared distance weight give higher performance at different Neighbors. Moreover, to distinguish the postictal heart rate oscillations from epileptic ictal subjects, and highest performance of 100% was obtained using different machine learning classifiers.

  12. A modern approach for the realisation of the man-machine information system

    International Nuclear Information System (INIS)

    Martin, D.; Grensemann, D.

    1980-01-01

    The tool for man-machine communication has been given the name VISCOMP-VISual COmmunication Man-Process. Dependent on the project implementation phase VISCOMP functions either autonomously (planning and definition phases) or in conjunction with a data acquisition system during the operational phase on the plant. The relationship of the required components to the project phases is illustrated. (orig./HP)

  13. SVM-Maj: a majorization approach to linear support vector machines with different hinge errors

    NARCIS (Netherlands)

    P.J.F. Groenen (Patrick); G.I. Nalbantov (Georgi); J.C. Bioch (Cor)

    2007-01-01

    textabstractSupport vector machines (SVM) are becoming increasingly popular for the prediction of a binary dependent variable. SVMs perform very well with respect to competing techniques. Often, the solution of an SVM is obtained by switching to the dual. In this paper, we stick to the primal

  14. A Machine Learning Approach for Hot-Spot Detection at Protein-Protein Interfaces

    NARCIS (Netherlands)

    Melo, Rita; Fieldhouse, Robert; Melo, André; Correia, João D G; Cordeiro, Maria Natália D S; Gümüş, Zeynep H; Costa, Joaquim; Bonvin, Alexandre M J J; de Sousa Moreira, Irina

    2016-01-01

    Understanding protein-protein interactions is a key challenge in biochemistry. In this work, we describe a more accurate methodology to predict Hot-Spots (HS) in protein-protein interfaces from their native complex structure compared to previous published Machine Learning (ML) techniques. Our model

  15. An empirical comparison of different approaches for combining multimodal neuroimaging data with support vector machine

    NARCIS (Netherlands)

    Pettersson-Yeo, W.; Benetti, S.; Marquand, A.F.; Joules, R.; Catani, M.; Williams, S.C.; Allen, P.; McGuire, P.; Mechelli, A.

    2014-01-01

    In the pursuit of clinical utility, neuroimaging researchers of psychiatric and neurological illness are increasingly using analyses, such as support vector machine, that allow inference at the single-subject level. Recent studies employing single-modality data, however, suggest that classification

  16. Reliability Evaluation and Improvement Approach of Chemical Production Man - Machine - Environment System

    Science.gov (United States)

    Miao, Yongchun; Kang, Rongxue; Chen, Xuefeng

    2017-12-01

    In recent years, with the gradual extension of reliability research, the study of production system reliability has become the hot topic in various industries. Man-machine-environment system is a complex system composed of human factors, machinery equipment and environment. The reliability of individual factor must be analyzed in order to gradually transit to the research of three-factor reliability. Meanwhile, the dynamic relationship among man-machine-environment should be considered to establish an effective blurry evaluation mechanism to truly and effectively analyze the reliability of such systems. In this paper, based on the system engineering, fuzzy theory, reliability theory, human error, environmental impact and machinery equipment failure theory, the reliabilities of human factor, machinery equipment and environment of some chemical production system were studied by the method of fuzzy evaluation. At last, the reliability of man-machine-environment system was calculated to obtain the weighted result, which indicated that the reliability value of this chemical production system was 86.29. Through the given evaluation domain it can be seen that the reliability of man-machine-environment integrated system is in a good status, and the effective measures for further improvement were proposed according to the fuzzy calculation results.

  17. Introduction to special issue on machine learning approaches to shallow parsing

    NARCIS (Netherlands)

    Hammerton, J; Osborne, M; Armstrong, S; Daelemans, W

    2002-01-01

    This article introduces the problem of partial or shallow parsing (assigning partial syntactic structure to sentences) and explains why it is an important natural language processing (NLP) task. The complexity of the task makes Machine Learning an attractive option in comparison to the handcrafting

  18. A Support Vector Machine Approach to Dutch Part-of-Speech Tagging

    NARCIS (Netherlands)

    Poel, Mannes; Stegeman, L.; op den Akker, Hendrikus J.A.; Berthold, M.R.; Shawe-Taylor, J.; Lavrac, N.

    Part-of-Speech tagging, the assignment of Parts-of-Speech to the words in a given context of use, is a basic technique in many systems that handle natural languages. This paper describes a method for supervised training of a Part-of-Speech tagger using a committee of Support Vector Machines on a

  19. Feature selection in wind speed prediction systems based on a hybrid coral reefs optimization – Extreme learning machine approach

    International Nuclear Information System (INIS)

    Salcedo-Sanz, S.; Pastor-Sánchez, A.; Prieto, L.; Blanco-Aguilera, A.; García-Herrera, R.

    2014-01-01

    Highlights: • A novel approach for short-term wind speed prediction is presented. • The system is formed by a coral reefs optimization algorithm and an extreme learning machine. • Feature selection is carried out with the CRO to improve the ELM performance. • The method is tested in real wind farm data in USA, for the period 2007–2008. - Abstract: This paper presents a novel approach for short-term wind speed prediction based on a Coral Reefs Optimization algorithm (CRO) and an Extreme Learning Machine (ELM), using meteorological predictive variables from a physical model (the Weather Research and Forecast model, WRF). The approach is based on a Feature Selection Problem (FSP) carried out with the CRO, that must obtain a reduced number of predictive variables out of the total available from the WRF. This set of features will be the input of an ELM, that finally provides the wind speed prediction. The CRO is a novel bio-inspired approach, based on the simulation of reef formation and coral reproduction, able to obtain excellent results in optimization problems. On the other hand, the ELM is a new paradigm in neural networks’ training, that provides a robust and extremely fast training of the network. Together, these algorithms are able to successfully solve this problem of feature selection in short-term wind speed prediction. Experiments in a real wind farm in the USA show the excellent performance of the CRO–ELM approach in this FSP wind speed prediction problem

  20. Heparan Sulfate Induces Necroptosis in Murine Cardiomyocytes: A Medical-In silico Approach Combining In vitro Experiments and Machine Learning.

    Science.gov (United States)

    Zechendorf, Elisabeth; Vaßen, Phillip; Zhang, Jieyi; Hallawa, Ahmed; Martincuks, Antons; Krenkel, Oliver; Müller-Newen, Gerhard; Schuerholz, Tobias; Simon, Tim-Philipp; Marx, Gernot; Ascheid, Gerd; Schmeink, Anke; Dartmann, Guido; Thiemermann, Christoph; Martin, Lukas

    2018-01-01

    Life-threatening cardiomyopathy is a severe, but common, complication associated with severe trauma or sepsis. Several signaling pathways involved in apoptosis and necroptosis are linked to trauma- or sepsis-associated cardiomyopathy. However, the underling causative factors are still debatable. Heparan sulfate (HS) fragments belong to the class of danger/damage-associated molecular patterns liberated from endothelial-bound proteoglycans by heparanase during tissue injury associated with trauma or sepsis. We hypothesized that HS induces apoptosis or necroptosis in murine cardiomyocytes. By using a novel Medical- In silico approach that combines conventional cell culture experiments with machine learning algorithms, we aimed to reduce a significant part of the expensive and time-consuming cell culture experiments and data generation by using computational intelligence (refinement and replacement). Cardiomyocytes exposed to HS showed an activation of the intrinsic apoptosis signal pathway via cytochrome C and the activation of caspase 3 (both p  machine learning algorithms.

  1. Quantitative diagnosis of breast tumors by morphometric classification of microenvironmental myoepithelial cells using a machine learning approach.

    Science.gov (United States)

    Yamamoto, Yoichiro; Saito, Akira; Tateishi, Ayako; Shimojo, Hisashi; Kanno, Hiroyuki; Tsuchiya, Shinichi; Ito, Ken-Ichi; Cosatto, Eric; Graf, Hans Peter; Moraleda, Rodrigo R; Eils, Roland; Grabe, Niels

    2017-04-25

    Machine learning systems have recently received increased attention for their broad applications in several fields. In this study, we show for the first time that histological types of breast tumors can be classified using subtle morphological differences of microenvironmental myoepithelial cell nuclei without any direct information about neoplastic tumor cells. We quantitatively measured 11661 nuclei on the four histological types: normal cases, usual ductal hyperplasia and low/high grade ductal carcinoma in situ (DCIS). Using a machine learning system, we succeeded in classifying the four histological types with 90.9% accuracy. Electron microscopy observations suggested that the activity of typical myoepithelial cells in DCIS was lowered. Through these observations as well as meta-analytic database analyses, we developed a paracrine cross-talk-based biological mechanism of DCIS progressing to invasive cancer. Our observations support novel approaches in clinical computational diagnostics as well as in therapy development against progression.

  2. Combining macula clinical signs and patient characteristics for age-related macular degeneration diagnosis: a machine learning approach.

    Science.gov (United States)

    Fraccaro, Paolo; Nicolo, Massimo; Bonetto, Monica; Giacomini, Mauro; Weller, Peter; Traverso, Carlo Enrico; Prosperi, Mattia; OSullivan, Dympna

    2015-01-27

    To investigate machine learning methods, ranging from simpler interpretable techniques to complex (non-linear) "black-box" approaches, for automated diagnosis of Age-related Macular Degeneration (AMD). Data from healthy subjects and patients diagnosed with AMD or other retinal diseases were collected during routine visits via an Electronic Health Record (EHR) system. Patients' attributes included demographics and, for each eye, presence/absence of major AMD-related clinical signs (soft drusen, retinal pigment epitelium, defects/pigment mottling, depigmentation area, subretinal haemorrhage, subretinal fluid, macula thickness, macular scar, subretinal fibrosis). Interpretable techniques known as white box methods including logistic regression and decision trees as well as less interpreitable techniques known as black box methods, such as support vector machines (SVM), random forests and AdaBoost, were used to develop models (trained and validated on unseen data) to diagnose AMD. The gold standard was confirmed diagnosis of AMD by physicians. Sensitivity, specificity and area under the receiver operating characteristic (AUC) were used to assess performance. Study population included 487 patients (912 eyes). In terms of AUC, random forests, logistic regression and adaboost showed a mean performance of (0.92), followed by SVM and decision trees (0.90). All machine learning models identified soft drusen and age as the most discriminating variables in clinicians' decision pathways to diagnose AMD. Both black-box and white box methods performed well in identifying diagnoses of AMD and their decision pathways. Machine learning models developed through the proposed approach, relying on clinical signs identified by retinal specialists, could be embedded into EHR to provide physicians with real time (interpretable) support.

  3. Machine Learning Approach to Deconvolution of Thermal Infrared (TIR) Spectrum of Mercury Supporting MERTIS Onboard ESA/JAXA BepiColombo

    Science.gov (United States)

    Varatharajan, I.; D'Amore, M.; Maturilli, A.; Helbert, J.; Hiesinger, H.

    2018-04-01

    Machine learning approach to spectral unmixing of emissivity spectra of Mercury is carried out using endmember spectral library measured at simulated daytime surface conditions of Mercury. Study supports MERTIS payload onboard ESA/JAXA BepiColombo.

  4. Machine Learning

    Energy Technology Data Exchange (ETDEWEB)

    Chikkagoudar, Satish; Chatterjee, Samrat; Thomas, Dennis G.; Carroll, Thomas E.; Muller, George

    2017-04-21

    The absence of a robust and unified theory of cyber dynamics presents challenges and opportunities for using machine learning based data-driven approaches to further the understanding of the behavior of such complex systems. Analysts can also use machine learning approaches to gain operational insights. In order to be operationally beneficial, cybersecurity machine learning based models need to have the ability to: (1) represent a real-world system, (2) infer system properties, and (3) learn and adapt based on expert knowledge and observations. Probabilistic models and Probabilistic graphical models provide these necessary properties and are further explored in this chapter. Bayesian Networks and Hidden Markov Models are introduced as an example of a widely used data driven classification/modeling strategy.

  5. PredPsych: A toolbox for predictive machine learning based approach in experimental psychology research

    OpenAIRE

    Cavallo, Andrea; Becchio, Cristina; Koul, Atesh

    2016-01-01

    Recent years have seen an increased interest in machine learning based predictive methods for analysing quantitative behavioural data in experimental psychology. While these methods can achieve relatively greater sensitivity compared to conventional univariate techniques, they still lack an established and accessible software framework. The goal of this work was to build an open-source toolbox – “PredPsych” – that could make these methods readily available to all psychologists. PredPsych is a...

  6. Poster abstract: A machine learning approach for vehicle classification using passive infrared and ultrasonic sensors

    KAUST Repository

    Warriach, Ehsan Ullah

    2013-01-01

    This article describes the implementation of four different machine learning techniques for vehicle classification in a dual ultrasonic/passive infrared traffic flow sensors. Using k-NN, Naive Bayes, SVM and KNN-SVM algorithms, we show that KNN-SVM significantly outperforms other algorithms in terms of classification accuracy. We also show that some of these algorithms could run in real time on the prototype system. Copyright © 2013 ACM.

  7. Machine Learning Approach to Optimizing Combined Stimulation and Medication Therapies for Parkinson's Disease.

    Science.gov (United States)

    Shamir, Reuben R; Dolber, Trygve; Noecker, Angela M; Walter, Benjamin L; McIntyre, Cameron C

    2015-01-01

    Deep brain stimulation (DBS) of the subthalamic region is an established therapy for advanced Parkinson's disease (PD). However, patients often require time-intensive post-operative management to balance their coupled stimulation and medication treatments. Given the large and complex parameter space associated with this task, we propose that clinical decision support systems (CDSS) based on machine learning algorithms could assist in treatment optimization. Develop a proof-of-concept implementation of a CDSS that incorporates patient-specific details on both stimulation and medication. Clinical data from 10 patients, and 89 post-DBS surgery visits, were used to create a prototype CDSS. The system was designed to provide three key functions: (1) information retrieval; (2) visualization of treatment, and; (3) recommendation on expected effective stimulation and drug dosages, based on three machine learning methods that included support vector machines, Naïve Bayes, and random forest. Measures of medication dosages, time factors, and symptom-specific pre-operative response to levodopa were significantly correlated with post-operative outcomes (P < 0.05) and their effect on outcomes was of similar magnitude to that of DBS. Using those results, the combined machine learning algorithms were able to accurately predict 86% (12/14) of the motor improvement scores at one year after surgery. Using patient-specific details, an appropriately parameterized CDSS could help select theoretically optimal DBS parameter settings and medication dosages that have potential to improve the clinical management of PD patients. Copyright © 2015 Elsevier Inc. All rights reserved.

  8. Machine Learning Approaches for Predicting Radiation Therapy Outcomes: A Clinician's Perspective.

    Science.gov (United States)

    Kang, John; Schwartz, Russell; Flickinger, John; Beriwal, Sushil

    2015-12-01

    Radiation oncology has always been deeply rooted in modeling, from the early days of isoeffect curves to the contemporary Quantitative Analysis of Normal Tissue Effects in the Clinic (QUANTEC) initiative. In recent years, medical modeling for both prognostic and therapeutic purposes has exploded thanks to increasing availability of electronic data and genomics. One promising direction that medical modeling is moving toward is adopting the same machine learning methods used by companies such as Google and Facebook to combat disease. Broadly defined, machine learning is a branch of computer science that deals with making predictions from complex data through statistical models. These methods serve to uncover patterns in data and are actively used in areas such as speech recognition, handwriting recognition, face recognition, "spam" filtering (junk email), and targeted advertising. Although multiple radiation oncology research groups have shown the value of applied machine learning (ML), clinical adoption has been slow due to the high barrier to understanding these complex models by clinicians. Here, we present a review of the use of ML to predict radiation therapy outcomes from the clinician's point of view with the hope that it lowers the "barrier to entry" for those without formal training in ML. We begin by describing 7 principles that one should consider when evaluating (or creating) an ML model in radiation oncology. We next introduce 3 popular ML methods--logistic regression (LR), support vector machine (SVM), and artificial neural network (ANN)--and critique 3 seminal papers in the context of these principles. Although current studies are in exploratory stages, the overall methodology has progressively matured, and the field is ready for larger-scale further investigation. Copyright © 2015 Elsevier Inc. All rights reserved.

  9. Trip time prediction in mass transit companies. A machine learning approach

    OpenAIRE

    João M. Moreira; Alípio Jorge; Jorge Freire de Sousa; Carlos Soares

    2005-01-01

    In this paper we discuss how trip time prediction can be useful foroperational optimization in mass transit companies and which machine learningtechniques can be used to improve results. Firstly, we analyze which departmentsneed trip time prediction and when. Secondly, we review related work and thirdlywe present the analysis of trip time over a particular path. We proceed by presentingexperimental results conducted on real data with the forecasting techniques wefound most adequate, and concl...

  10. Machine Learning Approaches for Predicting Radiation Therapy Outcomes: A Clinician's Perspective

    International Nuclear Information System (INIS)

    Kang, John; Schwartz, Russell; Flickinger, John; Beriwal, Sushil

    2015-01-01

    Radiation oncology has always been deeply rooted in modeling, from the early days of isoeffect curves to the contemporary Quantitative Analysis of Normal Tissue Effects in the Clinic (QUANTEC) initiative. In recent years, medical modeling for both prognostic and therapeutic purposes has exploded thanks to increasing availability of electronic data and genomics. One promising direction that medical modeling is moving toward is adopting the same machine learning methods used by companies such as Google and Facebook to combat disease. Broadly defined, machine learning is a branch of computer science that deals with making predictions from complex data through statistical models. These methods serve to uncover patterns in data and are actively used in areas such as speech recognition, handwriting recognition, face recognition, “spam” filtering (junk email), and targeted advertising. Although multiple radiation oncology research groups have shown the value of applied machine learning (ML), clinical adoption has been slow due to the high barrier to understanding these complex models by clinicians. Here, we present a review of the use of ML to predict radiation therapy outcomes from the clinician's point of view with the hope that it lowers the “barrier to entry” for those without formal training in ML. We begin by describing 7 principles that one should consider when evaluating (or creating) an ML model in radiation oncology. We next introduce 3 popular ML methods—logistic regression (LR), support vector machine (SVM), and artificial neural network (ANN)—and critique 3 seminal papers in the context of these principles. Although current studies are in exploratory stages, the overall methodology has progressively matured, and the field is ready for larger-scale further investigation.

  11. Machine Learning Approaches for Predicting Radiation Therapy Outcomes: A Clinician's Perspective

    Energy Technology Data Exchange (ETDEWEB)

    Kang, John [Medical Scientist Training Program, University of Pittsburgh-Carnegie Mellon University, Pittsburgh, Pennsylvania (United States); Schwartz, Russell [Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, Pennsylvania (United States); Flickinger, John [Departments of Radiation Oncology and Neurological Surgery, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania (United States); Beriwal, Sushil, E-mail: beriwals@upmc.edu [Department of Radiation Oncology, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania (United States)

    2015-12-01

    Radiation oncology has always been deeply rooted in modeling, from the early days of isoeffect curves to the contemporary Quantitative Analysis of Normal Tissue Effects in the Clinic (QUANTEC) initiative. In recent years, medical modeling for both prognostic and therapeutic purposes has exploded thanks to increasing availability of electronic data and genomics. One promising direction that medical modeling is moving toward is adopting the same machine learning methods used by companies such as Google and Facebook to combat disease. Broadly defined, machine learning is a branch of computer science that deals with making predictions from complex data through statistical models. These methods serve to uncover patterns in data and are actively used in areas such as speech recognition, handwriting recognition, face recognition, “spam” filtering (junk email), and targeted advertising. Although multiple radiation oncology research groups have shown the value of applied machine learning (ML), clinical adoption has been slow due to the high barrier to understanding these complex models by clinicians. Here, we present a review of the use of ML to predict radiation therapy outcomes from the clinician's point of view with the hope that it lowers the “barrier to entry” for those without formal training in ML. We begin by describing 7 principles that one should consider when evaluating (or creating) an ML model in radiation oncology. We next introduce 3 popular ML methods—logistic regression (LR), support vector machine (SVM), and artificial neural network (ANN)—and critique 3 seminal papers in the context of these principles. Although current studies are in exploratory stages, the overall methodology has progressively matured, and the field is ready for larger-scale further investigation.

  12. A Machine-Learning Approach to Predict Main Energy Consumption under Realistic Operational Conditions

    DEFF Research Database (Denmark)

    Petersen, Joan P; Winther, Ole; Jacobsen, Daniel J

    2012-01-01

    The paper presents a novel and publicly available set of high-quality sensory data collected from a ferry over a period of two months and overviews exixting machine-learning methods for the prediction of main propulsion efficiency. Neural networks are applied on both real-time and predictive...... settings. Performance results for the real-time models are shown. The presented models were successfully developed in a trim optimisation application onboard a product tanker....

  13. Experts and Machines against Bullies: A Hybrid Approach to Detect Cyberbullies

    OpenAIRE

    Dadvar, M.; Trieschnigg, Rudolf Berend; de Jong, Franciska M.G.

    2014-01-01

    Cyberbullying is becoming a major concern in online environments with troubling consequences. However, most of the technical studies have focused on the detection of cyberbullying through identifying harassing comments rather than preventing the incidents by detecting the bullies. In this work we study the automatic detection of bully users on YouTube. We compare three types of automatic detection: an expert system, supervised machine learning models, and a hybrid type combining the two. All ...

  14. PredPsych: A toolbox for predictive machine learning-based approach in experimental psychology research.

    Science.gov (United States)

    Koul, Atesh; Becchio, Cristina; Cavallo, Andrea

    2017-12-12

    Recent years have seen an increased interest in machine learning-based predictive methods for analyzing quantitative behavioral data in experimental psychology. While these methods can achieve relatively greater sensitivity compared to conventional univariate techniques, they still lack an established and accessible implementation. The aim of current work was to build an open-source R toolbox - "PredPsych" - that could make these methods readily available to all psychologists. PredPsych is a user-friendly, R toolbox based on machine-learning predictive algorithms. In this paper, we present the framework of PredPsych via the analysis of a recently published multiple-subject motion capture dataset. In addition, we discuss examples of possible research questions that can be addressed with the machine-learning algorithms implemented in PredPsych and cannot be easily addressed with univariate statistical analysis. We anticipate that PredPsych will be of use to researchers with limited programming experience not only in the field of psychology, but also in that of clinical neuroscience, enabling computational assessment of putative bio-behavioral markers for both prognosis and diagnosis.

  15. Explosion Monitoring with Machine Learning: A LSTM Approach to Seismic Event Discrimination

    Science.gov (United States)

    Magana-Zook, S. A.; Ruppert, S. D.

    2017-12-01

    The streams of seismic data that analysts look at to discriminate natural from man- made events will soon grow from gigabytes of data per day to exponentially larger rates. This is an interesting problem as the requirement for real-time answers to questions of non-proliferation will remain the same, and the analyst pool cannot grow as fast as the data volume and velocity will. Machine learning is a tool that can solve the problem of seismic explosion monitoring at scale. Using machine learning, and Long Short-term Memory (LSTM) models in particular, analysts can become more efficient by focusing their attention on signals of interest. From a global dataset of earthquake and explosion events, a model was trained to recognize the different classes of events, given their spectrograms. Optimal recurrent node count and training iterations were found, and cross validation was performed to evaluate model performance. A 10-fold mean accuracy of 96.92% was achieved on a balanced dataset of 30,002 instances. Given that the model is 446.52 MB it can be used to simultaneously characterize all incoming signals by researchers looking at events in isolation on desktop machines, as well as at scale on all of the nodes of a real-time streaming platform. LLNL-ABS-735911

  16. A Finite State Machine Approach to Algorithmic Lateral Inhibition for Real-Time Motion Detection †

    Directory of Open Access Journals (Sweden)

    María T. López

    2018-05-01

    Full Text Available Many researchers have explored the relationship between recurrent neural networks and finite state machines. Finite state machines constitute the best-characterized computational model, whereas artificial neural networks have become a very successful tool for modeling and problem solving. The neurally-inspired lateral inhibition method, and its application to motion detection tasks, have been successfully implemented in recent years. In this paper, control knowledge of the algorithmic lateral inhibition (ALI method is described and applied by means of finite state machines, in which the state space is constituted from the set of distinguishable cases of accumulated charge in a local memory. The article describes an ALI implementation for a motion detection task. For the implementation, we have chosen to use one of the members of the 16-nm Kintex UltraScale+ family of Xilinx FPGAs. FPGAs provide the necessary accuracy, resolution, and precision to run neural algorithms alongside current sensor technologies. The results offered in this paper demonstrate that this implementation provides accurate object tracking performance on several datasets, obtaining a high F-score value (0.86 for the most complex sequence used. Moreover, it outperforms implementations of a complete ALI algorithm and a simplified version of the ALI algorithm—named “accumulative computation”—which was run about ten years ago, now reaching real-time processing times that were simply not achievable at that time for ALI.

  17. Profiled support vector machines for antisense oligonucleotide efficacy prediction

    Directory of Open Access Journals (Sweden)

    Martín-Guerrero José D

    2004-09-01

    Full Text Available Abstract Background This paper presents the use of Support Vector Machines (SVMs for prediction and analysis of antisense oligonucleotide (AO efficacy. The collected database comprises 315 AO molecules including 68 features each, inducing a problem well-suited to SVMs. The task of feature selection is crucial given the presence of noisy or redundant features, and the well-known problem of the curse of dimensionality. We propose a two-stage strategy to develop an optimal model: (1 feature selection using correlation analysis, mutual information, and SVM-based recursive feature elimination (SVM-RFE, and (2 AO prediction using standard and profiled SVM formulations. A profiled SVM gives different weights to different parts of the training data to focus the training on the most important regions. Results In the first stage, the SVM-RFE technique was most efficient and robust in the presence of low number of samples and high input space dimension. This method yielded an optimal subset of 14 representative features, which were all related to energy and sequence motifs. The second stage evaluated the performance of the predictors (overall correlation coefficient between observed and predicted efficacy, r; mean error, ME; and root-mean-square-error, RMSE using 8-fold and minus-one-RNA cross-validation methods. The profiled SVM produced the best results (r = 0.44, ME = 0.022, and RMSE= 0.278 and predicted high (>75% inhibition of gene expression and low efficacy (http://aosvm.cgb.ki.se/. Conclusions The SVM approach is well suited to the AO prediction problem, and yields a prediction accuracy superior to previous methods. The profiled SVM was found to perform better than the standard SVM, suggesting that it could lead to improvements in other prediction problems as well.

  18. Automated Classification of Radiology Reports for Acute Lung Injury: Comparison of Keyword and Machine Learning Based Natural Language Processing Approaches.

    Science.gov (United States)

    Solti, Imre; Cooke, Colin R; Xia, Fei; Wurfel, Mark M

    2009-11-01

    This paper compares the performance of keyword and machine learning-based chest x-ray report classification for Acute Lung Injury (ALI). ALI mortality is approximately 30 percent. High mortality is, in part, a consequence of delayed manual chest x-ray classification. An automated system could reduce the time to recognize ALI and lead to reductions in mortality. For our study, 96 and 857 chest x-ray reports in two corpora were labeled by domain experts for ALI. We developed a keyword and a Maximum Entropy-based classification system. Word unigram and character n-grams provided the features for the machine learning system. The Maximum Entropy algorithm with character 6-gram achieved the highest performance (Recall=0.91, Precision=0.90 and F-measure=0.91) on the 857-report corpus. This study has shown that for the classification of ALI chest x-ray reports, the machine learning approach is superior to the keyword based system and achieves comparable results to highest performing physician annotators.

  19. Predicting diabetes mellitus using SMOTE and ensemble machine learning approach: The Henry Ford ExercIse Testing (FIT) project.

    Science.gov (United States)

    Alghamdi, Manal; Al-Mallah, Mouaz; Keteyian, Steven; Brawner, Clinton; Ehrman, Jonathan; Sakr, Sherif

    2017-01-01

    Machine learning is becoming a popular and important approach in the field of medical research. In this study, we investigate the relative performance of various machine learning methods such as Decision Tree, Naïve Bayes, Logistic Regression, Logistic Model Tree and Random Forests for predicting incident diabetes using medical records of cardiorespiratory fitness. In addition, we apply different techniques to uncover potential predictors of diabetes. This FIT project study used data of 32,555 patients who are free of any known coronary artery disease or heart failure who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems between 1991 and 2009 and had a complete 5-year follow-up. At the completion of the fifth year, 5,099 of those patients have developed diabetes. The dataset contained 62 attributes classified into four categories: demographic characteristics, disease history, medication use history, and stress test vital signs. We developed an Ensembling-based predictive model using 13 attributes that were selected based on their clinical importance, Multiple Linear Regression, and Information Gain Ranking methods. The negative effect of the imbalance class of the constructed model was handled by Synthetic Minority Oversampling Technique (SMOTE). The overall performance of the predictive model classifier was improved by the Ensemble machine learning approach using the Vote method with three Decision Trees (Naïve Bayes Tree, Random Forest, and Logistic Model Tree) and achieved high accuracy of prediction (AUC = 0.92). The study shows the potential of ensembling and SMOTE approaches for predicting incident diabetes using cardiorespiratory fitness data.

  20. Predicting diabetes mellitus using SMOTE and ensemble machine learning approach: The Henry Ford ExercIse Testing (FIT project.

    Directory of Open Access Journals (Sweden)

    Manal Alghamdi

    Full Text Available Machine learning is becoming a popular and important approach in the field of medical research. In this study, we investigate the relative performance of various machine learning methods such as Decision Tree, Naïve Bayes, Logistic Regression, Logistic Model Tree and Random Forests for predicting incident diabetes using medical records of cardiorespiratory fitness. In addition, we apply different techniques to uncover potential predictors of diabetes. This FIT project study used data of 32,555 patients who are free of any known coronary artery disease or heart failure who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems between 1991 and 2009 and had a complete 5-year follow-up. At the completion of the fifth year, 5,099 of those patients have developed diabetes. The dataset contained 62 attributes classified into four categories: demographic characteristics, disease history, medication use history, and stress test vital signs. We developed an Ensembling-based predictive model using 13 attributes that were selected based on their clinical importance, Multiple Linear Regression, and Information Gain Ranking methods. The negative effect of the imbalance class of the constructed model was handled by Synthetic Minority Oversampling Technique (SMOTE. The overall performance of the predictive model classifier was improved by the Ensemble machine learning approach using the Vote method with three Decision Trees (Naïve Bayes Tree, Random Forest, and Logistic Model Tree and achieved high accuracy of prediction (AUC = 0.92. The study shows the potential of ensembling and SMOTE approaches for predicting incident diabetes using cardiorespiratory fitness data.

  1. An effective secondary decomposition approach for wind power forecasting using extreme learning machine trained by crisscross optimization

    International Nuclear Information System (INIS)

    Yin, Hao; Dong, Zhen; Chen, Yunlong; Ge, Jiafei; Lai, Loi Lei; Vaccaro, Alfredo; Meng, Anbo

    2017-01-01

    Highlights: • A secondary decomposition approach is applied in the data pre-processing. • The empirical mode decomposition is used to decompose the original time series. • IMF1 continues to be decomposed by applying wavelet packet decomposition. • Crisscross optimization algorithm is applied to train extreme learning machine. • The proposed SHD-CSO-ELM outperforms other pervious methods in the literature. - Abstract: Large-scale integration of wind energy into electric grid is restricted by its inherent intermittence and volatility. So the increased utilization of wind power necessitates its accurate prediction. The contribution of this study is to develop a new hybrid forecasting model for the short-term wind power prediction by using a secondary hybrid decomposition approach. In the data pre-processing phase, the empirical mode decomposition is used to decompose the original time series into several intrinsic mode functions (IMFs). A unique feature is that the generated IMF1 continues to be decomposed into appropriate and detailed components by applying wavelet packet decomposition. In the training phase, all the transformed sub-series are forecasted with extreme learning machine trained by our recently developed crisscross optimization algorithm (CSO). The final predicted values are obtained from aggregation. The results show that: (a) The performance of empirical mode decomposition can be significantly improved with its IMF1 decomposed by wavelet packet decomposition. (b) The CSO algorithm has satisfactory performance in addressing the premature convergence problem when applied to optimize extreme learning machine. (c) The proposed approach has great advantage over other previous hybrid models in terms of prediction accuracy.

  2. Two computational approaches for Monte Carlo based shutdown dose rate calculation with applications to the JET fusion machine

    Energy Technology Data Exchange (ETDEWEB)

    Petrizzi, L.; Batistoni, P.; Migliori, S. [Associazione EURATOM ENEA sulla Fusione, Frascati (Roma) (Italy); Chen, Y.; Fischer, U.; Pereslavtsev, P. [Association FZK-EURATOM Forschungszentrum Karlsruhe (Germany); Loughlin, M. [EURATOM/UKAEA Fusion Association, Culham Science Centre, Abingdon, Oxfordshire, OX (United Kingdom); Secco, A. [Nice Srl Via Serra 33 Camerano Casasco AT (Italy)

    2003-07-01

    In deuterium-deuterium (D-D) and deuterium-tritium (D-T) fusion plasmas neutrons are produced causing activation of JET machine components. For safe operation and maintenance it is important to be able to predict the induced activation and the resulting shut down dose rates. This requires a suitable system of codes which is capable of simulating both the neutron induced material activation during operation and the decay gamma radiation transport after shut-down in the proper 3-D geometry. Two methodologies to calculate the dose rate in fusion devices have been developed recently and applied to fusion machines, both using the MCNP Monte Carlo code. FZK has developed a more classical approach, the rigorous 2-step (R2S) system in which MCNP is coupled to the FISPACT inventory code with an automated routing. ENEA, in collaboration with the ITER Team, has developed an alternative approach, the direct 1 step method (D1S). Neutron and decay gamma transport are handled in one single MCNP run, using an ad hoc cross section library. The intention was to tightly couple the neutron induced production of a radio-isotope and the emission of its decay gammas for an accurate spatial distribution and a reliable calculated statistical error. The two methods have been used by the two Associations to calculate the dose rate in five positions of JET machine, two inside the vacuum chamber and three outside, at cooling times between 1 second and 1 year after shutdown. The same MCNP model and irradiation conditions have been assumed. The exercise has been proposed and financed in the frame of the Fusion Technological Program of the JET machine. The scope is to supply the designers with the most reliable tool and data to calculate the dose rate on fusion machines. Results showed that there is a good agreement: the differences range between 5-35%. The next step to be considered in 2003 will be an exercise in which the comparison will be done with dose-rate data from JET taken during and

  3. Two computational approaches for Monte Carlo based shutdown dose rate calculation with applications to the JET fusion machine

    International Nuclear Information System (INIS)

    Petrizzi, L.; Batistoni, P.; Migliori, S.; Chen, Y.; Fischer, U.; Pereslavtsev, P.; Loughlin, M.; Secco, A.

    2003-01-01

    In deuterium-deuterium (D-D) and deuterium-tritium (D-T) fusion plasmas neutrons are produced causing activation of JET machine components. For safe operation and maintenance it is important to be able to predict the induced activation and the resulting shut down dose rates. This requires a suitable system of codes which is capable of simulating both the neutron induced material activation during operation and the decay gamma radiation transport after shut-down in the proper 3-D geometry. Two methodologies to calculate the dose rate in fusion devices have been developed recently and applied to fusion machines, both using the MCNP Monte Carlo code. FZK has developed a more classical approach, the rigorous 2-step (R2S) system in which MCNP is coupled to the FISPACT inventory code with an automated routing. ENEA, in collaboration with the ITER Team, has developed an alternative approach, the direct 1 step method (D1S). Neutron and decay gamma transport are handled in one single MCNP run, using an ad hoc cross section library. The intention was to tightly couple the neutron induced production of a radio-isotope and the emission of its decay gammas for an accurate spatial distribution and a reliable calculated statistical error. The two methods have been used by the two Associations to calculate the dose rate in five positions of JET machine, two inside the vacuum chamber and three outside, at cooling times between 1 second and 1 year after shutdown. The same MCNP model and irradiation conditions have been assumed. The exercise has been proposed and financed in the frame of the Fusion Technological Program of the JET machine. The scope is to supply the designers with the most reliable tool and data to calculate the dose rate on fusion machines. Results showed that there is a good agreement: the differences range between 5-35%. The next step to be considered in 2003 will be an exercise in which the comparison will be done with dose-rate data from JET taken during and

  4. A least square support vector machine-based approach for contingency classification and ranking in a large power system

    Directory of Open Access Journals (Sweden)

    Bhanu Pratap Soni

    2016-12-01

    Full Text Available This paper proposes an effective supervised learning approach for static security assessment of a large power system. Supervised learning approach employs least square support vector machine (LS-SVM to rank the contingencies and predict the system severity level. The severity of the contingency is measured by two scalar performance indices (PIs: line MVA performance index (PIMVA and Voltage-reactive power performance index (PIVQ. SVM works in two steps. Step I is the estimation of both standard indices (PIMVA and PIVQ that is carried out under different operating scenarios and Step II contingency ranking is carried out based on the values of PIs. The effectiveness of the proposed methodology is demonstrated on IEEE 39-bus (New England system. The approach can be beneficial tool which is less time consuming and accurate security assessment and contingency analysis at energy management center.

  5. Icing Detection over East Asia from Geostationary Satellite Data Using Machine Learning Approaches

    Directory of Open Access Journals (Sweden)

    Seongmun Sim

    2018-04-01

    Full Text Available Even though deicing or airframe coating technologies continue to develop, aircraft icing is still one of the critical threats to aviation. While the detection of potential icing clouds has been conducted using geostationary satellite data in the US and Europe, there is not yet a robust model that detects potential icing areas in East Asia. In this study, we proposed machine-learning-based icing detection models using data from two geostationary satellites—the Communication, Ocean, and Meteorological Satellite (COMS Meteorological Imager (MI and the Himawari-8 Advanced Himawari Imager (AHI—over Northeast Asia. Two machine learning techniques—random forest (RF and multinomial log-linear (MLL models—were evaluated with quality-controlled pilot reports (PIREPs as the reference data. The machine-learning-based models were compared to the existing models through five-fold cross-validation. The RF model for COMS MI produced the best performance, resulting in a mean probability of detection (POD of 81.8%, a mean overall accuracy (OA of 82.1%, and mean true skill statistics (TSS of 64.0%. One of the existing models, flight icing threat (FIT, produced relatively poor performance, providing a mean POD of 36.4%, a mean OA of 61.0, and a mean TSS of 9.7%. The Himawari-8 based models also produced performance comparable to the COMS models. However, it should be noted that very limited PIREP reference data were available especially for the Himawari-8 models, which requires further evaluation in the future with more reference data. The spatio-temporal patterns of the icing areas detected using the developed models were also visually examined using time-series satellite data.

  6. Classification of suicide attempters in schizophrenia using sociocultural and clinical features: A machine learning approach.

    Science.gov (United States)

    Hettige, Nuwan C; Nguyen, Thai Binh; Yuan, Chen; Rajakulendran, Thanara; Baddour, Jermeen; Bhagwat, Nikhil; Bani-Fatemi, Ali; Voineskos, Aristotle N; Mallar Chakravarty, M; De Luca, Vincenzo

    2017-07-01

    Suicide is a major concern for those afflicted by schizophrenia. Identifying patients at the highest risk for future suicide attempts remains a complex problem for psychiatric interventions. Machine learning models allow for the integration of many risk factors in order to build an algorithm that predicts which patients are likely to attempt suicide. Currently it is unclear how to integrate previously identified risk factors into a clinically relevant predictive tool to estimate the probability of a patient with schizophrenia for attempting suicide. We conducted a cross-sectional assessment on a sample of 345 participants diagnosed with schizophrenia spectrum disorders. Suicide attempters and non-attempters were clearly identified using the Columbia Suicide Severity Rating Scale (C-SSRS) and the Beck Suicide Ideation Scale (BSS). We developed four classification algorithms using a regularized regression, random forest, elastic net and support vector machine models with sociocultural and clinical variables as features to train the models. All classification models performed similarly in identifying suicide attempters and non-attempters. Our regularized logistic regression model demonstrated an accuracy of 67% and an area under the curve (AUC) of 0.71, while the random forest model demonstrated 66% accuracy and an AUC of 0.67. Support vector classifier (SVC) model demonstrated an accuracy of 67% and an AUC of 0.70, and the elastic net model demonstrated and accuracy of 65% and an AUC of 0.71. Machine learning algorithms offer a relatively successful method for incorporating many clinical features to predict individuals at risk for future suicide attempts. Increased performance of these models using clinically relevant variables offers the potential to facilitate early treatment and intervention to prevent future suicide attempts. Copyright © 2017 Elsevier Inc. All rights reserved.

  7. Impact of Health Care Employees’ Job Satisfaction on Organizational Performance Support Vector Machine Approach

    Directory of Open Access Journals (Sweden)

    CEMIL KUZEY

    2018-01-01

    Full Text Available This study is undertaken to search for key factors that contribute to job satisfaction among health care workers, and also to determine the impact of these underlying dimensions of employee satisfaction on organizational performance. Exploratory Factor Analysis (EFA is applied to initially uncover the key factors, and then, in the next stage of analysis, a popular data mining technique, Support Vector Machine (SVM is employed on a sample of 249 to determine the impact of job satisfaction factors on organizational performance. According to the proposed model, the main factors are revealed to be management’s attitude, pay/reward, job security and colleagues.

  8. An efficient approach for improving virtual machine placement in cloud computing environment

    Science.gov (United States)

    Ghobaei-Arani, Mostafa; Shamsi, Mahboubeh; Rahmanian, Ali A.

    2017-11-01

    The ever increasing demand for the cloud services requires more data centres. The power consumption in the data centres is a challenging problem for cloud computing, which has not been considered properly by the data centre developer companies. Especially, large data centres struggle with the power cost and the Greenhouse gases production. Hence, employing the power efficient mechanisms are necessary to optimise the mentioned effects. Moreover, virtual machine (VM) placement can be used as an effective method to reduce the power consumption in data centres. In this paper by grouping both virtual and physical machines, and taking into account the maximum absolute deviation during the VM placement, the power consumption as well as the service level agreement (SLA) deviation in data centres are reduced. To this end, the best-fit decreasing algorithm is utilised in the simulation to reduce the power consumption by about 5% compared to the modified best-fit decreasing algorithm, and at the same time, the SLA violation is improved by 6%. Finally, the learning automata are used to a trade-off between power consumption reduction from one side, and SLA violation percentage from the other side.

  9. A Machine Learning Approach for Hot-Spot Detection at Protein-Protein Interfaces

    Directory of Open Access Journals (Sweden)

    Rita Melo

    2016-07-01

    Full Text Available Understanding protein-protein interactions is a key challenge in biochemistry. In this work, we describe a more accurate methodology to predict Hot-Spots (HS in protein-protein interfaces from their native complex structure compared to previous published Machine Learning (ML techniques. Our model is trained on a large number of complexes and on a significantly larger number of different structural- and evolutionary sequence-based features. In particular, we added interface size, type of interaction between residues at the interface of the complex, number of different types of residues at the interface and the Position-Specific Scoring Matrix (PSSM, for a total of 79 features. We used twenty-seven algorithms from a simple linear-based function to support-vector machine models with different cost functions. The best model was achieved by the use of the conditional inference random forest (c-forest algorithm with a dataset pre-processed by the normalization of features and with up-sampling of the minor class. The method has an overall accuracy of 0.80, an F1-score of 0.73, a sensitivity of 0.76 and a specificity of 0.82 for the independent test set.

  10. Effect of reinforcement on the cutting forces while machining metal matrix composites–An experimental approach

    Directory of Open Access Journals (Sweden)

    Ch. Shoba

    2015-12-01

    Full Text Available Hybrid metal matrix composites are of great interest for researchers in recent years, because of their attractive superior properties over traditional materials and single reinforced composites. The machinabilty of hybrid composites becomes vital for manufacturing industries. The need to study the influence of process parameters on the cutting forces in turning such hybrid composite under dry environment is essentially required. In the present study, the influence of machining parameters, e.g. cutting speed, feed and depth of cut on the cutting force components, namely feed force (Ff, cutting force (Fc, and radial force (Fd has been investigated. Investigations were performed on 0, 2, 4, 6 and 8 wt% Silicon carbide (SiC and rice husk ash (RHA reinforced composite specimens. A comparison was made between the reinforced and unreinforced composites. The results proved that all the cutting force components decrease with the increase in the weight percentage of the reinforcement: this was probably due to the dislocation densities generated from the thermal mismatch between the reinforcement and the matrix. Experimental evidence also showed that built-up edge (BUE is formed during machining of low percentage reinforced composites at high speed and high depth of cut. The formation of BUE was captured by SEM, therefore confirming the result. The decrease of cutting force components with lower cutting speed and higher feed and depth of cut was also highlighted. The related mechanisms are explained and presented.

  11. On Internet Traffic Classification: A Two-Phased Machine Learning Approach

    Directory of Open Access Journals (Sweden)

    Taimur Bakhshi

    2016-01-01

    Full Text Available Traffic classification utilizing flow measurement enables operators to perform essential network management. Flow accounting methods such as NetFlow are, however, considered inadequate for classification requiring additional packet-level information, host behaviour analysis, and specialized hardware limiting their practical adoption. This paper aims to overcome these challenges by proposing two-phased machine learning classification mechanism with NetFlow as input. The individual flow classes are derived per application through k-means and are further used to train a C5.0 decision tree classifier. As part of validation, the initial unsupervised phase used flow records of fifteen popular Internet applications that were collected and independently subjected to k-means clustering to determine unique flow classes generated per application. The derived flow classes were afterwards used to train and test a supervised C5.0 based decision tree. The resulting classifier reported an average accuracy of 92.37% on approximately 3.4 million test cases increasing to 96.67% with adaptive boosting. The classifier specificity factor which accounted for differentiating content specific from supplementary flows ranged between 98.37% and 99.57%. Furthermore, the computational performance and accuracy of the proposed methodology in comparison with similar machine learning techniques lead us to recommend its extension to other applications in achieving highly granular real-time traffic classification.

  12. Effect of Subliminal Lexical Priming on the Subjective Perception of Images: A Machine Learning Approach.

    Directory of Open Access Journals (Sweden)

    Dhanya Menoth Mohan

    Full Text Available The purpose of the study is to examine the effect of subliminal priming in terms of the perception of images influenced by words with positive, negative, and neutral emotional content, through electroencephalograms (EEGs. Participants were instructed to rate how much they like the stimuli images, on a 7-point Likert scale, after being subliminally exposed to masked lexical prime words that exhibit positive, negative, and neutral connotations with respect to the images. Simultaneously, the EEGs were recorded. Statistical tests such as repeated measures ANOVAs and two-tailed paired-samples t-tests were performed to measure significant differences in the likability ratings among the three prime affect types; the results showed a strong shift in the likeness judgment for the images in the positively primed condition compared to the other two. The acquired EEGs were examined to assess the difference in brain activity associated with the three different conditions. The consistent results obtained confirmed the overall priming effect on participants' explicit ratings. In addition, machine learning algorithms such as support vector machines (SVMs, and AdaBoost classifiers were applied to infer the prime affect type from the ERPs. The highest classification rates of 95.0% and 70.0% obtained respectively for average-trial binary classifier and average-trial multi-class further emphasize that the ERPs encode information about the different kinds of primes.

  13. Effect of Subliminal Lexical Priming on the Subjective Perception of Images: A Machine Learning Approach.

    Science.gov (United States)

    Mohan, Dhanya Menoth; Kumar, Parmod; Mahmood, Faisal; Wong, Kian Foong; Agrawal, Abhishek; Elgendi, Mohamed; Shukla, Rohit; Ang, Natania; Ching, April; Dauwels, Justin; Chan, Alice H D

    2016-01-01

    The purpose of the study is to examine the effect of subliminal priming in terms of the perception of images influenced by words with positive, negative, and neutral emotional content, through electroencephalograms (EEGs). Participants were instructed to rate how much they like the stimuli images, on a 7-point Likert scale, after being subliminally exposed to masked lexical prime words that exhibit positive, negative, and neutral connotations with respect to the images. Simultaneously, the EEGs were recorded. Statistical tests such as repeated measures ANOVAs and two-tailed paired-samples t-tests were performed to measure significant differences in the likability ratings among the three prime affect types; the results showed a strong shift in the likeness judgment for the images in the positively primed condition compared to the other two. The acquired EEGs were examined to assess the difference in brain activity associated with the three different conditions. The consistent results obtained confirmed the overall priming effect on participants' explicit ratings. In addition, machine learning algorithms such as support vector machines (SVMs), and AdaBoost classifiers were applied to infer the prime affect type from the ERPs. The highest classification rates of 95.0% and 70.0% obtained respectively for average-trial binary classifier and average-trial multi-class further emphasize that the ERPs encode information about the different kinds of primes.

  14. Field tests and machine learning approaches for refining algorithms and correlations of driver's model parameters.

    Science.gov (United States)

    Tango, Fabio; Minin, Luca; Tesauri, Francesco; Montanari, Roberto

    2010-03-01

    This paper describes the field tests on a driving simulator carried out to validate the algorithms and the correlations of dynamic parameters, specifically driving task demand and drivers' distraction, able to predict drivers' intentions. These parameters belong to the driver's model developed by AIDE (Adaptive Integrated Driver-vehicle InterfacE) European Integrated Project. Drivers' behavioural data have been collected from the simulator tests to model and validate these parameters using machine learning techniques, specifically the adaptive neuro fuzzy inference systems (ANFIS) and the artificial neural network (ANN). Two models of task demand and distraction have been developed, one for each adopted technique. The paper provides an overview of the driver's model, the description of the task demand and distraction modelling and the tests conducted for the validation of these parameters. A test comparing predicted and expected outcomes of the modelled parameters for each machine learning technique has been carried out: for distraction, in particular, promising results (low prediction errors) have been obtained by adopting an artificial neural network.

  15. Investigation of Effect of Machine Layout on Productivity and Utilization Level: What If Simulation Approach

    Directory of Open Access Journals (Sweden)

    Islam Faisal Bourini

    2018-03-01

    Full Text Available Designing and selecting the material handling system is a vital factor for any production line, and as result for the whole manufacturing system. Poor design and unsuitable handling equipment may increase the risk of having bottlenecks, longer production time and as a result the higher total production cost. One of the useful and effective tools are using “what if” simulation techniques. However, this technique needs effective simulation software. The main objective for this research is to simulate different types of handling system using what if scenario. To achieve the objective of the research, Delmia Quest software has been used to simulate two different systems: manual system and conveyers system for the same production line and analyses the differences in terms of utilization and production rate. The results obtained have been analysed and appraised to induce the bottleneck locations, productivity and utilizations of the machines and material handling systems used in the design system. Finally, the best model have been developed to increase the productivity, utilizations of the machines, material handling systems and to minimize the bottleneck locations.

  16. Using Blood Indexes to Predict Overweight Statuses: An Extreme Learning Machine-Based Approach.

    Directory of Open Access Journals (Sweden)

    Huiling Chen

    Full Text Available The number of the overweight people continues to rise across the world. Studies have shown that being overweight can increase health risks, such as high blood pressure, diabetes mellitus, coronary heart disease, and certain forms of cancer. Therefore, identifying the overweight status in people is critical to prevent and decrease health risks. This study explores a new technique that uses blood and biochemical measurements to recognize the overweight condition. A new machine learning technique, an extreme learning machine, was developed to accurately detect the overweight status from a pool of 225 overweight and 251 healthy subjects. The group included 179 males and 297 females. The detection method was rigorously evaluated against the real-life dataset for accuracy, sensitivity, specificity, and AUC (area under the receiver operating characteristic (ROC curve criterion. Additionally, the feature selection was investigated to identify correlating factors for the overweight status. The results demonstrate that there are significant differences in blood and biochemical indexes between healthy and overweight people (p-value < 0.01. According to the feature selection, the most important correlated indexes are creatinine, hemoglobin, hematokrit, uric Acid, red blood cells, high density lipoprotein, alanine transaminase, triglyceride, and γ-glutamyl transpeptidase. These are consistent with the results of Spearman test analysis. The proposed method holds promise as a new, accurate method for identifying the overweight status in subjects.

  17. Use of Machine-Learning Approaches to Predict Clinical Deterioration in Critically Ill Patients: A Systematic Review

    Directory of Open Access Journals (Sweden)

    Tadashi Kamio

    2017-06-01

    Full Text Available Introduction: Early identification of patients with unexpected clinical deterioration is a matter of serious concern. Previous studies have shown that early intervention on a patient whose health is deteriorating improves the patient outcome, and machine-learning-based approaches to predict clinical deterioration may contribute to precision improvement. To date, however, no systematic review in this area is available. Methods: We completed a search on PubMed on January 22, 2017 as well as a review of the articles identified by study authors involved in this area of research following the preferred reporting items for systematic reviews and meta-analyses (PRISMA guidelines for systematic reviews. Results: Twelve articles were selected for the current study from 273 articles initially obtained from the PubMed searches. Eleven of the 12 studies were retrospective studies, and no randomized controlled trials were performed. Although the artificial neural network techniques were the most frequently used and provided high precision and accuracy, we failed to identify articles that showed improvement in the patient outcome. Limitations were reported related to generalizability, complexity of models, and technical knowledge. Conclusions: This review shows that machine-learning approaches can improve prediction of clinical deterioration compared with traditional methods. However, these techniques will require further external validation before widespread clinical acceptance can be achieved.

  18. A Fast SVM-Based Tongue’s Colour Classification Aided by k-Means Clustering Identifiers and Colour Attributes as Computer-Assisted Tool for Tongue Diagnosis

    Directory of Open Access Journals (Sweden)

    Nur Diyana Kamarudin

    2017-01-01

    Full Text Available In tongue diagnosis, colour information of tongue body has kept valuable information regarding the state of disease and its correlation with the internal organs. Qualitatively, practitioners may have difficulty in their judgement due to the instable lighting condition and naked eye’s ability to capture the exact colour distribution on the tongue especially the tongue with multicolour substance. To overcome this ambiguity, this paper presents a two-stage tongue’s multicolour classification based on a support vector machine (SVM whose support vectors are reduced by our proposed k-means clustering identifiers and red colour range for precise tongue colour diagnosis. In the first stage, k-means clustering is used to cluster a tongue image into four clusters of image background (black, deep red region, red/light red region, and transitional region. In the second-stage classification, red/light red tongue images are further classified into red tongue or light red tongue based on the red colour range derived in our work. Overall, true rate classification accuracy of the proposed two-stage classification to diagnose red, light red, and deep red tongue colours is 94%. The number of support vectors in SVM is improved by 41.2%, and the execution time for one image is recorded as 48 seconds.

  19. A Fast SVM-Based Tongue's Colour Classification Aided by k-Means Clustering Identifiers and Colour Attributes as Computer-Assisted Tool for Tongue Diagnosis

    Science.gov (United States)

    Ooi, Chia Yee; Kawanabe, Tadaaki; Odaguchi, Hiroshi; Kobayashi, Fuminori

    2017-01-01

    In tongue diagnosis, colour information of tongue body has kept valuable information regarding the state of disease and its correlation with the internal organs. Qualitatively, practitioners may have difficulty in their judgement due to the instable lighting condition and naked eye's ability to capture the exact colour distribution on the tongue especially the tongue with multicolour substance. To overcome this ambiguity, this paper presents a two-stage tongue's multicolour classification based on a support vector machine (SVM) whose support vectors are reduced by our proposed k-means clustering identifiers and red colour range for precise tongue colour diagnosis. In the first stage, k-means clustering is used to cluster a tongue image into four clusters of image background (black), deep red region, red/light red region, and transitional region. In the second-stage classification, red/light red tongue images are further classified into red tongue or light red tongue based on the red colour range derived in our work. Overall, true rate classification accuracy of the proposed two-stage classification to diagnose red, light red, and deep red tongue colours is 94%. The number of support vectors in SVM is improved by 41.2%, and the execution time for one image is recorded as 48 seconds. PMID:29065640

  20. A novel one-class SVM based negative data sampling method for reconstructing proteome-wide HTLV-human protein interaction networks.

    Science.gov (United States)

    Mei, Suyu; Zhu, Hao

    2015-01-26

    Protein-protein interaction (PPI) prediction is generally treated as a problem of binary classification wherein negative data sampling is still an open problem to be addressed. The commonly used random sampling is prone to yield less representative negative data with considerable false negatives. Meanwhile rational constraints are seldom exerted on model selection to reduce the risk of false positive predictions for most of the existing computational methods. In this work, we propose a novel negative data sampling method based on one-class SVM (support vector machine, SVM) to predict proteome-wide protein interactions between HTLV retrovirus and Homo sapiens, wherein one-class SVM is used to choose reliable and representative negative data, and two-class SVM is used to yield proteome-wide outcomes as predictive feedback for rational model selection. Computational results suggest that one-class SVM is more suited to be used as negative data sampling method than two-class PPI predictor, and the predictive feedback constrained model selection helps to yield a rational predictive model that reduces the risk of false positive predictions. Some predictions have been validated by the recent literature. Lastly, gene ontology based clustering of the predicted PPI networks is conducted to provide valuable cues for the pathogenesis of HTLV retrovirus.

  1. A Fast SVM-Based Tongue's Colour Classification Aided by k-Means Clustering Identifiers and Colour Attributes as Computer-Assisted Tool for Tongue Diagnosis.

    Science.gov (United States)

    Kamarudin, Nur Diyana; Ooi, Chia Yee; Kawanabe, Tadaaki; Odaguchi, Hiroshi; Kobayashi, Fuminori

    2017-01-01

    In tongue diagnosis, colour information of tongue body has kept valuable information regarding the state of disease and its correlation with the internal organs. Qualitatively, practitioners may have difficulty in their judgement due to the instable lighting condition and naked eye's ability to capture the exact colour distribution on the tongue especially the tongue with multicolour substance. To overcome this ambiguity, this paper presents a two-stage tongue's multicolour classification based on a support vector machine (SVM) whose support vectors are reduced by our proposed k -means clustering identifiers and red colour range for precise tongue colour diagnosis. In the first stage, k -means clustering is used to cluster a tongue image into four clusters of image background (black), deep red region, red/light red region, and transitional region. In the second-stage classification, red/light red tongue images are further classified into red tongue or light red tongue based on the red colour range derived in our work. Overall, true rate classification accuracy of the proposed two-stage classification to diagnose red, light red, and deep red tongue colours is 94%. The number of support vectors in SVM is improved by 41.2%, and the execution time for one image is recorded as 48 seconds.

  2. Improving peak detection in high-resolution LC/MS metabolomics data using preexisting knowledge and machine learning approach.

    Science.gov (United States)

    Yu, Tianwei; Jones, Dean P

    2014-10-15

    Peak detection is a key step in the preprocessing of untargeted metabolomics data generated from high-resolution liquid chromatography-mass spectrometry (LC/MS). The common practice is to use filters with predetermined parameters to select peaks in the LC/MS profile. This rigid approach can cause suboptimal performance when the choice of peak model and parameters do not suit the data characteristics. Here we present a method that learns directly from various data features of the extracted ion chromatograms (EICs) to differentiate between true peak regions from noise regions in the LC/MS profile. It utilizes the knowledge of known metabolites, as well as robust machine learning approaches. Unlike currently available methods, this new approach does not assume a parametric peak shape model and allows maximum flexibility. We demonstrate the superiority of the new approach using real data. Because matching to known metabolites entails uncertainties and cannot be considered a gold standard, we also developed a probabilistic receiver-operating characteristic (pROC) approach that can incorporate uncertainties. The new peak detection approach is implemented as part of the apLCMS package available at http://web1.sph.emory.edu/apLCMS/ CONTACT: tyu8@emory.edu Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  3. Performance Optimization of Electrical Discharge Machining (Die Sinker for Al-6061 via Taguchi Approach

    Directory of Open Access Journals (Sweden)

    Muhammad Qaiser Saleem

    2015-04-01

    Full Text Available This paper parametrically optimizes the EDM (Electrical Discharge Machining process in die sinking mode for material removal rate, surface roughness and edge quality of aluminum alloy Al-6061. The effect of eight parameters namely discharge current, pulse on-time, pulse off-time, auxiliary current, working time, jump time distance, servo speed and work piece hardness are investigated. Taguchi's orthogonal array L18 is employed herein for experimentation. ANOVA (Analysis of Variance with F-ratio criterion at 95% confidence level is used for identification of significant parameters whereas SNR (Signal to Noise Ratio is used for determination of optimum levels. Optimization obtained for Al-6061 with parametric combination investigated herein is validated by the confirmation run.

  4. ON THE APPLICATION OF PARTIAL BARRIERS FOR SPINNING MACHINE NOISE CONTROL: A THEORETICAL AND EXPERIMENTAL APPROACH

    Directory of Open Access Journals (Sweden)

    M. R. Monazzam, A. Nezafat

    2007-04-01

    Full Text Available Noise is one of the most serious challenges in modern community. In some specific industries, according to the nature of process, this challenge is more threatening. This paper describes a means of noise control for spinning machine based on experimental measurements. Also advantages and disadvantages of the control procedure are added. Different factors which may affect the performance of the barrier in this situation are also mentioned. To provide a good estimation of the control measure, a theoretical formula is also described and it is compared with the field data. Good agreement between the results of filed measurements and theoretical presented model was achieved. No obvious noise reduction was seen by partial indoor barriers in low absorbent enclosed spaces, since the reflection from multiple hard surfaces is the main dominated factor in the tested environment. At the end, the situation of the environment and standards, which are necessary in attaining the ideal results, are explained.

  5. Hemodynamic modelling of BOLD fMRI - A machine learning approach

    DEFF Research Database (Denmark)

    Jacobsen, Danjal Jakup

    2007-01-01

    This Ph.D. thesis concerns the application of machine learning methods to hemodynamic models for BOLD fMRI data. Several such models have been proposed by different researchers, and they have in common a basis in physiological knowledge of the hemodynamic processes involved in the generation...... of the BOLD signal. The BOLD signal is modelled as a non-linear function of underlying, hidden (non-measurable) hemodynamic state variables. The focus of this thesis work has been to develop methods for learning the parameters of such models, both in their traditional formulation, and in a state space...... formulation. In the latter, noise enters at the level of the hidden states, as well as in the BOLD measurements themselves. A framework has been developed to allow approximate posterior distributions of model parameters to be learned from real fMRI data. This is accomplished with Markov chain Monte Carlo...

  6. A machine learning approach to triaging patients with chronic obstructive pulmonary disease.

    Directory of Open Access Journals (Sweden)

    Sumanth Swaminathan

    Full Text Available COPD patients are burdened with a daily risk of acute exacerbation and loss of control, which could be mitigated by effective, on-demand decision support tools. In this study, we present a machine learning-based strategy for early detection of exacerbations and subsequent triage. Our application uses physician opinion in a statistically and clinically comprehensive set of patient cases to train a supervised prediction algorithm. The accuracy of the model is assessed against a panel of physicians each triaging identical cases in a representative patient validation set. Our results show that algorithm accuracy and safety indicators surpass all individual pulmonologists in both identifying exacerbations and predicting the consensus triage in a 101 case validation set. The algorithm is also the top performer in sensitivity, specificity, and ppv when predicting a patient's need for emergency care.

  7. How wrong can we get? A review of machine learning approaches and error bars.

    Science.gov (United States)

    Schwaighofer, Anton; Schroeter, Timon; Mika, Sebastian; Blanchard, Gilles

    2009-06-01

    A large number of different machine learning methods can potentially be used for ligand-based virtual screening. In our contribution, we focus on three specific nonlinear methods, namely support vector regression, Gaussian process models, and decision trees. For each of these methods, we provide a short and intuitive introduction. In particular, we will also discuss how confidence estimates (error bars) can be obtained from these methods. We continue with important aspects for model building and evaluation, such as methodologies for model selection, evaluation, performance criteria, and how the quality of error bar estimates can be verified. Besides an introduction to the respective methods, we will also point to available implementations, and discuss important issues for the practical application.

  8. Mechatronics Approach for a Controlled Actuation of the Presser Foot Mechanism on an Industrial Sewing Machine

    Directory of Open Access Journals (Sweden)

    L. F. Silva

    2000-01-01

    Full Text Available This paper describes the study of the research program being carried out on the feeding system of an industrial overlock sewing machine. The results obtained from the presser foot bar displacement and compression force, together with the graphic kinematic analysis, which includes the velocity and acceleration taken from the displacement-time curves of the presser bar, led to further understanding of the feeding system dynamics. This study is providing the basis for the development of a redesigned and optimized fabric feeding system. The new actuation system, based on a proportional force solenoid integrated in the presser foot bar, will be also discussed as an important contribution to achieving a desired dynamic behaviour at high sewing speeds.

  9. A machine learning approach to triaging patients with chronic obstructive pulmonary disease

    Science.gov (United States)

    Qirko, Klajdi; Smith, Ted; Corcoran, Ethan; Wysham, Nicholas G.; Bazaz, Gaurav; Kappel, George; Gerber, Anthony N.

    2017-01-01

    COPD patients are burdened with a daily risk of acute exacerbation and loss of control, which could be mitigated by effective, on-demand decision support tools. In this study, we present a machine learning-based strategy for early detection of exacerbations and subsequent triage. Our application uses physician opinion in a statistically and clinically comprehensive set of patient cases to train a supervised prediction algorithm. The accuracy of the model is assessed against a panel of physicians each triaging identical cases in a representative patient validation set. Our results show that algorithm accuracy and safety indicators surpass all individual pulmonologists in both identifying exacerbations and predicting the consensus triage in a 101 case validation set. The algorithm is also the top performer in sensitivity, specificity, and ppv when predicting a patient’s need for emergency care. PMID:29166411

  10. Predicting the Geothermal Heat Flux in Greenland: A Machine Learning Approach

    Science.gov (United States)

    Rezvanbehbahani, Soroush; Stearns, Leigh A.; Kadivar, Amir; Walker, J. Doug; van der Veen, C. J.

    2017-12-01

    Geothermal heat flux (GHF) is a crucial boundary condition for making accurate predictions of ice sheet mass loss, yet it is poorly known in Greenland due to inaccessibility of the bedrock. Here we use a machine learning algorithm on a large collection of relevant geologic features and global GHF measurements and produce a GHF map of Greenland that we argue is within ˜15% accuracy. The main features of our predicted GHF map include a large region with high GHF in central-north Greenland surrounding the NorthGRIP ice core site, and hot spots in the Jakobshavn Isbræ catchment, upstream of Petermann Gletscher, and near the terminus of Nioghalvfjerdsfjorden glacier. Our model also captures the trajectory of Greenland movement over the Icelandic plume by predicting a stripe of elevated GHF in central-east Greenland. Finally, we show that our model can produce substantially more accurate predictions if additional measurements of GHF in Greenland are provided.

  11. Modular Approach to Designing Computer Cultural Systems: Culture as a Thermodynamic Machine

    Directory of Open Access Journals (Sweden)

    Leland Gilsen

    2015-01-01

    Full Text Available Culture is a complex non-linear system. In order to design computer simulations of cultural systems, it is necessary to break the system down into sub-systems. Human culture is modular. It consists of sets of people that belong to economic units. Access to, and control over matter, energy and information is postulated as the key to development of cultural simulations. Because resources in the real world are patchy, access to and control over resources is expressed in two related arenas: economics (direct control and politics (non-direct control. The best way to create models for cultural ecology/economics lies in an energy-information-economic paradigm based on general systems theory and an understanding of the "thermodynamics" of ecology, or culture as a thermodynamic machine.

  12. Neuropsychological Test Selection for Cognitive Impairment Classification: A Machine Learning Approach

    Science.gov (United States)

    Williams, Jennifer A.; Schmitter-Edgecombe, Maureen; Cook, Diane J.

    2016-01-01

    Introduction Reducing the amount of testing required to accurately detect cognitive impairment is clinically relevant. The aim of this research was to determine the fewest number of clinical measures required to accurately classify participants as healthy older adult, mild cognitive impairment (MCI) or dementia using a suite of classification techniques. Methods Two variable selection machine learning models (i.e., naive Bayes, decision tree), a logistic regression, and two participant datasets (i.e., clinical diagnosis, clinical dementia rating; CDR) were explored. Participants classified using clinical diagnosis criteria included 52 individuals with dementia, 97 with MCI, and 161 cognitively healthy older adults. Participants classified using CDR included 154 individuals CDR = 0, 93 individuals with CDR = 0.5, and 25 individuals with CDR = 1.0+. Twenty-seven demographic, psychological, and neuropsychological variables were available for variable selection. Results No significant difference was observed between naive Bayes, decision tree, and logistic regression models for classification of both clinical diagnosis and CDR datasets. Participant classification (70.0 – 99.1%), geometric mean (60.9 – 98.1%), sensitivity (44.2 – 100%), and specificity (52.7 – 100%) were generally satisfactory. Unsurprisingly, the MCI/CDR = 0.5 participant group was the most challenging to classify. Through variable selection only 2 – 9 variables were required for classification and varied between datasets in a clinically meaningful way. Conclusions The current study results reveal that machine learning techniques can accurately classifying cognitive impairment and reduce the number of measures required for diagnosis. PMID:26332171

  13. Beyond Where to How: A Machine Learning Approach for Sensing Mobility Contexts Using Smartphone Sensors

    Directory of Open Access Journals (Sweden)

    Robert E. Guinness

    2015-04-01

    Full Text Available This paper presents the results of research on the use of smartphone sensors (namely, GPS and accelerometers, geospatial information (points of interest, such as bus stops and train stations and machine learning (ML to sense mobility contexts. Our goal is to develop techniques to continuously and automatically detect a smartphone user’s mobility activities, including walking, running, driving and using a bus or train, in real-time or near-real-time (<5 s. We investigated a wide range of supervised learning techniques for classification, including decision trees (DT, support vector machines (SVM, naive Bayes classifiers (NB, Bayesian networks (BN, logistic regression (LR, artificial neural networks (ANN and several instance-based classifiers (KStar, LWLand IBk. Applying ten-fold cross-validation, the best performers in terms of correct classification rate (i.e., recall were DT (96.5%, BN (90.9%, LWL (95.5% and KStar (95.6%. In particular, the DT-algorithm RandomForest exhibited the best overall performance. After a feature selection process for a subset of algorithms, the performance was improved slightly. Furthermore, after tuning the parameters of RandomForest, performance improved to above 97.5%. Lastly, we measured the computational complexity of the classifiers, in terms of central processing unit (CPU time needed for classification, to provide a rough comparison between the algorithms in terms of battery usage requirements. As a result, the classifiers can be ranked from lowest to highest complexity (i.e., computational cost as follows: SVM, ANN, LR, BN, DT, NB, IBk, LWL and KStar. The instance-based classifiers take considerably more computational time than the non-instance-based classifiers, whereas the slowest non-instance-based classifier (NB required about five-times the amount of CPU time as the fastest classifier (SVM. The above results suggest that DT algorithms are excellent candidates for detecting mobility contexts in

  14. Big Data Analysis for Personalized Health Activities: Machine Learning Processing for Automatic Keyword Extraction Approach

    Directory of Open Access Journals (Sweden)

    Jun-Ho Huh

    2018-04-01

    Full Text Available The obese population is increasing rapidly due to the change of lifestyle and diet habits. Obesity can cause various complications and is becoming a social disease. Nonetheless, many obese patients are unaware of the medical treatments that are right for them. Although a variety of online and offline obesity management services have been introduced, they are still not enough to attract the attention of users and are not much of help to solve the problem. Obesity healthcare and personalized health activities are the important factors. Since obesity is related to lifestyle habits, eating habits, and interests, I concluded that the big data analysis of these factors could deduce the problem. Therefore, I collected big data by applying the machine learning and crawling method to the unstructured citizen health data in Korea and the search data of Naver, which is a Korean portal company, and Google for keyword analysis for personalized health activities. It visualized the big data using text mining and word cloud. This study collected and analyzed the data concerning the interests related to obesity, change of interest on obesity, and treatment articles. The analysis showed a wide range of seasonal factors according to spring, summer, fall, and winter. It also visualized and completed the process of extracting the keywords appropriate for treatment of abdominal obesity and lower body obesity. The keyword big data analysis technique for personalized health activities proposed in this paper is based on individual’s interests, level of interest, and body type. Also, the user interface (UI that visualizes the big data compatible with Android and Apple iOS. The users can see the data on the app screen. Many graphs and pictures can be seen via menu, and the significant data values are visualized through machine learning. Therefore, I expect that the big data analysis using various keywords specific to a person will result in measures for personalized

  15. Quantifying Risk for Anxiety Disorders in Preschool Children: A Machine Learning Approach.

    Science.gov (United States)

    Carpenter, Kimberly L H; Sprechmann, Pablo; Calderbank, Robert; Sapiro, Guillermo; Egger, Helen L

    2016-01-01

    Early childhood anxiety disorders are common, impairing, and predictive of anxiety and mood disorders later in childhood. Epidemiological studies over the last decade find that the prevalence of impairing anxiety disorders in preschool children ranges from 0.3% to 6.5%. Yet, less than 15% of young children with an impairing anxiety disorder receive a mental health evaluation or treatment. One possible reason for the low rate of care for anxious preschoolers is the lack of affordable, timely, reliable and valid tools for identifying young children with clinically significant anxiety. Diagnostic interviews assessing psychopathology in young children require intensive training, take hours to administer and code, and are not available for use outside of research settings. The Preschool Age Psychiatric Assessment (PAPA) is a reliable and valid structured diagnostic parent-report interview for assessing psychopathology, including anxiety disorders, in 2 to 5 year old children. In this paper, we apply machine-learning tools to already collected PAPA data from two large community studies to identify sub-sets of PAPA items that could be developed into an efficient, reliable, and valid screening tool to assess a young child's risk for an anxiety disorder. Using machine learning, we were able to decrease by an order of magnitude the number of items needed to identify a child who is at risk for an anxiety disorder with an accuracy of over 96% for both generalized anxiety disorder (GAD) and separation anxiety disorder (SAD). Additionally, rather than considering GAD or SAD as discrete/binary entities, we present a continuous risk score representing the child's risk of meeting criteria for GAD or SAD. Identification of a short question-set that assesses risk for an anxiety disorder could be a first step toward development and validation of a relatively short screening tool feasible for use in pediatric clinics and daycare/preschool settings.

  16. GWAS-based machine learning approach to predict duloxetine response in major depressive disorder.

    Science.gov (United States)

    Maciukiewicz, Malgorzata; Marshe, Victoria S; Hauschild, Anne-Christin; Foster, Jane A; Rotzinger, Susan; Kennedy, James L; Kennedy, Sidney H; Müller, Daniel J; Geraci, Joseph

    2018-04-01

    Major depressive disorder (MDD) is one of the most prevalent psychiatric disorders and is commonly treated with antidepressant drugs. However, large variability is observed in terms of response to antidepressants. Machine learning (ML) models may be useful to predict treatment outcomes. A sample of 186 MDD patients received treatment with duloxetine for up to 8 weeks were categorized as "responders" based on a MADRS change >50% from baseline; or "remitters" based on a MADRS score ≤10 at end point. The initial dataset (N = 186) was randomly divided into training and test sets in a nested 5-fold cross-validation, where 80% was used as a training set and 20% made up five independent test sets. We performed genome-wide logistic regression to identify potentially significant variants related to duloxetine response/remission and extracted the most promising predictors using LASSO regression. Subsequently, classification-regression trees (CRT) and support vector machines (SVM) were applied to construct models, using ten-fold cross-validation. With regards to response, none of the pairs performed significantly better than chance (accuracy p > .1). For remission, SVM achieved moderate performance with an accuracy = 0.52, a sensitivity = 0.58, and a specificity = 0.46, and 0.51 for all coefficients for CRT. The best performing SVM fold was characterized by an accuracy = 0.66 (p = .071), sensitivity = 0.70 and a sensitivity = 0.61. In this study, the potential of using GWAS data to predict duloxetine outcomes was examined using ML models. The models were characterized by a promising sensitivity, but specificity remained moderate at best. The inclusion of additional non-genetic variables to create integrated models may improve prediction. Copyright © 2017. Published by Elsevier Ltd.

  17. Kernel based machine learning algorithm for the efficient prediction of type III polyketide synthase family of proteins

    Directory of Open Access Journals (Sweden)

    Mallika V

    2010-03-01

    Full Text Available Type III Polyketide synthases (PKS are family of proteins considered to have significant role in the biosynthesis of various polyketides in plants, fungi and bacteria. As these proteins show positive effects to human health, more researches are going on regarding this particular protein. Developing a tool to identify the probability of sequence, being a type III polyketide synthase will minimize the time consumption and manpower efforts. In this approach, we have designed and implemented PKSIIIpred, a high performance prediction server for type III PKS where the classifier is Support Vector Machine (SVM. Based on the limited training dataset, the tool efficiently predicts the type III PKS superfamily of proteins with high sensitivity and specificity. PKSIIIpred is available at http://type3pks.in/prediction/. We expect that this tool may serve as a useful resource for type III PKS researchers. Currently work is being progressed for further betterment of prediction accuracy by including more sequence features in the training dataset.

  18. Real-time PCR Machine System Modeling and a Systematic Approach for the Robust Design of a Real-time PCR-on-a-Chip System

    OpenAIRE

    Lee, Da-Sheng

    2010-01-01

    Chip-based DNA quantification systems are widespread, and used in many point-of-care applications. However, instruments for such applications may not be maintained or calibrated regularly. Since machine reliability is a key issue for normal operation, this study presents a system model of the real-time Polymerase Chain Reaction (PCR) machine to analyze the instrument design through numerical experiments. Based on model analysis, a systematic approach was developed to lower the variation of DN...

  19. An assessment of machine and statistical learning approaches to inferring networks of protein-protein interactions

    Directory of Open Access Journals (Sweden)

    Browne Fiona

    2006-12-01

    Full Text Available Protein-protein interactions (PPI play a key role in many biological systems. Over the past few years, an explosion in availability of functional biological data obtained from high-throughput technologies to infer PPI has been observed. However, results obtained from such experiments show high rates of false positives and false negatives predictions as well as systematic predictive bias. Recent research has revealed that several machine and statistical learning methods applied to integrate relatively weak, diverse sources of large-scale functional data may provide improved predictive accuracy and coverage of PPI. In this paper we describe the effects of applying different computational, integrative methods to predict PPI in Saccharomyces cerevisiae. We investigated the predictive ability of combining different sets of relatively strong and weak predictive datasets. We analysed several genomic datasets ranging from mRNA co-expression to marginal essentiality. Moreover, we expanded an existing multi-source dataset from S. cerevisiae by constructing a new set of putative interactions extracted from Gene Ontology (GO- driven annotations in the Saccharomyces Genome Database. Different classification techniques: Simple Naive Bayesian (SNB, Multilayer Perceptron (MLP and K-Nearest Neighbors (KNN were evaluated. Relatively simple classification methods (i.e. less computing intensive and mathematically complex, such as SNB, have been proven to be proficient at predicting PPI. SNB produced the “highest” predictive quality obtaining an area under Receiver Operating Characteristic (ROC curve (AUC value of 0.99. The lowest AUC value of 0.90 was obtained by the KNN classifier. This assessment also demonstrates the strong predictive power of GO-driven models, which offered predictive performance above 0.90 using the different machine learning and statistical techniques. As the predictive power of single-source datasets became weaker MLP and SNB performed

  20. Landscape epidemiology and machine learning: A geospatial approach to modeling West Nile virus risk in the United States

    Science.gov (United States)

    Young, Sean Gregory

    The complex interactions between human health and the physical landscape and environment have been recognized, if not fully understood, since the ancient Greeks. Landscape epidemiology, sometimes called spatial epidemiology, is a sub-discipline of medical geography that uses environmental conditions as explanatory variables in the study of disease or other health phenomena. This theory suggests that pathogenic organisms (whether germs or larger vector and host species) are subject to environmental conditions that can be observed on the landscape, and by identifying where such organisms are likely to exist, areas at greatest risk of the disease can be derived. Machine learning is a sub-discipline of artificial intelligence that can be used to create predictive models from large and complex datasets. West Nile virus (WNV) is a relatively new infectious disease in the United States, and has a fairly well-understood transmission cycle that is believed to be highly dependent on environmental conditions. This study takes a geospatial approach to the study of WNV risk, using both landscape epidemiology and machine learning techniques. A combination of remotely sensed and in situ variables are used to predict WNV incidence with a correlation coefficient as high as 0.86. A novel method of mitigating the small numbers problem is also tested and ultimately discarded. Finally a consistent spatial pattern of model errors is identified, indicating the chosen variables are capable of predicting WNV disease risk across most of the United States, but are inadequate in the northern Great Plains region of the US.

  1. Spatial prediction of landslides using a hybrid machine learning approach based on Random Subspace and Classification and Regression Trees

    Science.gov (United States)

    Pham, Binh Thai; Prakash, Indra; Tien Bui, Dieu

    2018-02-01

    A hybrid machine learning approach of Random Subspace (RSS) and Classification And Regression Trees (CART) is proposed to develop a model named RSSCART for spatial prediction of landslides. This model is a combination of the RSS method which is known as an efficient ensemble technique and the CART which is a state of the art classifier. The Luc Yen district of Yen Bai province, a prominent landslide prone area of Viet Nam, was selected for the model development. Performance of the RSSCART model was evaluated through the Receiver Operating Characteristic (ROC) curve, statistical analysis methods, and the Chi Square test. Results were compared with other benchmark landslide models namely Support Vector Machines (SVM), single CART, Naïve Bayes Trees (NBT), and Logistic Regression (LR). In the development of model, ten important landslide affecting factors related with geomorphology, geology and geo-environment were considered namely slope angles, elevation, slope aspect, curvature, lithology, distance to faults, distance to rivers, distance to roads, and rainfall. Performance of the RSSCART model (AUC = 0.841) is the best compared with other popular landslide models namely SVM (0.835), single CART (0.822), NBT (0.821), and LR (0.723). These results indicate that performance of the RSSCART is a promising method for spatial landslide prediction.

  2. A Cross-Entropy-Based Admission Control Optimization Approach for Heterogeneous Virtual Machine Placement in Public Clouds

    Directory of Open Access Journals (Sweden)

    Li Pan

    2016-03-01

    Full Text Available Virtualization technologies make it possible for cloud providers to consolidate multiple IaaS provisions into a single server in the form of virtual machines (VMs. Additionally, in order to fulfill the divergent service requirements from multiple users, a cloud provider needs to offer several types of VM instances, which are associated with varying configurations and performance, as well as different prices. In such a heterogeneous virtual machine placement process, one significant problem faced by a cloud provider is how to optimally accept and place multiple VM service requests into its cloud data centers to achieve revenue maximization. To address this issue, in this paper, we first formulate such a revenue maximization problem during VM admission control as a multiple-dimensional knapsack problem, which is known to be NP-hard to solve. Then, we propose to use a cross-entropy-based optimization approach to address this revenue maximization problem, by obtaining a near-optimal eligible set for the provider to accept into its data centers, from the waiting VM service requests in the system. Finally, through extensive experiments and measurements in a simulated environment with the settings of VM instance classes derived from real-world cloud systems, we show that our proposed cross-entropy-based admission control optimization algorithm is efficient and effective in maximizing cloud providers’ revenue in a public cloud computing environment.

  3. A Comparative Experimental Study on the Use of Machine Learning Approaches for Automated Valve Monitoring Based on Acoustic Emission Parameters

    Science.gov (United States)

    Ali, Salah M.; Hui, K. H.; Hee, L. M.; Salman Leong, M.; Al-Obaidi, M. A.; Ali, Y. H.; Abdelrhman, Ahmed M.

    2018-03-01

    Acoustic emission (AE) analysis has become a vital tool for initiating the maintenance tasks in many industries. However, the analysis process and interpretation has been found to be highly dependent on the experts. Therefore, an automated monitoring method would be required to reduce the cost and time consumed in the interpretation of AE signal. This paper investigates the application of two of the most common machine learning approaches namely artificial neural network (ANN) and support vector machine (SVM) to automate the diagnosis of valve faults in reciprocating compressor based on AE signal parameters. Since the accuracy is an essential factor in any automated diagnostic system, this paper also provides a comparative study based on predictive performance of ANN and SVM. AE parameters data was acquired from single stage reciprocating air compressor with different operational and valve conditions. ANN and SVM diagnosis models were subsequently devised by combining AE parameters of different conditions. Results demonstrate that ANN and SVM models have the same results in term of prediction accuracy. However, SVM model is recommended to automate diagnose the valve condition in due to the ability of handling a high number of input features with low sampling data sets.

  4. A novel approach for detection and classification of mammographic microcalcifications using wavelet analysis and extreme learning machine.

    Science.gov (United States)

    Malar, E; Kandaswamy, A; Chakravarthy, D; Giri Dharan, A

    2012-09-01

    The objective of this paper is to reveal the effectiveness of wavelet based tissue texture analysis for microcalcification detection in digitized mammograms using Extreme Learning Machine (ELM). Microcalcifications are tiny deposits of calcium in the breast tissue which are potential indicators for early detection of breast cancer. The dense nature of the breast tissue and the poor contrast of the mammogram image prohibit the effectiveness in identifying microcalcifications. Hence, a new approach to discriminate the microcalcifications from the normal tissue is done using wavelet features and is compared with different feature vectors extracted using Gray Level Spatial Dependence Matrix (GLSDM) and Gabor filter based techniques. A total of 120 Region of Interests (ROIs) extracted from 55 mammogram images of mini-Mias database, including normal and microcalcification images are used in the current research. The network is trained with the above mentioned features and the results denote that ELM produces relatively better classification accuracy (94%) with a significant reduction in training time than the other artificial neural networks like Bayesnet classifier, Naivebayes classifier, and Support Vector Machine. ELM also avoids problems like local minima, improper learning rate, and over fitting. Copyright © 2012 Elsevier Ltd. All rights reserved.

  5. Development of a state machine sequencer for the Keck Interferometer: evolution, development, and lessons learned using a CASE tool approach

    Science.gov (United States)

    Reder, Leonard J.; Booth, Andrew; Hsieh, Jonathan; Summers, Kellee R.

    2004-09-01

    This paper presents a discussion of the evolution of a sequencer from a simple Experimental Physics and Industrial Control System (EPICS) based sequencer into a complex implementation designed utilizing UML (Unified Modeling Language) methodologies and a Computer Aided Software Engineering (CASE) tool approach. The main purpose of the Interferometer Sequencer (called the IF Sequencer) is to provide overall control of the Keck Interferometer to enable science operations to be carried out by a single operator (and/or observer). The interferometer links the two 10m telescopes of the W. M. Keck Observatory at Mauna Kea, Hawaii. The IF Sequencer is a high-level, multi-threaded, Harel finite state machine software program designed to orchestrate several lower-level hardware and software hard real-time subsystems that must perform their work in a specific and sequential order. The sequencing need not be done in hard real-time. Each state machine thread commands either a high-speed real-time multiple mode embedded controller via CORBA, or slower controllers via EPICS Channel Access interfaces. The overall operation of the system is simplified by the automation. The UML is discussed and our use of it to implement the sequencer is presented. The decision to use the Rhapsody product as our CASE tool is explained and reflected upon. Most importantly, a section on lessons learned is presented and the difficulty of integrating CASE tool automatically generated C++ code into a large control system consisting of multiple infrastructures is presented.

  6. Modified automatic teller machine prototype for older adults: a case study of participative approach to inclusive design.

    Science.gov (United States)

    Chan, Chetwyn C H; Wong, Alex W K; Lee, Tatia M C; Chi, Iris

    2009-03-01

    The goal of this study was to enhance an existing automated teller machine (ATM) human-machine interface in order to accommodate the needs of older adults. Older adults were involved in the design and field test of the modified ATM prototype. The design of the user interface and functionality took the cognitive and physical abilities of older adults into account. The modified ATM system included only "cash withdrawal" and "transfer" functions based on the task demands and needs for services of older adults. One hundred and forty-one older adults (aged 60 or above) participated in the field test by operating modified or existing ATM systems. Those who operated the modified system were found to have significantly higher success rates than those who operated the existing system. The enhancement was most significant among older adults who had lower ATM-related abilities, a lower level of education, and no prior experience of using ATMs. This study demonstrates the usefulness of using a universal design and participatory approach to modify the existing ATM system for use by older adults. However, it also leads to a reduction in functionality of the enhanced system. Future studies should explore ways to develop a universal design ATM system which can satisfy the abilities and needs of all users in the entire population.

  7. An Object-Based Classification of Mangroves Using a Hybrid Decision Tree—Support Vector Machine Approach

    Directory of Open Access Journals (Sweden)

    Benjamin W. Heumann

    2011-11-01

    Full Text Available Mangroves provide valuable ecosystem goods and services such as carbon sequestration, habitat for terrestrial and marine fauna, and coastal hazard mitigation. The use of satellite remote sensing to map mangroves has become widespread as it can provide accurate, efficient, and repeatable assessments. Traditional remote sensing approaches have failed to accurately map fringe mangroves and true mangrove species due to relatively coarse spatial resolution and/or spectral confusion with landward vegetation. This study demonstrates the use of the new Worldview-2 sensor, Object-based image analysis (OBIA, and support vector machine (SVM classification to overcome both of these limitations. An exploratory spectral separability showed that individual mangrove species could not be spectrally separated, but a distinction between true and associate mangrove species could be made. An OBIA classification was used that combined a decision-tree classification with the machine-learning SVM classification. Results showed an overall accuracy greater than 94% (kappa = 0.863 for classifying true mangroves species and other dense coastal vegetation at the object level. There remain serious challenges to accurately mapping fringe mangroves using remote sensing data due to spectral similarity of mangrove and associate species, lack of clear zonation between species, and mixed pixel effects, especially when vegetation is sparse or degraded.

  8. Soft brain-machine interfaces for assistive robotics: A novel control approach.

    Science.gov (United States)

    Schiatti, Lucia; Tessadori, Jacopo; Barresi, Giacinto; Mattos, Leonardo S; Ajoudani, Arash

    2017-07-01

    Robotic systems offer the possibility of improving the life quality of people with severe motor disabilities, enhancing the individual's degree of independence and interaction with the external environment. In this direction, the operator's residual functions must be exploited for the control of the robot movements and the underlying dynamic interaction through intuitive and effective human-robot interfaces. Towards this end, this work aims at exploring the potential of a novel Soft Brain-Machine Interface (BMI), suitable for dynamic execution of remote manipulation tasks for a wide range of patients. The interface is composed of an eye-tracking system, for an intuitive and reliable control of a robotic arm system's trajectories, and a Brain-Computer Interface (BCI) unit, for the control of the robot Cartesian stiffness, which determines the interaction forces between the robot and environment. The latter control is achieved by estimating in real-time a unidimensional index from user's electroencephalographic (EEG) signals, which provides the probability of a neutral or active state. This estimated state is then translated into a stiffness value for the robotic arm, allowing a reliable modulation of the robot's impedance. A preliminary evaluation of this hybrid interface concept provided evidence on the effective execution of tasks with dynamic uncertainties, demonstrating the great potential of this control method in BMI applications for self-service and clinical care.

  9. Ecophysiological modeling of grapevine water stress in Burgundy terroirs by a machine-learning approach

    Directory of Open Access Journals (Sweden)

    Luca eBrillante

    2016-06-01

    Full Text Available In a climate change scenario, successful modeling of the relationships between plant-soil-meteorology is crucial for a sustainable agricultural production, especially for perennial crops. Grapevines (Vitis vinifera L. cv Chardonnay located in eight experimental plots (Burgundy, France along a hillslope were monitored weekly for three years for leaf water potentials, both at predawn (Ψpd and at midday (Ψstem. The water stress experienced by grapevine was modeled as a function of meteorological data (minimum and maximum temperature, rainfall and soil characteristics (soil texture, gravel content, slope by a gradient boosting machine. Model performance was assessed by comparison with carbon isotope discrimination (δ13C of grape sugars at harvest and by the use of a test-set. The developed models reached outstanding prediction performance (RMSE < 0.08 MPa for Ψstem and < 0.06 MPa for Ψpd, comparable to measurement accuracy. Model predictions at a daily time step improved correlation with δ13C data, respect to the observed trend at a weekly time scale. The role of each predictor in these models was described in order to understand how temperature, rainfall, soil texture, gravel content and slope affect the grapevine water status in the studied context. This work proposes a straight-forward strategy to simulate plant water stress in field condition, at a local scale; to investigate ecological relationships in the vineyard and adapt cultural practices to future conditions.

  10. Ecophysiological Modeling of Grapevine Water Stress in Burgundy Terroirs by a Machine-Learning Approach.

    Science.gov (United States)

    Brillante, Luca; Mathieu, Olivier; Lévêque, Jean; Bois, Benjamin

    2016-01-01

    In a climate change scenario, successful modeling of the relationships between plant-soil-meteorology is crucial for a sustainable agricultural production, especially for perennial crops. Grapevines (Vitis vinifera L. cv Chardonnay) located in eight experimental plots (Burgundy, France) along a hillslope were monitored weekly for 3 years for leaf water potentials, both at predawn (Ψpd) and at midday (Ψstem). The water stress experienced by grapevine was modeled as a function of meteorological data (minimum and maximum temperature, rainfall) and soil characteristics (soil texture, gravel content, slope) by a gradient boosting machine. Model performance was assessed by comparison with carbon isotope discrimination (δ(13)C) of grape sugars at harvest and by the use of a test-set. The developed models reached outstanding prediction performance (RMSE < 0.08 MPa for Ψstem and < 0.06 MPa for Ψpd), comparable to measurement accuracy. Model predictions at a daily time step improved correlation with δ(13)C data, respect to the observed trend at a weekly time scale. The role of each predictor in these models was described in order to understand how temperature, rainfall, soil texture, gravel content and slope affect the grapevine water status in the studied context. This work proposes a straight-forward strategy to simulate plant water stress in field condition, at a local scale; to investigate ecological relationships in the vineyard and adapt cultural practices to future conditions.

  11. Beyond where to how: a machine learning approach for sensing mobility contexts using smartphone sensors.

    Science.gov (United States)

    Guinness, Robert E

    2015-04-28

    This paper presents the results of research on the use of smartphone sensors (namely, GPS and accelerometers), geospatial information (points of interest, such as bus stops and train stations) and machine learning (ML) to sense mobility contexts. Our goal is to develop techniques to continuously and automatically detect a smartphone user's mobility activities, including walking, running, driving and using a bus or train, in real-time or near-real-time (feature selection process for a subset of algorithms, the performance was improved slightly. Furthermore, after tuning the parameters of RandomForest, performance improved to above 97.5%. Lastly, we measured the computational complexity of the classifiers, in terms of central processing unit (CPU) time needed for classification, to provide a rough comparison between the algorithms in terms of battery usage requirements. As a result, the classifiers can be ranked from lowest to highest complexity (i.e., computational cost) as follows: SVM, ANN, LR, BN, DT, NB, IBk, LWL and KStar. The instance-based classifiers take considerably more computational time than the non-instance-based classifiers, whereas the slowest non-instance-based classifier (NB) required about five-times the amount of CPU time as the fastest classifier (SVM). The above results suggest that DT algorithms are excellent candidates for detecting mobility contexts in smartphones, both in terms of performance and computational complexity.

  12. Hidden Markov models and other machine learning approaches in computational molecular biology

    Energy Technology Data Exchange (ETDEWEB)

    Baldi, P. [California Inst. of Tech., Pasadena, CA (United States)

    1995-12-31

    This tutorial was one of eight tutorials selected to be presented at the Third International Conference on Intelligent Systems for Molecular Biology which was held in the United Kingdom from July 16 to 19, 1995. Computational tools are increasingly needed to process the massive amounts of data, to organize and classify sequences, to detect weak similarities, to separate coding from non-coding regions, and reconstruct the underlying evolutionary history. The fundamental problem in machine learning is the same as in scientific reasoning in general, as well as statistical modeling: to come up with a good model for the data. In this tutorial four classes of models are reviewed. They are: Hidden Markov models; artificial Neural Networks; Belief Networks; and Stochastic Grammars. When dealing with DNA and protein primary sequences, Hidden Markov models are one of the most flexible and powerful alignments and data base searches. In this tutorial, attention is focused on the theory of Hidden Markov Models, and how to apply them to problems in molecular biology.

  13. Machine learning approach to automatic exudate detection in retinal images from diabetic patients

    Science.gov (United States)

    Sopharak, Akara; Dailey, Matthew N.; Uyyanonvara, Bunyarit; Barman, Sarah; Williamson, Tom; Thet Nwe, Khine; Aye Moe, Yin

    2010-01-01

    Exudates are among the preliminary signs of diabetic retinopathy, a major cause of vision loss in diabetic patients. Early detection of exudates could improve patients' chances to avoid blindness. In this paper, we present a series of experiments on feature selection and exudates classification using naive Bayes and support vector machine (SVM) classifiers. We first fit the naive Bayes model to a training set consisting of 15 features extracted from each of 115,867 positive examples of exudate pixels and an equal number of negative examples. We then perform feature selection on the naive Bayes model, repeatedly removing features from the classifier, one by one, until classification performance stops improving. To find the best SVM, we begin with the best feature set from the naive Bayes classifier, and repeatedly add the previously-removed features to the classifier. For each combination of features, we perform a grid search to determine the best combination of hyperparameters ν (tolerance for training errors) and γ (radial basis function width). We compare the best naive Bayes and SVM classifiers to a baseline nearest neighbour (NN) classifier using the best feature sets from both classifiers. We find that the naive Bayes and SVM classifiers perform better than the NN classifier. The overall best sensitivity, specificity, precision, and accuracy are 92.28%, 98.52%, 53.05%, and 98.41%, respectively.

  14. Hybrid Forecasting Approach Based on GRNN Neural Network and SVR Machine for Electricity Demand Forecasting

    Directory of Open Access Journals (Sweden)

    Weide Li

    2017-01-01

    Full Text Available Accurate electric power demand forecasting plays a key role in electricity markets and power systems. The electric power demand is usually a non-linear problem due to various unknown reasons, which make it difficult to get accurate prediction by traditional methods. The purpose of this paper is to propose a novel hybrid forecasting method for managing and scheduling the electricity power. EEMD-SCGRNN-PSVR, the proposed new method, combines ensemble empirical mode decomposition (EEMD, seasonal adjustment (S, cross validation (C, general regression neural network (GRNN and support vector regression machine optimized by the particle swarm optimization algorithm (PSVR. The main idea of EEMD-SCGRNN-PSVR is respectively to forecast waveform and trend component that hidden in demand series to substitute directly forecasting original electric demand. EEMD-SCGRNN-PSVR is used to predict the one week ahead half-hour’s electricity demand in two data sets (New South Wales (NSW and Victorian State (VIC in Australia. Experimental results show that the new hybrid model outperforms the other three models in terms of forecasting accuracy and model robustness.

  15. Deep Appearance Models: A Deep Boltzmann Machine Approach for Face Modeling

    OpenAIRE

    Duong, Chi Nhan; Luu, Khoa; Quach, Kha Gia; Bui, Tien D.

    2016-01-01

    The "interpretation through synthesis" approach to analyze face images, particularly Active Appearance Models (AAMs) method, has become one of the most successful face modeling approaches over the last two decades. AAM models have ability to represent face images through synthesis using a controllable parameterized Principal Component Analysis (PCA) model. However, the accuracy and robustness of the synthesized faces of AAM are highly depended on the training sets and inherently on the genera...

  16. A hybrid training approach for leaf area index estimation via Cubist and random forests machine-learning

    KAUST Repository

    McCabe, Matthew

    2017-12-06

    With an increasing volume and dimensionality of Earth observation data, enhanced integration of machine-learning methodologies is needed to effectively analyze and utilize these information rich datasets. In machine-learning, a training dataset is required to establish explicit associations between a suite of explanatory ‘predictor’ variables and the target property. The specifics of this learning process can significantly influence model validity and portability, with a higher generalization level expected with an increasing number of observable conditions being reflected in the training dataset. Here we propose a hybrid training approach for leaf area index (LAI) estimation, which harnesses synergistic attributes of scattered in-situ measurements and systematically distributed physically based model inversion results to enhance the information content and spatial representativeness of the training data. To do this, a complimentary training dataset of independent LAI was derived from a regularized model inversion of RapidEye surface reflectances and subsequently used to guide the development of LAI regression models via Cubist and random forests (RF) decision tree methods. The application of the hybrid training approach to a broad set of Landsat 8 vegetation index (VI) predictor variables resulted in significantly improved LAI prediction accuracies and spatial consistencies, relative to results relying on in-situ measurements alone for model training. In comparing the prediction capacity and portability of the two machine-learning algorithms, a pair of relatively simple multi-variate regression models established by Cubist performed best, with an overall relative mean absolute deviation (rMAD) of ∼11%, determined based on a stringent scene-specific cross-validation approach. In comparison, the portability of RF regression models was less effective (i.e., an overall rMAD of ∼15%), which was attributed partly to model saturation at high LAI in association

  17. A hybrid training approach for leaf area index estimation via Cubist and random forests machine-learning

    KAUST Repository

    McCabe, Matthew; McCabe, Matthew

    2017-01-01

    With an increasing volume and dimensionality of Earth observation data, enhanced integration of machine-learning methodologies is needed to effectively analyze and utilize these information rich datasets. In machine-learning, a training dataset is required to establish explicit associations between a suite of explanatory ‘predictor’ variables and the target property. The specifics of this learning process can significantly influence model validity and portability, with a higher generalization level expected with an increasing number of observable conditions being reflected in the training dataset. Here we propose a hybrid training approach for leaf area index (LAI) estimation, which harnesses synergistic attributes of scattered in-situ measurements and systematically distributed physically based model inversion results to enhance the information content and spatial representativeness of the training data. To do this, a complimentary training dataset of independent LAI was derived from a regularized model inversion of RapidEye surface reflectances and subsequently used to guide the development of LAI regression models via Cubist and random forests (RF) decision tree methods. The application of the hybrid training approach to a broad set of Landsat 8 vegetation index (VI) predictor variables resulted in significantly improved LAI prediction accuracies and spatial consistencies, relative to results relying on in-situ measurements alone for model training. In comparing the prediction capacity and portability of the two machine-learning algorithms, a pair of relatively simple multi-variate regression models established by Cubist performed best, with an overall relative mean absolute deviation (rMAD) of ∼11%, determined based on a stringent scene-specific cross-validation approach. In comparison, the portability of RF regression models was less effective (i.e., an overall rMAD of ∼15%), which was attributed partly to model saturation at high LAI in association

  18. A hybrid training approach for leaf area index estimation via Cubist and random forests machine-learning

    Science.gov (United States)

    Houborg, Rasmus; McCabe, Matthew F.

    2018-01-01

    With an increasing volume and dimensionality of Earth observation data, enhanced integration of machine-learning methodologies is needed to effectively analyze and utilize these information rich datasets. In machine-learning, a training dataset is required to establish explicit associations between a suite of explanatory 'predictor' variables and the target property. The specifics of this learning process can significantly influence model validity and portability, with a higher generalization level expected with an increasing number of observable conditions being reflected in the training dataset. Here we propose a hybrid training approach for leaf area index (LAI) estimation, which harnesses synergistic attributes of scattered in-situ measurements and systematically distributed physically based model inversion results to enhance the information content and spatial representativeness of the training data. To do this, a complimentary training dataset of independent LAI was derived from a regularized model inversion of RapidEye surface reflectances and subsequently used to guide the development of LAI regression models via Cubist and random forests (RF) decision tree methods. The application of the hybrid training approach to a broad set of Landsat 8 vegetation index (VI) predictor variables resulted in significantly improved LAI prediction accuracies and spatial consistencies, relative to results relying on in-situ measurements alone for model training. In comparing the prediction capacity and portability of the two machine-learning algorithms, a pair of relatively simple multi-variate regression models established by Cubist performed best, with an overall relative mean absolute deviation (rMAD) of ∼11%, determined based on a stringent scene-specific cross-validation approach. In comparison, the portability of RF regression models was less effective (i.e., an overall rMAD of ∼15%), which was attributed partly to model saturation at high LAI in association with

  19. Upport vector machines for nonlinear kernel ARMA system identification.

    Science.gov (United States)

    Martínez-Ramón, Manel; Rojo-Alvarez, José Luis; Camps-Valls, Gustavo; Muñioz-Marí, Jordi; Navia-Vázquez, Angel; Soria-Olivas, Emilio; Figueiras-Vidal, Aníbal R

    2006-11-01

    Nonlinear system identification based on support vector machines (SVM) has been usually addressed by means of the standard SVM regression (SVR), which can be seen as an implicit nonlinear autoregressive and moving average (ARMA) model in some reproducing kernel Hilbert space (RKHS). The proposal of this letter is twofold. First, the explicit consideration of an ARMA model in an RKHS (SVM-ARMA2K) is proposed. We show that stating the ARMA equations in an RKHS leads to solving the regularized normal equations in that RKHS, in terms of the autocorrelation and cross correlation of the (nonlinearly) transformed input and output discrete time processes. Second, a general class of SVM-based system identification nonlinear models is presented, based on the use of composite Mercer's kernels. This general class can improve model flexibility by emphasizing the input-output cross information (SVM-ARMA4K), which leads to straightforward and natural combinations of implicit and explicit ARMA models (SVR-ARMA2K and SVR-ARMA4K). Capabilities of these different SVM-based system identification schemes are illustrated with two benchmark problems.

  20. An information theory based approach for quantitative evaluation of man-machine interface complexity

    International Nuclear Information System (INIS)

    Kang, Hyun Gook

    1999-02-01

    In complex and high-risk work conditions, especially such as in nuclear power plants, human understanding of the plant is highly cognitive and thus largely dependent on the effectiveness of the man-machine interface system. In order to provide more effective and reliable operating conditions for future nuclear power plants, developing more credible and easy to use evaluation methods will afford great help in designing interface systems in a more efficient manner. In this study, in order to analyze the human-machine interactions, I propose the Human-processor Communication(HPC) model which is based on the information flow concept. It identifies the information flow around a human-processor. Information flow has two aspects: appearance and content. Based on the HPC model, I propose two kinds of measures for evaluating a user interface from the viewpoint of these two aspects of information flow. They measure the communicative complexity of each aspect. In this study, for the evaluation of the aspect of appearance, I propose three complexity measures: Operation Complexity, Transition Complexity, and Screen Complexity. Each one of these measures has its own physical meaning. Two experiments carried out in this work support the utility of these measures. The result of the quiz game experiment shows that as the complexity of task context increases, the usage of the interface system becomes more complex. The experimental results of the three example systems(digital view, LDP style view and hierarchy view) show the utility of the proposed complexity measures. In this study, for the evaluation of the aspect of content, I propose the degree of informational coincidence, R (K, P) as a measure for the usefulness of an alarm-processing system. It is designed to perform user-oriented evaluation based on the informational entropy concept. It will be especially useful inearly design phase because designers can estimate the usefulness of an alarm system by short calculations instead

  1. An information theory based approach for quantitative evaluation of man-machine interface complexity

    Energy Technology Data Exchange (ETDEWEB)

    Kang, Hyun Gook

    1999-02-15

    In complex and high-risk work conditions, especially such as in nuclear power plants, human understanding of the plant is highly cognitive and thus largely dependent on the effectiveness of the man-machine interface system. In order to provide more effective and reliable operating conditions for future nuclear power plants, developing more credible and easy to use evaluation methods will afford great help in designing interface systems in a more efficient manner. In this study, in order to analyze the human-machine interactions, I propose the Human-processor Communication(HPC) model which is based on the information flow concept. It identifies the information flow around a human-processor. Information flow has two aspects: appearance and content. Based on the HPC model, I propose two kinds of measures for evaluating a user interface from the viewpoint of these two aspects of information flow. They measure the communicative complexity of each aspect. In this study, for the evaluation of the aspect of appearance, I propose three complexity measures: Operation Complexity, Transition Complexity, and Screen Complexity. Each one of these measures has its own physical meaning. Two experiments carried out in this work support the utility of these measures. The result of the quiz game experiment shows that as the complexity of task context increases, the usage of the interface system becomes more complex. The experimental results of the three example systems(digital view, LDP style view and hierarchy view) show the utility of the proposed complexity measures. In this study, for the evaluation of the aspect of content, I propose the degree of informational coincidence, R (K, P) as a measure for the usefulness of an alarm-processing system. It is designed to perform user-oriented evaluation based on the informational entropy concept. It will be especially useful inearly design phase because designers can estimate the usefulness of an alarm system by short calculations instead

  2. A machine learning approach to the accurate prediction of multi-leaf collimator positional errors

    Science.gov (United States)

    Carlson, Joel N. K.; Park, Jong Min; Park, So-Yeon; In Park, Jong; Choi, Yunseok; Ye, Sung-Joon

    2016-03-01

    Discrepancies between planned and delivered movements of multi-leaf collimators (MLCs) are an important source of errors in dose distributions during radiotherapy. In this work we used machine learning techniques to train models to predict these discrepancies, assessed the accuracy of the model predictions, and examined the impact these errors have on quality assurance (QA) procedures and dosimetry. Predictive leaf motion parameters for the models were calculated from the plan files, such as leaf position and velocity, whether the leaf was moving towards or away from the isocenter of the MLC, and many others. Differences in positions between synchronized DICOM-RT planning files and DynaLog files reported during QA delivery were used as a target response for training of the models. The final model is capable of predicting MLC positions during delivery to a high degree of accuracy. For moving MLC leaves, predicted positions were shown to be significantly closer to delivered positions than were planned positions. By incorporating predicted positions into dose calculations in the TPS, increases were shown in gamma passing rates against measured dose distributions recorded during QA delivery. For instance, head and neck plans with 1%/2 mm gamma criteria had an average increase in passing rate of 4.17% (SD  =  1.54%). This indicates that the inclusion of predictions during dose calculation leads to a more realistic representation of plan delivery. To assess impact on the patient, dose volumetric histograms (DVH) using delivered positions were calculated for comparison with planned and predicted DVHs. In all cases, predicted dose volumetric parameters were in closer agreement to the delivered parameters than were the planned parameters, particularly for organs at risk on the periphery of the treatment area. By incorporating the predicted positions into the TPS, the treatment planner is given a more realistic view of the dose distribution as it will truly be

  3. Machine learning approaches to diagnosis and laterality effects in semantic dementia discourse.

    Science.gov (United States)

    Garrard, Peter; Rentoumi, Vassiliki; Gesierich, Benno; Miller, Bruce; Gorno-Tempini, Maria Luisa

    2014-06-01

    Advances in automatic text classification have been necessitated by the rapid increase in the availability of digital documents. Machine learning (ML) algorithms can 'learn' from data: for instance a ML system can be trained on a set of features derived from written texts belonging to known categories, and learn to distinguish between them. Such a trained system can then be used to classify unseen texts. In this paper, we explore the potential of the technique to classify transcribed speech samples along clinical dimensions, using vocabulary data alone. We report the accuracy with which two related ML algorithms [naive Bayes Gaussian (NBG) and naive Bayes multinomial (NBM)] categorized picture descriptions produced by: 32 semantic dementia (SD) patients versus 10 healthy, age-matched controls; and SD patients with left- (n = 21) versus right-predominant (n = 11) patterns of temporal lobe atrophy. We used information gain (IG) to identify the vocabulary features that were most informative to each of these two distinctions. In the SD versus control classification task, both algorithms achieved accuracies of greater than 90%. In the right- versus left-temporal lobe predominant classification, NBM achieved a high level of accuracy (88%), but this was achieved by both NBM and NBG when the features used in the training set were restricted to those with high values of IG. The most informative features for the patient versus control task were low frequency content words, generic terms and components of metanarrative statements. For the right versus left task the number of informative lexical features was too small to support any specific inferences. An enriched feature set, including values derived from Quantitative Production Analysis (QPA) may shed further light on this little understood distinction. Copyright © 2013 Elsevier Ltd. All rights reserved.

  4. Bioinformatics algorithm based on a parallel implementation of a machine learning approach using transducers

    International Nuclear Information System (INIS)

    Roche-Lima, Abiel; Thulasiram, Ruppa K

    2012-01-01

    Finite automata, in which each transition is augmented with an output label in addition to the familiar input label, are considered finite-state transducers. Transducers have been used to analyze some fundamental issues in bioinformatics. Weighted finite-state transducers have been proposed to pairwise alignments of DNA and protein sequences; as well as to develop kernels for computational biology. Machine learning algorithms for conditional transducers have been implemented and used for DNA sequence analysis. Transducer learning algorithms are based on conditional probability computation. It is calculated by using techniques, such as pair-database creation, normalization (with Maximum-Likelihood normalization) and parameters optimization (with Expectation-Maximization - EM). These techniques are intrinsically costly for computation, even worse when are applied to bioinformatics, because the databases sizes are large. In this work, we describe a parallel implementation of an algorithm to learn conditional transducers using these techniques. The algorithm is oriented to bioinformatics applications, such as alignments, phylogenetic trees, and other genome evolution studies. Indeed, several experiences were developed using the parallel and sequential algorithm on Westgrid (specifically, on the Breeze cluster). As results, we obtain that our parallel algorithm is scalable, because execution times are reduced considerably when the data size parameter is increased. Another experience is developed by changing precision parameter. In this case, we obtain smaller execution times using the parallel algorithm. Finally, number of threads used to execute the parallel algorithm on the Breezy cluster is changed. In this last experience, we obtain as result that speedup is considerably increased when more threads are used; however there is a convergence for number of threads equal to or greater than 16.

  5. "What is relevant in a text document?": An interpretable machine learning approach.

    Directory of Open Access Journals (Sweden)

    Leila Arras

    Full Text Available Text documents can be described by a number of abstract concepts such as semantic category, writing style, or sentiment. Machine learning (ML models have been trained to automatically map documents to these abstract concepts, allowing to annotate very large text collections, more than could be processed by a human in a lifetime. Besides predicting the text's category very accurately, it is also highly desirable to understand how and why the categorization process takes place. In this paper, we demonstrate that such understanding can be achieved by tracing the classification decision back to individual words using layer-wise relevance propagation (LRP, a recently developed technique for explaining predictions of complex non-linear classifiers. We train two word-based ML models, a convolutional neural network (CNN and a bag-of-words SVM classifier, on a topic categorization task and adapt the LRP method to decompose the predictions of these models onto words. Resulting scores indicate how much individual words contribute to the overall classification decision. This enables one to distill relevant information from text documents without an explicit semantic information extraction step. We further use the word-wise relevance scores for generating novel vector-based document representations which capture semantic information. Based on these document vectors, we introduce a measure of model explanatory power and show that, although the SVM and CNN models perform similarly in terms of classification accuracy, the latter exhibits a higher level of explainability which makes it more comprehensible for humans and potentially more useful for other applications.

  6. Cross-trial prediction of treatment outcome in depression: a machine learning approach.

    Science.gov (United States)

    Chekroud, Adam Mourad; Zotti, Ryan Joseph; Shehzad, Zarrar; Gueorguieva, Ralitza; Johnson, Marcia K; Trivedi, Madhukar H; Cannon, Tyrone D; Krystal, John Harrison; Corlett, Philip Robert

    2016-03-01

    Antidepressant treatment efficacy is low, but might be improved by matching patients to interventions. At present, clinicians have no empirically validated mechanisms to assess whether a patient with depression will respond to a specific antidepressant. We aimed to develop an algorithm to assess whether patients will achieve symptomatic remission from a 12-week course of citalopram. We used patient-reported data from patients with depression (n=4041, with 1949 completers) from level 1 of the Sequenced Treatment Alternatives to Relieve Depression (STAR*D; ClinicalTrials.gov, number NCT00021528) to identify variables that were most predictive of treatment outcome, and used these variables to train a machine-learning model to predict clinical remission. We externally validated the model in the escitalopram treatment group (n=151) of an independent clinical trial (Combining Medications to Enhance Depression Outcomes [COMED]; ClinicalTrials.gov, number NCT00590863). We identified 25 variables that were most predictive of treatment outcome from 164 patient-reportable variables, and used these to train the model. The model was internally cross-validated, and predicted outcomes in the STAR*D cohort with accuracy significantly above chance (64·6% [SD 3·2]; p<0·0001). The model was externally validated in the escitalopram treatment group (N=151) of COMED (accuracy 59·6%, p=0.043). The model also performed significantly above chance in a combined escitalopram-buproprion treatment group in COMED (n=134; accuracy 59·7%, p=0·023), but not in a combined venlafaxine-mirtazapine group (n=140; accuracy 51·4%, p=0·53), suggesting specificity of the model to underlying mechanisms. Building statistical models by mining existing clinical trial data can enable prospective identification of patients who are likely to respond to a specific antidepressant. Yale University. Copyright © 2016 Elsevier Ltd. All rights reserved.

  7. Separating depressive comorbidity from panic disorder: A combined functional magnetic resonance imaging and machine learning approach.

    Science.gov (United States)

    Lueken, Ulrike; Straube, Benjamin; Yang, Yunbo; Hahn, Tim; Beesdo-Baum, Katja; Wittchen, Hans-Ulrich; Konrad, Carsten; Ströhle, Andreas; Wittmann, André; Gerlach, Alexander L; Pfleiderer, Bettina; Arolt, Volker; Kircher, Tilo

    2015-09-15

    Depression is frequent in panic disorder (PD); yet, little is known about its influence on the neural substrates of PD. Difficulties in fear inhibition during safety signal processing have been reported as a pathophysiological feature of PD that is attenuated by depression. We investigated the impact of comorbid depression in PD with agoraphobia (AG) on the neural correlates of fear conditioning and the potential of machine learning to predict comorbidity status on the individual patient level based on neural characteristics. Fifty-nine PD/AG patients including 26 (44%) with a comorbid depressive disorder (PD/AG+DEP) underwent functional magnetic resonance imaging (fMRI). Comorbidity status was predicted using a random undersampling tree ensemble in a leave-one-out cross-validation framework. PD/AG-DEP patients showed altered neural activation during safety signal processing, while +DEP patients exhibited generally decreased dorsolateral prefrontal and insular activation. Comorbidity status was correctly predicted in 79% of patients (sensitivity: 73%; specificity: 85%) based on brain activation during fear conditioning (corrected for potential confounders: accuracy: 73%; sensitivity: 77%; specificity: 70%). No primary depressed patients were available; only medication-free patients were included. Major depression and dysthymia were collapsed (power considerations). Neurofunctional activation during safety signal processing differed between patients with or without comorbid depression, a finding which may explain heterogeneous results across previous studies. These findings demonstrate the relevance of comorbidity when investigating neurofunctional substrates of anxiety disorders. Predicting individual comorbidity status may translate neurofunctional data into clinically relevant information which might aid in planning individualized treatment. The study was registered with the ISRCTN80046034. Copyright © 2015 Elsevier B.V. All rights reserved.

  8. A machine learning approach for real-time modelling of tissue deformation in image-guided neurosurgery.

    Science.gov (United States)

    Tonutti, Michele; Gras, Gauthier; Yang, Guang-Zhong

    2017-07-01

    Accurate reconstruction and visualisation of soft tissue deformation in real time is crucial in image-guided surgery, particularly in augmented reality (AR) applications. Current deformation models are characterised by a trade-off between accuracy and computational speed. We propose an approach to derive a patient-specific deformation model for brain pathologies by combining the results of pre-computed finite element method (FEM) simulations with machine learning algorithms. The models can be computed instantaneously and offer an accuracy comparable to FEM models. A brain tumour is used as the subject of the deformation model. Load-driven FEM simulations are performed on a tetrahedral brain mesh afflicted by a tumour. Forces of varying magnitudes, positions, and inclination angles are applied onto the brain's surface. Two machine learning algorithms-artificial neural networks (ANNs) and support vector regression (SVR)-are employed to derive a model that can predict the resulting deformation for each node in the tumour's mesh. The tumour deformation can be predicted in real time given relevant information about the geometry of the anatomy and the load, all of which can be measured instantly during a surgical operation. The models can predict the position of the nodes with errors below 0.3mm, beyond the general threshold of surgical accuracy and suitable for high fidelity AR systems. The SVR models perform better than the ANN's, with positional errors for SVR models reaching under 0.2mm. The results represent an improvement over existing deformation models for real time applications, providing smaller errors and high patient-specificity. The proposed approach addresses the current needs of image-guided surgical systems and has the potential to be employed to model the deformation of any type of soft tissue. Copyright © 2017 Elsevier B.V. All rights reserved.

  9. Machine medical ethics

    CERN Document Server

    Pontier, Matthijs

    2015-01-01

    The essays in this book, written by researchers from both humanities and sciences, describe various theoretical and experimental approaches to adding medical ethics to a machine in medical settings. Medical machines are in close proximity with human beings, and getting closer: with patients who are in vulnerable states of health, who have disabilities of various kinds, with the very young or very old, and with medical professionals. In such contexts, machines are undertaking important medical tasks that require emotional sensitivity, knowledge of medical codes, human dignity, and privacy. As machine technology advances, ethical concerns become more urgent: should medical machines be programmed to follow a code of medical ethics? What theory or theories should constrain medical machine conduct? What design features are required? Should machines share responsibility with humans for the ethical consequences of medical actions? How ought clinical relationships involving machines to be modeled? Is a capacity for e...

  10. CFD modeling of two-stage ignition in a rapid compression machine: Assessment of zero-dimensional approach

    Energy Technology Data Exchange (ETDEWEB)

    Mittal, Gaurav [Department of Mechanical Engineering, The University of Akron, Akron, OH 44325 (United States); Raju, Mandhapati P. [General Motor R and D Tech Center, Warren, MI 48090 (United States); Sung, Chih-Jen [Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269 (United States)

    2010-07-15

    In modeling rapid compression machine (RCM) experiments, zero-dimensional approach is commonly used along with an associated heat loss model. The adequacy of such approach has not been validated for hydrocarbon fuels. The existence of multi-dimensional effects inside an RCM due to the boundary layer, roll-up vortex, non-uniform heat release, and piston crevice could result in deviation from the zero-dimensional assumption, particularly for hydrocarbons exhibiting two-stage ignition and strong thermokinetic interactions. The objective of this investigation is to assess the adequacy of zero-dimensional approach in modeling RCM experiments under conditions of two-stage ignition and negative temperature coefficient (NTC) response. Computational fluid dynamics simulations are conducted for n-heptane ignition in an RCM and the validity of zero-dimensional approach is assessed through comparisons over the entire NTC region. Results show that the zero-dimensional model based on the approach of 'adiabatic volume expansion' performs very well in adequately predicting the first-stage ignition delays, although quantitative discrepancy for the prediction of the total ignition delays and pressure rise in the first-stage ignition is noted even when the roll-up vortex is suppressed and a well-defined homogeneous core is retained within an RCM. Furthermore, the discrepancy is pressure dependent and decreases as compressed pressure is increased. Also, as ignition response becomes single-stage at higher compressed temperatures, discrepancy from the zero-dimensional simulations reduces. Despite of some quantitative discrepancy, the zero-dimensional modeling approach is deemed satisfactory from the viewpoint of the ignition delay simulation. (author)

  11. What Would a Graph Look Like in this Layout? A Machine Learning Approach to Large Graph Visualization.

    Science.gov (United States)

    Kwon, Oh-Hyun; Crnovrsanin, Tarik; Ma, Kwan-Liu

    2018-01-01

    Using different methods for laying out a graph can lead to very different visual appearances, with which the viewer perceives different information. Selecting a "good" layout method is thus important for visualizing a graph. The selection can be highly subjective and dependent on the given task. A common approach to selecting a good layout is to use aesthetic criteria and visual inspection. However, fully calculating various layouts and their associated aesthetic metrics is computationally expensive. In this paper, we present a machine learning approach to large graph visualization based on computing the topological similarity of graphs using graph kernels. For a given graph, our approach can show what the graph would look like in different layouts and estimate their corresponding aesthetic metrics. An important contribution of our work is the development of a new framework to design graph kernels. Our experimental study shows that our estimation calculation is considerably faster than computing the actual layouts and their aesthetic metrics. Also, our graph kernels outperform the state-of-the-art ones in both time and accuracy. In addition, we conducted a user study to demonstrate that the topological similarity computed with our graph kernel matches perceptual similarity assessed by human users.

  12. Multi-phase classification by a least-squares support vector machine approach in tomography images of geological samples

    Science.gov (United States)

    Khan, Faisal; Enzmann, Frieder; Kersten, Michael

    2016-03-01

    Image processing of X-ray-computed polychromatic cone-beam micro-tomography (μXCT) data of geological samples mainly involves artefact reduction and phase segmentation. For the former, the main beam-hardening (BH) artefact is removed by applying a best-fit quadratic surface algorithm to a given image data set (reconstructed slice), which minimizes the BH offsets of the attenuation data points from that surface. A Matlab code for this approach is provided in the Appendix. The final BH-corrected image is extracted from the residual data or from the difference between the surface elevation values and the original grey-scale values. For the segmentation, we propose a novel least-squares support vector machine (LS-SVM, an algorithm for pixel-based multi-phase classification) approach. A receiver operating characteristic (ROC) analysis was performed on BH-corrected and uncorrected samples to show that BH correction is in fact an important prerequisite for accurate multi-phase classification. The combination of the two approaches was thus used to classify successfully three different more or less complex multi-phase rock core samples.

  13. A UNIFIED APPROACH FOR DETECTION AND PREVENTION OF DDOS ATTACKS USING ENHANCED SUPPORT VECTOR MACHINES AND FILTERING MECHANISMS

    Directory of Open Access Journals (Sweden)

    T. Subbulakshmi

    2014-10-01

    Full Text Available Distributed Denial of Service (DDoS attacks were considered to be a tremendous threat to the current information security infrastructure. During DDoS attack, multiple malicious hosts that are recruited by the attackers launch a coordinated attack against one host or a network victim, which cause denial of service to legitimate users. The existing techniques suffer from more number of false alarms and more human intervention for attack detection. The objective of this paper is to monitor the network online which automatically initiates detection mechanism if there is any suspicious activity and also defense the hosts from being arrived at the network. Both spoofed and non spoofed IP’s are detected in this approach. Non spoofed IP’s are detected using Enhanced Support Vector Machines (ESVM and spoofed IP’s are detected using Hop Count Filtering (HCF mechanism. The detected IP’s are maintained separately to initiate the defense process. The attack strength is calculated using Lanchester Law which initiates the defense mechanism. Based on the calculated attack strength any of the defense schemes such as Rate based limiting or History based IP filtering is automatically initiated to drop the packets from the suspected IP. The integrated online monitoring approach for detection and defense of DDoS attacks is deployed in an experimental testbed. The online approach is found to be obvious in the field of integrated DDoS detection and defense.

  14. Classification of breast cancer patients using somatic mutation profiles and machine learning approaches.

    Science.gov (United States)

    Vural, Suleyman; Wang, Xiaosheng; Guda, Chittibabu

    2016-08-26

    The high degree of heterogeneity observed in breast cancers makes it very difficult to classify the cancer patients into distinct clinical subgroups and consequently limits the ability to devise effective therapeutic strategies. Several classification strategies based on ER/PR/HER2 expression or the expression profiles of a panel of genes have helped, but such methods often produce misleading results due to their dynamic nature. In contrast, somatic DNA mutations are relatively stable and lead to initiation and progression of many sporadic cancers. Hence in this study, we explore the use of gene mutation profiles to classify, characterize and predict the subgroups of breast cancers. We analyzed the whole exome sequencing data from 358 ethnically similar breast cancer patients in The Cancer Genome Atlas (TCGA) project. Somatic and non-synonymous single nucleotide variants identified from each patient were assigned a quantitative score (C-score) that represents the extent of negative impact on the gene function. Using these scores with non-negative matrix factorization method, we clustered the patients into three subgroups. By comparing the clinical stage of patients, we identified an early-stage-enriched and a late-stage-enriched subgroup. Comparison of the mutation scores of early and late-stage-enriched subgroups identified 358 genes that carry significantly higher mutations rates in the late stage subgroup. Functional characterization of these genes revealed important functional gene families that carry a heavy mutational load in the late state rich subgroup of patients. Finally, using the identified subgroups, we also developed a supervised classification model to predict the stage of the patients. This study demonstrates that gene mutation profiles can be effectively used with unsupervised machine-learning methods to identify clinically distinguishable breast cancer subgroups. The classification model developed in this method could provide a reasonable

  15. A machine learning approach for efficient uncertainty quantification using multiscale methods

    Science.gov (United States)

    Chan, Shing; Elsheikh, Ahmed H.

    2018-02-01

    Several multiscale methods account for sub-grid scale features using coarse scale basis functions. For example, in the Multiscale Finite Volume method the coarse scale basis functions are obtained by solving a set of local problems over dual-grid cells. We introduce a data-driven approach for the estimation of these coarse scale basis functions. Specifically, we employ a neural network predictor fitted using a set of solution samples from which it learns to generate subsequent basis functions at a lower computational cost than solving the local problems. The computational advantage of this approach is realized for uncertainty quantification tasks where a large number of realizations has to be evaluated. We attribute the ability to learn these basis functions to the modularity of the local problems and the redundancy of the permeability patches between samples. The proposed method is evaluated on elliptic problems yielding very promising results.

  16. A systems approach for data compression and latency reduction in cortically controlled brain machine interfaces.

    Science.gov (United States)

    Oweiss, Karim G

    2006-07-01

    This paper suggests a new approach for data compression during extracutaneous transmission of neural signals recorded by high-density microelectrode array in the cortex. The approach is based on exploiting the temporal and spatial characteristics of the neural recordings in order to strip the redundancy and infer the useful information early in the data stream. The proposed signal processing algorithms augment current filtering and amplification capability and may be a viable replacement to on chip spike detection and sorting currently employed to remedy the bandwidth limitations. Temporal processing is devised by exploiting the sparseness capabilities of the discrete wavelet transform, while spatial processing exploits the reduction in the number of physical channels through quasi-periodic eigendecomposition of the data covariance matrix. Our results demonstrate that substantial improvements are obtained in terms of lower transmission bandwidth, reduced latency and optimized processor utilization. We also demonstrate the improvements qualitatively in terms of superior denoising capabilities and higher fidelity of the obtained signals.

  17. A New Approach in Downscaling Microwave Soil Moisture Product using Machine Learning

    Science.gov (United States)

    Abbaszadeh, Peyman; Yan, Hongxiang; Moradkhani, Hamid

    2016-04-01

    Understating the soil moisture pattern has significant impact on flood modeling, drought monitoring, and irrigation management. Although satellite retrievals can provide an unprecedented spatial and temporal resolution of soil moisture at a global-scale, their soil moisture products (with a spatial resolution of 25-50 km) are inadequate for regional study, where a resolution of 1-10 km is needed. In this study, a downscaling approach using Genetic Programming (GP), a specialized version of Genetic Algorithm (GA), is proposed to improve the spatial resolution of satellite soil moisture products. The GP approach was applied over a test watershed in United States using the coarse resolution satellite data (25 km) from Advanced Microwave Scanning Radiometer - EOS (AMSR-E) soil moisture products, the fine resolution data (1 km) from Moderate Resolution Imaging Spectroradiometer (MODIS) vegetation index, and ground based data including land surface temperature, vegetation and other potential physical variables. The results indicated the great potential of this approach to derive the fine resolution soil moisture information applicable for data assimilation and other regional studies.

  18. A Machine Learning Approach to Estimate Riverbank Geotechnical Parameters from Sediment Particle Size Data

    Science.gov (United States)

    Iwashita, Fabio; Brooks, Andrew; Spencer, John; Borombovits, Daniel; Curwen, Graeme; Olley, Jon

    2015-04-01

    Assessing bank stability using geotechnical models traditionally involves the laborious collection of data on the bank and floodplain stratigraphy, as well as in-situ geotechnical data for each sedimentary unit within a river bank. The application of geotechnical bank stability models are limited to those sites where extensive field data has been collected, where their ability to provide predictions of bank erosion at the reach scale are limited without a very extensive and expensive field data collection program. Some challenges in the construction and application of riverbank erosion and hydraulic numerical models are their one-dimensionality, steady-state requirements, lack of calibration data, and nonuniqueness. Also, numerical models commonly can be too rigid with respect to detecting unexpected features like the onset of trends, non-linear relations, or patterns restricted to sub-samples of a data set. These shortcomings create the need for an alternate modelling approach capable of using available data. The application of the Self-Organizing Maps (SOM) approach is well-suited to the analysis of noisy, sparse, nonlinear, multidimensional, and scale-dependent data. It is a type of unsupervised artificial neural network with hybrid competitive-cooperative learning. In this work we present a method that uses a database of geotechnical data collected at over 100 sites throughout Queensland State, Australia, to develop a modelling approach that enables geotechnical parameters (soil effective cohesion, friction angle, soil erodibility and critical stress) to be derived from sediment particle size data (PSD). The model framework and predicted values were evaluated using two methods, splitting the dataset into training and validation set, and through a Bootstrap approach. The basis of Bootstrap cross-validation is a leave-one-out strategy. This requires leaving one data value out of the training set while creating a new SOM to estimate that missing value based on the

  19. ClusterTAD: an unsupervised machine learning approach to detecting topologically associated domains of chromosomes from Hi-C data.

    Science.gov (United States)

    Oluwadare, Oluwatosin; Cheng, Jianlin

    2017-11-14

    With the development of chromosomal conformation capturing techniques, particularly, the Hi-C technique, the study of the spatial conformation of a genome is becoming an important topic in bioinformatics and computational biology. The Hi-C technique can generate genome-wide chromosomal interaction (contact) data, which can be used to investigate the higher-level organization of chromosomes, such as Topologically Associated Domains (TAD), i.e., locally packed chromosome regions bounded together by intra chromosomal contacts. The identification of the TADs for a genome is useful for studying gene regulation, genomic interaction, and genome function. Here, we formulate the TAD identification problem as an unsupervised machine learning (clustering) problem, and develop a new TAD identification method called ClusterTAD. We introduce a novel method to represent chromosomal contacts as features to be used by the clustering algorithm. Our results show that ClusterTAD can accurately predict the TADs on a simulated Hi-C data. Our method is also largely complementary and consistent with existing methods on the real Hi-C datasets of two mouse cells. The validation with the chromatin immunoprecipitation (ChIP) sequencing (ChIP-Seq) data shows that the domain boundaries identified by ClusterTAD have a high enrichment of CTCF binding sites, promoter-related marks, and enhancer-related histone modifications. As ClusterTAD is based on a proven clustering approach, it opens a new avenue to apply a large array of clustering methods developed in the machine learning field to the TAD identification problem. The source code, the results, and the TADs generated for the simulated and real Hi-C datasets are available here: https://github.com/BDM-Lab/ClusterTAD .

  20. Detection of Convective Initiation Using Meteorological Imager Onboard Communication, Ocean, and Meteorological Satellite Based on Machine Learning Approaches

    Directory of Open Access Journals (Sweden)

    Hyangsun Han

    2015-07-01

    Full Text Available As convective clouds in Northeast Asia are accompanied by various hazards related with heavy rainfall and thunderstorms, it is very important to detect convective initiation (CI in the region in order to mitigate damage by such hazards. In this study, a novel approach for CI detection using images from Meteorological Imager (MI, a payload of the Communication, Ocean, and Meteorological Satellite (COMS, was developed by improving the criteria of the interest fields of Rapidly Developing Cumulus Areas (RDCA derivation algorithm, an official CI detection algorithm for Multi-functional Transport SATellite-2 (MTSAT-2, based on three machine learning approaches—decision trees (DT, random forest (RF, and support vector machines (SVM. CI was defined as clouds within a 16 × 16 km window with the first detection of lightning occurrence at the center. A total of nine interest fields derived from visible, water vapor, and two thermal infrared images of MI obtained 15–75 min before the lightning occurrence were used as input variables for CI detection. RF produced slightly higher performance (probability of detection (POD of 75.5% and false alarm rate (FAR of 46.2% than DT (POD of 70.7% and FAR of 46.6% for detection of CI caused by migrating frontal cyclones and unstable atmosphere. SVM resulted in relatively poor performance with very high FAR ~83.3%. The averaged lead times of CI detection based on the DT and RF models were 36.8 and 37.7 min, respectively. This implies that CI over Northeast Asia can be forecasted ~30–45 min in advance using COMS MI data.

  1. A machine learning approach using EEG data to predict response to SSRI treatment for major depressive disorder.

    Science.gov (United States)

    Khodayari-Rostamabad, Ahmad; Reilly, James P; Hasey, Gary M; de Bruin, Hubert; Maccrimmon, Duncan J

    2013-10-01

    The problem of identifying, in advance, the most effective treatment agent for various psychiatric conditions remains an elusive goal. To address this challenge, we investigate the performance of the proposed machine learning (ML) methodology (based on the pre-treatment electroencephalogram (EEG)) for prediction of response to treatment with a selective serotonin reuptake inhibitor (SSRI) medication in subjects suffering from major depressive disorder (MDD). A relatively small number of most discriminating features are selected from a large group of candidate features extracted from the subject's pre-treatment EEG, using a machine learning procedure for feature selection. The selected features are fed into a classifier, which was realized as a mixture of factor analysis (MFA) model, whose output is the predicted response in the form of a likelihood value. This likelihood indicates the extent to which the subject belongs to the responder vs. non-responder classes. The overall method was evaluated using a "leave-n-out" randomized permutation cross-validation procedure. A list of discriminating EEG biomarkers (features) was found. The specificity of the proposed method is 80.9% while sensitivity is 94.9%, for an overall prediction accuracy of 87.9%. There is a 98.76% confidence that the estimated prediction rate is within the interval [75%, 100%]. These results indicate that the proposed ML method holds considerable promise in predicting the efficacy of SSRI antidepressant therapy for MDD, based on a simple and cost-effective pre-treatment EEG. The proposed approach offers the potential to improve the treatment of major depression and to reduce health care costs. Copyright © 2013 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.

  2. A novel machine learning approach for estimation of electricity demand: An empirical evidence from Thailand

    International Nuclear Information System (INIS)

    Mostafavi, Elham Sadat; Mostafavi, Seyyed Iman; Jaafari, Arefeh; Hosseinpour, Fariba

    2013-01-01

    Highlights: • A hybrid approach is presented for the estimation of the electricity demand. • The proposed method integrates the capabilities of GP and SA. • The GSA model makes accurate predictions of the electricity demand. - Abstract: This study proposes an innovative hybrid approach for the estimation of the long-term electricity demand. A new prediction equation was developed for the electricity demand using an integrated search method of genetic programming and simulated annealing, called GSA. The annual electricity demand was formulated in terms of population, gross domestic product (GDP), stock index, and total revenue from exporting industrial products of the same year. A comprehensive database containing total electricity demand in Thailand from 1986 to 2009 was used to develop the model. The generalization of the model was verified using a separate testing data. A sensitivity analysis was conducted to investigate the contribution of the parameters affecting the electricity demand. The GSA model provides accurate predictions of the electricity demand. Furthermore, the proposed model outperforms a regression and artificial neural network-based models

  3. Clustering of the Self-Organizing Map based Approach in Induction Machine Rotor Faults Diagnostics

    Directory of Open Access Journals (Sweden)

    Ahmed TOUMI

    2009-12-01

    Full Text Available Self-Organizing Maps (SOM is an excellent method of analyzingmultidimensional data. The SOM based classification is attractive, due to itsunsupervised learning and topology preserving properties. In this paper, theperformance of the self-organizing methods is investigated in induction motorrotor fault detection and severity evaluation. The SOM is based on motor currentsignature analysis (MCSA. The agglomerative hierarchical algorithms using theWard’s method is applied to automatically dividing the map into interestinginterpretable groups of map units that correspond to clusters in the input data. Theresults obtained with this approach make it possible to detect a rotor bar fault justdirectly from the visualization results. The system is also able to estimate theextent of rotor faults.

  4. Detection of small surface defects using DCT based enhancement approach in machine vision systems

    Science.gov (United States)

    He, Fuqiang; Wang, Wen; Chen, Zichen

    2005-12-01

    Utilizing DCT based enhancement approach, an improved small defect detection algorithm for real-time leather surface inspection was developed. A two-stage decomposition procedure was proposed to extract an odd-odd frequency matrix after a digital image has been transformed to DCT domain. Then, the reverse cumulative sum algorithm was proposed to detect the transition points of the gentle curves plotted from the odd-odd frequency matrix. The best radius of the cutting sector was computed in terms of the transition points and the high-pass filtering operation was implemented. The filtered image was then inversed and transformed back to the spatial domain. Finally, the restored image was segmented by an entropy method and some defect features are calculated. Experimental results show the proposed small defect detection method can reach the small defect detection rate by 94%.

  5. Machine learning approach to detect intruders in database based on hexplet data structure

    Directory of Open Access Journals (Sweden)

    Saad M. Darwish

    2016-09-01

    Full Text Available Most of valuable information resources for any organization are stored in the database; it is a serious subject to protect this information against intruders. However, conventional security mechanisms are not designed to detect anomalous actions of database users. An intrusion detection system (IDS, delivers an extra layer of security that cannot be guaranteed by built-in security tools, is the ideal solution to defend databases from intruders. This paper suggests an anomaly detection approach that summarizes the raw transactional SQL queries into a compact data structure called hexplet, which can model normal database access behavior (abstract the user's profile and recognize impostors specifically tailored for role-based access control (RBAC database system. This hexplet lets us to preserve the correlation among SQL statements in the same transaction by exploiting the information in the transaction-log entry with the aim to improve detection accuracy specially those inside the organization and behave strange behavior. The model utilizes naive Bayes classifier (NBC as the simplest supervised learning technique for creating profiles and evaluating the legitimacy of a transaction. Experimental results show the performance of the proposed model in the term of detection rate.

  6. Towards Designing Graceful Degradation into Trajectory Based Operations: A Human-Machine System Integration Approach

    Science.gov (United States)

    Edwards, Tamsyn; Lee, Paul

    2017-01-01

    One of the most fundamental changes to the air traffic management system in NextGen is the concept of trajectory based operations (TBO). With the introduction of such change, system safety and resilience is a critical concern, in particular, the ability of systems to gracefully degrade. In order to design graceful degradation into a TBO envrionment, knowledge of the potential causes of degradation, and appropriate solutions, is required. In addition, previous research has predominantly explored the technological contribution to graceful degradation, frequently neglecting to consider the role of the human operator, specifically, air traffic controllers (ATCOs). This is out of step with real-world operations, and potentially limits an ecologically valid understanding of achieving graceful degradation in an air traffic control (ATC) environment. The following literature review aims to identify and summarize the literature to date on the potential causes of degradation in ATC and the solutions that may be applied within a TBO context, with a specific focus on the contribution of the air traffic controller. A framework of graceful degradation, developed from the literature, is presented. It is argued that in order to achieve graceful degradation within TBO, a human-system integration approach must be applied.

  7. Towards a Virtual Machine Approach to Resilient and Safe Mobile Robots

    DEFF Research Database (Denmark)

    Adam, Marian Sorin; Kuhrmann, Marco; Schultz, Ulrik Pagh

    2016-01-01

    Mobile robots are advanced systems that often need to operate in unstructured environments, which raises the complexity of the software. Many components are important for the overall reliability and safety of the robot, but reducing the risk of errors by making the software resilient is both comp...... switching between different implementations of the non-critical parts of the system. We report on the overall architecture and implementation, and demonstrate our approach using software for a commercial mobile robot.......Mobile robots are advanced systems that often need to operate in unstructured environments, which raises the complexity of the software. Many components are important for the overall reliability and safety of the robot, but reducing the risk of errors by making the software resilient is both...... complicated and expensive. Nevertheless, a commercially successful robot often has to remain safe while maintaining as much as possible from the required functionality, even in the presence of partial failures. In this paper, we propose a flexible way to improve the reliability of existing robot software...

  8. Multi-parameter machine learning approach to the neuroanatomical basis of developmental dyslexia.

    Science.gov (United States)

    Płoński, Piotr; Gradkowski, Wojciech; Altarelli, Irene; Monzalvo, Karla; van Ermingen-Marbach, Muna; Grande, Marion; Heim, Stefan; Marchewka, Artur; Bogorodzki, Piotr; Ramus, Franck; Jednoróg, Katarzyna

    2017-02-01

    Despite decades of research, the anatomical abnormalities associated with developmental dyslexia are still not fully described. Studies have focused on between-group comparisons in which different neuroanatomical measures were generally explored in isolation, disregarding potential interactions between regions and measures. Here, for the first time a multivariate classification approach was used to investigate grey matter disruptions in children with dyslexia in a large (N = 236) multisite sample. A variety of cortical morphological features, including volumetric (volume, thickness and area) and geometric (folding index and mean curvature) measures were taken into account and generalizability of classification was assessed with both 10-fold and leave-one-out cross validation (LOOCV) techniques. Classification into control vs. dyslexic subjects achieved above chance accuracy (AUC = 0.66 and ACC = 0.65 in the case of 10-fold CV, and AUC = 0.65 and ACC = 0.64 using LOOCV) after principled feature selection. Features that discriminated between dyslexic and control children were exclusively situated in the left hemisphere including superior and middle temporal gyri, subparietal sulcus and prefrontal areas. They were related to geometric properties of the cortex, with generally higher mean curvature and a greater folding index characterizing the dyslexic group. Our results support the hypothesis that an atypical curvature pattern with extra folds in left hemispheric perisylvian regions characterizes dyslexia. Hum Brain Mapp 38:900-908, 2017. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.

  9. Machine learning approaches to evaluate correlation patterns in allosteric signaling: A case study of the PDZ2 domain

    Science.gov (United States)

    Botlani, Mohsen; Siddiqui, Ahnaf; Varma, Sameer

    2018-06-01

    Many proteins are regulated by dynamic allostery wherein regulator-induced changes in structure are comparable with thermal fluctuations. Consequently, understanding their mechanisms requires assessment of relationships between and within conformational ensembles of different states. Here we show how machine learning based approaches can be used to simplify this high-dimensional data mining task and also obtain mechanistic insight. In particular, we use these approaches to investigate two fundamental questions in dynamic allostery. First, how do regulators modify inter-site correlations in conformational fluctuations (Cij)? Second, how are regulator-induced shifts in conformational ensembles at two different sites in a protein related to each other? We address these questions in the context of the human protein tyrosine phosphatase 1E's PDZ2 domain, which is a model protein for studying dynamic allostery. We use molecular dynamics to generate conformational ensembles of the PDZ2 domain in both the regulator-bound and regulator-free states. The employed protocol reproduces methyl deuterium order parameters from NMR. Results from unsupervised clustering of Cij combined with flow analyses of weighted graphs of Cij show that regulator binding significantly alters the global signaling network in the protein; however, not by altering the spatial arrangement of strongly interacting amino acid clusters but by modifying the connectivity between clusters. Additionally, we find that regulator-induced shifts in conformational ensembles, which we evaluate by repartitioning ensembles using supervised learning, are, in fact, correlated. This correlation Δij is less extensive compared to Cij, but in contrast to Cij, Δij depends inversely on the distance from the regulator binding site. Assuming that Δij is an indicator of the transduction of the regulatory signal leads to the conclusion that the regulatory signal weakens with distance from the regulatory site. Overall, this

  10. Machine learning-based Landsat-MODIS data fusion approach for 8-day 30m evapotranspiration monitoring

    Science.gov (United States)

    Im, J.; Ke, Y.; Park, S.

    2016-12-01

    Continuous monitoring of evapotranspiration (ET) is important for understanding of hydrological cycles and energy flux dynamics. At regional and local scales, routine ET estimation is a critical for efficient water management, drought impact assessment and ecosystem health monitoring, etc. Remote sensing has long been recognized to be able to provide ET monitoring over large areas. However, no single satellite could provide temporally continuous ET at relatively high spatial resolution due to the trade-off between the spatial and temporal resolution of current satellite sensors. Landsat-series satellites provide optical and thermal imagery at 30-100m resolution, whereas the 16-day revisit cycle hinders the observation of ET dynamics; MODIS provides sources of ET estimation at daily basis, but the 500-1000m ground sampling distance is too coarse for field level applications. In this study, we present a machine learning and STARFM based method for Landsat/MODIS ET fusion. The approach first downscales MODIS 8-day 1km ET (MOD16A2) to 30m based on eleven Landsat-derived indicators such as NDVI, EVI, NDWI etc on the cloud-free Landsat-available days using Random Forest approach. For the days when Landsat data are not available, downscaled ET is synthesized by MODIS and Landsat data fusion with STARFM and STI-FM approaches. The models are evaluated using in situ flux tower measurements at US-ARM and US-Twt AmeriFlux sites the United States. Results show that the downscaled 30m ET have good agreement with MODIS ET (RMSE=0.42-3.4mm/8days, rRMSE=3.2%-26%) and the downscaled ET have higher accuracy than MODIS ET when compared to in-situ measurements.

  11. Predicting Hepatotoxicity of Drug Metabolites Via an Ensemble Approach Based on Support Vector Machine

    Science.gov (United States)

    Lu, Yin; Liu, Lili; Lu, Dong; Cai, Yudong; Zheng, Mingyue; Luo, Xiaomin; Jiang, Hualiang; Chen, Kaixian

    2017-11-20

    Drug-induced liver injury (DILI) is a major cause of drug withdrawal. The chemical properties of the drug, especially drug metabolites, play key roles in DILI. Our goal is to construct a QSAR model to predict drug hepatotoxicity based on drug metabolites. 64 hepatotoxic drug metabolites and 3,339 non-hepatotoxic drug metabolites were gathered from MDL Metabolite Database. Considering the imbalance of the dataset, we randomly split the negative samples and combined each portion with all the positive samples to construct individually balanced datasets for constructing independent classifiers. Then, we adopted an ensemble approach to make prediction based on the results of all individual classifiers and applied the minimum Redundancy Maximum Relevance (mRMR) feature selection method to select the molecular descriptors. Eventually, for the drugs in the external test set, a Bayesian inference method was used to predict the hepatotoxicity of a drug based on its metabolites. The model showed the average balanced accuracy=78.47%, sensitivity =74.17%, and specificity=82.77%. Five molecular descriptors characterizing molecular polarity, intramolecular bonding strength, and molecular frontier orbital energy were obtained. When predicting the hepatotoxicity of a drug based on all its metabolites, the sensitivity, specificity and balanced accuracy were 60.38%, 70.00%, and 65.19%, respectively, indicating that this method is useful for identifying the hepatotoxicity of drugs. We developed an in silico model to predict hepatotoxicity of drug metabolites. Moreover, Bayesian inference was applied to predict the hepatotoxicity of a drug based on its metabolites which brought out valuable high sensitivity and specificity. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  12. Models of evaluating efficiency and risks on integration of cloud-base IT-services of the machine-building enterprise: a system approach

    Science.gov (United States)

    Razumnikov, S.; Kurmanbay, A.

    2016-04-01

    The present paper suggests a system approach to evaluation of the effectiveness and risks resulted from the integration of cloud-based services in a machine-building enterprise. This approach makes it possible to estimate a set of enterprise IT applications and choose the applications to be migrated to the cloud with regard to specific business requirements, a technological strategy and willingness to risk.

  13. Machine Shop Grinding Machines.

    Science.gov (United States)

    Dunn, James

    This curriculum manual is one in a series of machine shop curriculum manuals intended for use in full-time secondary and postsecondary classes, as well as part-time adult classes. The curriculum can also be adapted to open-entry, open-exit programs. Its purpose is to equip students with basic knowledge and skills that will enable them to enter the…

  14. A Novel Approach for Multi Class Fault Diagnosis in Induction Machine Based on Statistical Time Features and Random Forest Classifier

    Science.gov (United States)

    Sonje, M. Deepak; Kundu, P.; Chowdhury, A.

    2017-08-01

    Fault diagnosis and detection is the important area in health monitoring of electrical machines. This paper proposes the recently developed machine learning classifier for multi class fault diagnosis in induction machine. The classification is based on random forest (RF) algorithm. Initially, stator currents are acquired from the induction machine under various conditions. After preprocessing the currents, fourteen statistical time features are estimated for each phase of the current. These parameters are considered as inputs to the classifier. The main scope of the paper is to evaluate effectiveness of RF classifier for individual and mixed fault diagnosis in induction machine. The stator, rotor and mixed faults (stator and rotor faults) are classified using the proposed classifier. The obtained performance measures are compared with the multilayer perceptron neural network (MLPNN) classifier. The results show the much better performance measures and more accurate than MLPNN classifier. For demonstration of planned fault diagnosis algorithm, experimentally obtained results are considered to build the classifier more practical.

  15. Identifying ecological "sweet spots" underlying cyanobacteria functional group dynamics from long-term observations using a statistical machine learning approach

    Science.gov (United States)

    Nelson, N.; Munoz-Carpena, R.; Phlips, E. J.

    2017-12-01

    Diversity in the eco-physiological adaptations of cyanobacteria genera creates challenges for water managers who are tasked with developing appropriate actions for controlling not only the intensity and frequency of cyanobacteria blooms, but also reducing the potential for blooms of harmful taxa (e.g., toxin producers, N2 fixers). Compounding these challenges, the efficacy of nutrient management strategies (phosphorus-only versus nitrogen-and-phosphorus) for cyanobacteria bloom abatement is the subject of an ongoing debate, which increases uncertainty associated with bloom mitigation decision-making. In this work, we analyze a unique long-term (17-year) dataset composed of monthly observations of cyanobacteria genera abundances, zooplankton abundances, water quality, and flow from Lake George, a bloom-impacted flow-through lake of the St. Johns River (FL, USA). Using the Random Forests machine learning algorithm, an assumption-free ensemble modeling approach, the dataset was evaluated to quantify and characterize relationships between environmental conditions and seven cyanobacteria groupings: five genera (Anabaena, Cylindrospermopsis, Lyngbya, Microcystis, and Oscillatoria) and two functional groups (N2 fixers and non-fixers). Results highlight the selectivity of nitrogen in describing genera and functional group dynamics, and potential for physical effects to limit the efficacy of nutrient management as a mechanism for cyanobacteria bloom mitigation.

  16. Hybrid three-dimensional and support vector machine approach for automatic vehicle tracking and classification using a single camera

    Science.gov (United States)

    Kachach, Redouane; Cañas, José María

    2016-05-01

    Using video in traffic monitoring is one of the most active research domains in the computer vision community. TrafficMonitor, a system that employs a hybrid approach for automatic vehicle tracking and classification on highways using a simple stationary calibrated camera, is presented. The proposed system consists of three modules: vehicle detection, vehicle tracking, and vehicle classification. Moving vehicles are detected by an enhanced Gaussian mixture model background estimation algorithm. The design includes a technique to resolve the occlusion problem by using a combination of two-dimensional proximity tracking algorithm and the Kanade-Lucas-Tomasi feature tracking algorithm. The last module classifies the shapes identified into five vehicle categories: motorcycle, car, van, bus, and truck by using three-dimensional templates and an algorithm based on histogram of oriented gradients and the support vector machine classifier. Several experiments have been performed using both real and simulated traffic in order to validate the system. The experiments were conducted on GRAM-RTM dataset and a proper real video dataset which is made publicly available as part of this work.

  17. Assessing the Influence of Spatio-Temporal Context for Next Place Prediction using Different Machine Learning Approaches

    Directory of Open Access Journals (Sweden)

    Jorim Urner

    2018-04-01

    Full Text Available For next place prediction, machine learning methods which incorporate contextual data are frequently used. However, previous studies often do not allow deriving generalizable methodological recommendations, since they use different datasets, methods for discretizing space, scales of prediction, prediction algorithms, and context data, and therefore lack comparability. Additionally, the cold start problem for new users is an issue. In this study, we predict next places based on one trajectory dataset but with systematically varying prediction algorithms, methods for space discretization, scales of prediction (based on a novel hierarchical approach, and incorporated context data. This allows to evaluate the relative influence of these factors on the overall prediction accuracy. Moreover, in order to tackle the cold start problem prevalent in recommender and prediction systems, we test the effect of training the predictor on all users instead of each individual one. We find that the prediction accuracy shows a varying dependency on the method of space discretization and the incorporated contextual factors at different spatial scales. Moreover, our user-independent approach reaches a prediction accuracy of around 75%, and is therefore an alternative to existing user-specific models. This research provides valuable insights into the individual and combinatory effects of model parameters and algorithms on the next place prediction accuracy. The results presented in this paper can be used to determine the influence of various contextual factors and to help researchers building more accurate prediction models. It is also a starting point for future work creating a comprehensive framework to guide the building of prediction models.

  18. Classification of Suncus murinus species complex (Soricidae: Crocidurinae) in Peninsular Malaysia using image analysis and machine learning approaches.

    Science.gov (United States)

    Abu, Arpah; Leow, Lee Kien; Ramli, Rosli; Omar, Hasmahzaiti

    2016-12-22

    Taxonomists frequently identify specimen from various populations based on the morphological characteristics and molecular data. This study looks into another invasive process in identification of house shrew (Suncus murinus) using image analysis and machine learning approaches. Thus, an automated identification system is developed to assist and simplify this task. In this study, seven descriptors namely area, convex area, major axis length, minor axis length, perimeter, equivalent diameter and extent which are based on the shape are used as features to represent digital image of skull that consists of dorsal, lateral and jaw views for each specimen. An Artificial Neural Network (ANN) is used as classifier to classify the skulls of S. murinus based on region (northern and southern populations of Peninsular Malaysia) and sex (adult male and female). Thus, specimen classification using Training data set and identification using Testing data set were performed through two stages of ANNs. At present, the classifier used has achieved an accuracy of 100% based on skulls' views. Classification and identification to regions and sexes have also attained 72.5%, 87.5% and 80.0% of accuracy for dorsal, lateral, and jaw views, respectively. This results show that the shape characteristic features used are substantial because they can differentiate the specimens based on regions and sexes up to the accuracy of 80% and above. Finally, an application was developed and can be used for the scientific community. This automated system demonstrates the practicability of using computer-assisted systems in providing interesting alternative approach for quick and easy identification of unknown species.

  19. Dynamism in a Semiconductor Industrial Machine Allocation Problem using a Hybrid of the Bio-inspired and Musical-Harmony Approach

    Science.gov (United States)

    Kalsom Yusof, Umi; Nor Akmal Khalid, Mohd

    2015-05-01

    Semiconductor industries need to constantly adjust to the rapid pace of change in the market. Most manufactured products usually have a very short life cycle. These scenarios imply the need to improve the efficiency of capacity planning, an important aspect of the machine allocation plan known for its complexity. Various studies have been performed to balance productivity and flexibility in the flexible manufacturing system (FMS). Many approaches have been developed by the researchers to determine the suitable balance between exploration (global improvement) and exploitation (local improvement). However, not much work has been focused on the domain of machine allocation problem that considers the effects of machine breakdowns. This paper develops a model to minimize the effect of machine breakdowns, thus increasing the productivity. The objectives are to minimize system unbalance and makespan as well as increase throughput while satisfying the technological constraints such as machine time availability. To examine the effectiveness of the proposed model, results for throughput, system unbalance and makespan on real industrial datasets were performed with applications of intelligence techniques, that is, a hybrid of genetic algorithm and harmony search. The result aims to obtain a feasible solution to the domain problem.

  20. Support vector machine based estimation of remaining useful life: current research status and future trends

    International Nuclear Information System (INIS)

    Huang, Hong Zhong; Wang, Hai Kun; Li, Yan Feng; Zhang, Longlong; Liu, Zhiliang

    2015-01-01

    Estimation of remaining useful life (RUL) is helpful to manage life cycles of machines and to reduce maintenance cost. Support vector machine (SVM) is a promising algorithm for estimation of RUL because it can easily process small training sets and multi-dimensional data. Many SVM based methods have been proposed to predict RUL of some key components. We did a literature review related to SVM based RUL estimation within a decade. The references reviewed are classified into two categories: improved SVM algorithms and their applications to RUL estimation. The latter category can be further divided into two types: one, to predict the condition state in the future and then build a relationship between state and RUL; two, to establish a direct relationship between current state and RUL. However, SVM is seldom used to track the degradation process and build an accurate relationship between the current health condition state and RUL. Based on the above review and summary, this paper points out that the ability to continually improve SVM, and obtain a novel idea for RUL prediction using SVM will be future works.

  1. Machine learning with R

    CERN Document Server

    Lantz, Brett

    2013-01-01

    Written as a tutorial to explore and understand the power of R for machine learning. This practical guide that covers all of the need to know topics in a very systematic way. For each machine learning approach, each step in the process is detailed, from preparing the data for analysis to evaluating the results. These steps will build the knowledge you need to apply them to your own data science tasks.Intended for those who want to learn how to use R's machine learning capabilities and gain insight from your data. Perhaps you already know a bit about machine learning, but have never used R; or

  2. Rotating electrical machines

    CERN Document Server

    Le Doeuff, René

    2013-01-01

    In this book a general matrix-based approach to modeling electrical machines is promulgated. The model uses instantaneous quantities for key variables and enables the user to easily take into account associations between rotating machines and static converters (such as in variable speed drives).   General equations of electromechanical energy conversion are established early in the treatment of the topic and then applied to synchronous, induction and DC machines. The primary characteristics of these machines are established for steady state behavior as well as for variable speed scenarios. I

  3. Reducing Deadline Miss Rate for Grid Workloads running in Virtual Machines: a deadline-aware and adaptive approach

    CERN Document Server

    Khalid, Omer; Anthony, Richard; Petridis, Miltos

    2011-01-01

    This thesis explores three major areas of research; integration of virutalization into sci- entific grid infrastructures, evaluation of the virtualization overhead on HPC grid job’s performance, and optimization of job execution times to increase their throughput by reducing job deadline miss rate. Integration of the virtualization into the grid to deploy on-demand virtual machines for jobs in a way that is transparent to the end users and have minimum impact on the existing system poses a significant challenge. This involves the creation of virtual machines, decompression of the operating system image, adapting the virtual environ- ment to satisfy software requirements of the job, constant update of the job state once it’s running with out modifying batch system or existing grid middleware, and finally bringing the host machine back to a consistent state. To facilitate this research, an existing and in production pilot job framework has been modified to deploy virtual machines on demand on the grid using...

  4. A Novel Extreme Learning Machine Classification Model for e-Nose Application Based on the Multiple Kernel Approach.

    Science.gov (United States)

    Jian, Yulin; Huang, Daoyu; Yan, Jia; Lu, Kun; Huang, Ying; Wen, Tailai; Zeng, Tanyue; Zhong, Shijie; Xie, Qilong

    2017-06-19

    A novel classification model, named the quantum-behaved particle swarm optimization (QPSO)-based weighted multiple kernel extreme learning machine (QWMK-ELM), is proposed in this paper. Experimental validation is carried out with two different electronic nose (e-nose) datasets. Being different from the existing multiple kernel extreme learning machine (MK-ELM) algorithms, the combination coefficients of base kernels are regarded as external parameters of single-hidden layer feedforward neural networks (SLFNs). The combination coefficients of base kernels, the model parameters of each base kernel, and the regularization parameter are optimized by QPSO simultaneously before implementing the kernel extreme learning machine (KELM) with the composite kernel function. Four types of common single kernel functions (Gaussian kernel, polynomial kernel, sigmoid kernel, and wavelet kernel) are utilized to constitute different composite kernel functions. Moreover, the method is also compared with other existing classification methods: extreme learning machine (ELM), kernel extreme learning machine (KELM), k-nearest neighbors (KNN), support vector machine (SVM), multi-layer perceptron (MLP), radical basis function neural network (RBFNN), and probabilistic neural network (PNN). The results have demonstrated that the proposed QWMK-ELM outperforms the aforementioned methods, not only in precision, but also in efficiency for gas classification.

  5. A Hybrid Hierarchical Approach for Brain Tissue Segmentation by Combining Brain Atlas and Least Square Support Vector Machine

    Science.gov (United States)

    Kasiri, Keyvan; Kazemi, Kamran; Dehghani, Mohammad Javad; Helfroush, Mohammad Sadegh

    2013-01-01

    In this paper, we present a new semi-automatic brain tissue segmentation method based on a hybrid hierarchical approach that combines a brain atlas as a priori information and a least-square support vector machine (LS-SVM). The method consists of three steps. In the first two steps, the skull is removed and the cerebrospinal fluid (CSF) is extracted. These two steps are performed using the toolbox FMRIB's automated segmentation tool integrated in the FSL software (FSL-FAST) developed in Oxford Centre for functional MRI of the brain (FMRIB). Then, in the third step, the LS-SVM is used to segment grey matter (GM) and white matter (WM). The training samples for LS-SVM are selected from the registered brain atlas. The voxel intensities and spatial positions are selected as the two feature groups for training and test. SVM as a powerful discriminator is able to handle nonlinear classification problems; however, it cannot provide posterior probability. Thus, we use a sigmoid function to map the SVM output into probabilities. The proposed method is used to segment CSF, GM and WM from the simulated magnetic resonance imaging (MRI) using Brainweb MRI simulator and real data provided by Internet Brain Segmentation Repository. The semi-automatically segmented brain tissues were evaluated by comparing to the corresponding ground truth. The Dice and Jaccard similarity coefficients, sensitivity and specificity were calculated for the quantitative validation of the results. The quantitative results show that the proposed method segments brain tissues accurately with respect to corresponding ground truth. PMID:24696800

  6. Representational Learning for Fault Diagnosis of Wind Turbine Equipment: A Multi-Layered Extreme Learning Machines Approach

    Directory of Open Access Journals (Sweden)

    Zhi-Xin Yang

    2016-05-01

    Full Text Available Reliable and quick response fault diagnosis is crucial for the wind turbine generator system (WTGS to avoid unplanned interruption and to reduce the maintenance cost. However, the conditional data generated from WTGS operating in a tough environment is always dynamical and high-dimensional. To address these challenges, we propose a new fault diagnosis scheme which is composed of multiple extreme learning machines (ELM in a hierarchical structure, where a forwarding list of ELM layers is concatenated and each of them is processed independently for its corresponding role. The framework enables both representational feature learning and fault classification. The multi-layered ELM based representational learning covers functions including data preprocessing, feature extraction and dimension reduction. An ELM based autoencoder is trained to generate a hidden layer output weight matrix, which is then used to transform the input dataset into a new feature representation. Compared with the traditional feature extraction methods which may empirically wipe off some “insignificant’ feature information that in fact conveys certain undiscovered important knowledge, the introduced representational learning method could overcome the loss of information content. The computed output weight matrix projects the high dimensional input vector into a compressed and orthogonally weighted distribution. The last single layer of ELM is applied for fault classification. Unlike the greedy layer wise learning method adopted in back propagation based deep learning (DL, the proposed framework does not need iterative fine-tuning of parameters. To evaluate its experimental performance, comparison tests are carried out on a wind turbine generator simulator. The results show that the proposed diagnostic framework achieves the best performance among the compared approaches in terms of accuracy and efficiency in multiple faults detection of wind turbines.

  7. ToxiM: A Toxicity Prediction Tool for Small Molecules Developed Using Machine Learning and Chemoinformatics Approaches

    Directory of Open Access Journals (Sweden)

    Ashok K. Sharma

    2017-11-01

    Full Text Available The experimental methods for the prediction of molecular toxicity are tedious and time-consuming tasks. Thus, the computational approaches could be used to develop alternative methods for toxicity prediction. We have developed a tool for the prediction of molecular toxicity along with the aqueous solubility and permeability of any molecule/metabolite. Using a comprehensive and curated set of toxin molecules as a training set, the different chemical and structural based features such as descriptors and fingerprints were exploited for feature selection, optimization and development of machine learning based classification and regression models. The compositional differences in the distribution of atoms were apparent between toxins and non-toxins, and hence, the molecular features were used for the classification and regression. On 10-fold cross-validation, the descriptor-based, fingerprint-based and hybrid-based classification models showed similar accuracy (93% and Matthews's correlation coefficient (0.84. The performances of all the three models were comparable (Matthews's correlation coefficient = 0.84–0.87 on the blind dataset. In addition, the regression-based models using descriptors as input features were also compared and evaluated on the blind dataset. Random forest based regression model for the prediction of solubility performed better (R2 = 0.84 than the multi-linear regression (MLR and partial least square regression (PLSR models, whereas, the partial least squares based regression model for the prediction of permeability (caco-2 performed better (R2 = 0.68 in comparison to the random forest and MLR based regression models. The performance of final classification and regression models was evaluated using the two validation datasets including the known toxins and commonly used constituents of health products, which attests to its accuracy. The ToxiM web server would be a highly useful and reliable tool for the prediction of toxicity

  8. Disaggregation of remotely sensed soil moisture under all sky condition using machine learning approach in Northeast Asia

    Science.gov (United States)

    Kim, S.; Kim, H.; Choi, M.; Kim, K.

    2016-12-01

    Estimating spatiotemporal variation of soil moisture is crucial to hydrological applications such as flood, drought, and near real-time climate forecasting. Recent advances in space-based passive microwave measurements allow the frequent monitoring of the surface soil moisture at a global scale and downscaling approaches have been applied to improve the spatial resolution of passive microwave products available at local scale applications. However, most downscaling methods using optical and thermal dataset, are valid only in cloud-free conditions; thus renewed downscaling method under all sky condition is necessary for the establishment of spatiotemporal continuity of datasets at fine resolution. In present study Support Vector Machine (SVM) technique was utilized to downscale a satellite-based soil moisture retrievals. The 0.1 and 0.25-degree resolution of daily Land Parameter Retrieval Model (LPRM) L3 soil moisture datasets from Advanced Microwave Scanning Radiometer 2 (AMSR2) were disaggregated over Northeast Asia in 2015. Optically derived estimates of surface temperature (LST), normalized difference vegetation index (NDVI), and its cloud products were obtained from MODerate Resolution Imaging Spectroradiometer (MODIS) for the purpose of downscaling soil moisture in finer resolution under all sky condition. Furthermore, a comparison analysis between in situ and downscaled soil moisture products was also conducted for quantitatively assessing its accuracy. Results showed that downscaled soil moisture under all sky condition not only preserves the quality of AMSR2 LPRM soil moisture at 1km resolution, but also attains higher spatial data coverage. From this research we expect that time continuous monitoring of soil moisture at fine scale regardless of weather conditions would be available.

  9. A Collaborative Approach to Identifying Social Media Markers of Schizophrenia by Employing Machine Learning and Clinical Appraisals.

    Science.gov (United States)

    Birnbaum, Michael L; Ernala, Sindhu Kiranmai; Rizvi, Asra F; De Choudhury, Munmun; Kane, John M

    2017-08-14

    Linguistic analysis of publicly available Twitter feeds have achieved success in differentiating individuals who self-disclose online as having schizophrenia from healthy controls. To date, limited efforts have included expert input to evaluate the authenticity of diagnostic self-disclosures. This study aims to move from noisy self-reports of schizophrenia on social media to more accurate identification of diagnoses by exploring a human-machine partnered approach, wherein computational linguistic analysis of shared content is combined with clinical appraisals. Twitter timeline data, extracted from 671 users with self-disclosed diagnoses of schizophrenia, was appraised for authenticity by expert clinicians. Data from disclosures deemed true were used to build a classifier aiming to distinguish users with schizophrenia from healthy controls. Results from the classifier were compared to expert appraisals on new, unseen Twitter users. Significant linguistic differences were identified in the schizophrenia group including greater use of interpersonal pronouns (P<.001), decreased emphasis on friendship (P<.001), and greater emphasis on biological processes (P<.001). The resulting classifier distinguished users with disclosures of schizophrenia deemed genuine from control users with a mean accuracy of 88% using linguistic data alone. Compared to clinicians on new, unseen users, the classifier's precision, recall, and accuracy measures were 0.27, 0.77, and 0.59, respectively. These data reinforce the need for ongoing collaborations integrating expertise from multiple fields to strengthen our ability to accurately identify and effectively engage individuals with mental illness online. These collaborations are crucial to overcome some of mental illnesses' biggest challenges by using digital technology. ©Michael L Birnbaum, Sindhu Kiranmai Ernala, Asra F Rizvi, Munmun De Choudhury, John M Kane. Originally published in the Journal of Medical Internet Research (http

  10. Machine learning approaches for integrating clinical and imaging features in late-life depression classification and response prediction.

    Science.gov (United States)

    Patel, Meenal J; Andreescu, Carmen; Price, Julie C; Edelman, Kathryn L; Reynolds, Charles F; Aizenstein, Howard J

    2015-10-01

    Currently, depression diagnosis relies primarily on behavioral symptoms and signs, and treatment is guided by trial and error instead of evaluating associated underlying brain characteristics. Unlike past studies, we attempted to estimate accurate prediction models for late-life depression diagnosis and treatment response using multiple machine learning methods with inputs of multi-modal imaging and non-imaging whole brain and network-based features. Late-life depression patients (medicated post-recruitment) (n = 33) and older non-depressed individuals (n = 35) were recruited. Their demographics and cognitive ability scores were recorded, and brain characteristics were acquired using multi-modal magnetic resonance imaging pretreatment. Linear and nonlinear learning methods were tested for estimating accurate prediction models. A learning method called alternating decision trees estimated the most accurate prediction models for late-life depression diagnosis (87.27% accuracy) and treatment response (89.47% accuracy). The diagnosis model included measures of age, Mini-mental state examination score, and structural imaging (e.g. whole brain atrophy and global white mater hyperintensity burden). The treatment response model included measures of structural and functional connectivity. Combinations of multi-modal imaging and/or non-imaging measures may help better predict late-life depression diagnosis and treatment response. As a preliminary observation, we speculate that the results may also suggest that different underlying brain characteristics defined by multi-modal imaging measures-rather than region-based differences-are associated with depression versus depression recovery because to our knowledge this is the first depression study to accurately predict both using the same approach. These findings may help better understand late-life depression and identify preliminary steps toward personalized late-life depression treatment. Copyright © 2015 John Wiley

  11. Construction of Multi-Year Time-Series Profiles of Suspended Particulate Inorganic Matter Concentrations Using Machine Learning Approach

    Directory of Open Access Journals (Sweden)

    Pannimpullath R. Renosh

    2017-12-01

    Full Text Available Hydro-sedimentary numerical models have been widely employed to derive suspended particulate matter (SPM concentrations in coastal and estuarine waters. These hydro-sedimentary models are computationally and technically expensive in nature. Here we have used a computationally less-expensive, well-established methodology of self-organizing maps (SOMs along with a hidden Markov model (HMM to derive profiles of suspended particulate inorganic matter (SPIM. The concept of the proposed work is to benefit from all available data sets through the use of fusion methods and machine learning approaches that are able to process a growing amount of available data. This approach is applied to two different data sets entitled “Hidden” and “Observable”. The hidden data are composed of 15 months (27 September 2007 to 30 December 2008 of hourly SPIM profiles extracted from the Regional Ocean Modeling System (ROMS. The observable data include forcing parameter variables such as significant wave heights ( H s and H s 50 (50 days from the Wavewatch 3-HOMERE database and barotropic currents ( U b a r and V b a r from the Iberian–Biscay–Irish (IBI reanalysis data. These observable data integrate hourly surface samples from 1 February 2002 to 31 December 2012. The time-series profiles of the SPIM have been derived from four different stations in the English Channel by considering 15 months of output hidden data from the ROMS as a statistical representation of the ocean for ≈11 years. The derived SPIM profiles clearly show seasonal and tidal fluctuations in accordance with the parent numerical model output. The surface SPIM concentrations of the derived model have been validated with satellite remote sensing data. The time series of the modeled SPIM and satellite-derived SPIM show similar seasonal fluctuations. The ranges of concentrations for the four stations are also in good agreement with the corresponding satellite data. The high accuracy of the

  12. Comparing Machine Learning and Decision Making Approaches to Forecast Long Lead Monthly Rainfall: The City of Vancouver, Canada

    Directory of Open Access Journals (Sweden)

    Zahra Zahmatkesh

    2018-01-01

    Full Text Available Estimating maximum possible rainfall is of great value for flood prediction and protection, particularly for regions, such as Canada, where urban and fluvial floods from extreme rainfalls have been known to be a major concern. In this study, a methodology is proposed to forecast real-time rainfall (with one month lead time using different number of spatial inputs with different orders of lags. For this purpose, two types of models are used. The first one is a machine learning data driven-based model, which uses a set of hydrologic variables as inputs, and the second one is an empirical-statistical model that employs the multi-criteria decision analysis method for rainfall forecasting. The data driven model is built based on Artificial Neural Networks (ANNs, while the developed multi-criteria decision analysis model uses Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS approach. A comprehensive set of spatially varying climate variables, including geopotential height, sea surface temperature, sea level pressure, humidity, temperature and pressure with different orders of lags is collected to form input vectors for the forecast models. Then, a feature selection method is employed to identify the most appropriate predictors. Two sets of results from the developed models, i.e., maximum daily rainfall in each month (RMAX and cumulative value of rainfall for each month (RCU, are considered as the target variables for forecast purpose. The results from both modeling approaches are compared using a number of evaluation criteria such as Nash-Sutcliffe Efficiency (NSE. The proposed models are applied for rainfall forecasting for a coastal area in Western Canada: Vancouver, British Columbia. Results indicate although data driven models such as ANNs work well for the simulation purpose, developed TOPSIS model considerably outperforms ANNs for the rainfall forecasting. ANNs show acceptable simulation performance during the

  13. Enter the machine

    Science.gov (United States)

    Palittapongarnpim, Pantita; Sanders, Barry C.

    2018-05-01

    Quantum tomography infers quantum states from measurement data, but it becomes infeasible for large systems. Machine learning enables tomography of highly entangled many-body states and suggests a new powerful approach to this problem.

  14. Application of grey-fuzzy approach in parametric optimization of EDM process in machining of MDN 300 steel

    Science.gov (United States)

    Protim Das, Partha; Gupta, P.; Das, S.; Pradhan, B. B.; Chakraborty, S.

    2018-01-01

    Maraging steel (MDN 300) find its application in many industries as it exhibits high hardness which are very difficult to machine material. Electro discharge machining (EDM) is an extensively popular machining process which can be used in machining of such materials. Optimization of response parameters are essential for effective machining of these materials. Past researchers have already used Taguchi for obtaining the optimal responses of EDM process for this material with responses such as material removal rate (MRR), tool wear rate (TWR), relative wear ratio (RWR), and surface roughness (SR) considering discharge current, pulse on time, pulse off time, arc gap, and duty cycle as process parameters. In this paper, grey relation analysis (GRA) with fuzzy logic is applied to this multi objective optimization problem to check the responses by an implementation of the derived parametric setting. It was found that the parametric setting derived by the proposed method results in better a response than those reported by the past researchers. Obtained results are also verified using the technique for order of preference by similarity to ideal solution (TOPSIS). The predicted result also shows that there is a significant improvement in comparison to the results of past researchers.

  15. Sustainable machining

    CERN Document Server

    2017-01-01

    This book provides an overview on current sustainable machining. Its chapters cover the concept in economic, social and environmental dimensions. It provides the reader with proper ways to handle several pollutants produced during the machining process. The book is useful on both undergraduate and postgraduate levels and it is of interest to all those working with manufacturing and machining technology.

  16. What subject matter questions motivate the use of machine learning approaches compared to statistical models for probability prediction?

    Science.gov (United States)

    Binder, Harald

    2014-07-01

    This is a discussion of the following papers: "Probability estimation with machine learning methods for dichotomous and multicategory outcome: Theory" by Jochen Kruppa, Yufeng Liu, Gérard Biau, Michael Kohler, Inke R. König, James D. Malley, and Andreas Ziegler; and "Probability estimation with machine learning methods for dichotomous and multicategory outcome: Applications" by Jochen Kruppa, Yufeng Liu, Hans-Christian Diener, Theresa Holste, Christian Weimar, Inke R. König, and Andreas Ziegler. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  17. Multidsciplinary Approaches to Coastal Adaptation - Aplying Machine Learning Techniques to assess coastal risk in Latin America and The Caribbean

    Science.gov (United States)

    Calil, J.

    2016-12-01

    The global population, currently at 7.3 billion, is increasing by nearly 230,000 people every day. As the world's population grows to an estimated 11.2 billion by 2100, the number of people living in low elevation areas, exposed to coastal hazards, is continuing to increase. In 2013, 22 million people were displaced by extreme weather events, with 37 events displacing at least 100,000 people each. Losses from natural disasters and disaster risk are determined by a complex interaction between physical hazards and the vulnerability of a society or social-ecological system, and its exposure to such hazards. Impacts from coastal hazards depend on the number of people, value of assets, and presence of critical resources in harm's way. Moreover, coastal risks are amplified by challenging socioeconomic dynamics, including ill-advised urban development, income inequality, and poverty level. Our results demonstrate that in Latin America and the Caribbean (LAC), more than half a million people live in areas where coastal hazards, exposure (of people, assets and ecosystems), and poverty converge, creating the ideal conditions for a perfect storm. In order to identify the population at greatest risk to coastal hazards in LAC, and in response to a growing demand for multidisciplinary coastal adaptation approaches, this study employs a combination of machine learning clustering techniques (K-Means and Self Organizing Maps), and a spatial index, to assess coastal risks on a comparative scale. Data for more than 13,000 coastal locations in LAC were collected and allocated into three categories: (1) Coastal Hazards (including storm surge, wave energy and El Niño); (2) Geographic Exposure (including population, agriculture, and ecosystems); and (3) Vulnerability (including income inequality, infant mortality rate and malnutrition). This study identified hotspots of coastal vulnerability, the key drivers of coastal risk at each geographic location. Our results provide important

  18. Student Modeling and Machine Learning

    OpenAIRE

    Sison , Raymund; Shimura , Masamichi

    1998-01-01

    After identifying essential student modeling issues and machine learning approaches, this paper examines how machine learning techniques have been used to automate the construction of student models as well as the background knowledge necessary for student modeling. In the process, the paper sheds light on the difficulty, suitability and potential of using machine learning for student modeling processes, and, to a lesser extent, the potential of using student modeling techniques in machine le...

  19. Research on bearing life prediction based on support vector machine and its application

    International Nuclear Information System (INIS)

    Sun Chuang; Zhang Zhousuo; He Zhengjia

    2011-01-01

    Life prediction of rolling element bearing is the urgent demand in engineering practice, and the effective life prediction technique is beneficial to predictive maintenance. Support vector machine (SVM) is a novel machine learning method based on statistical learning theory, and is of advantage in prediction. This paper develops SVM-based model for bearing life prediction. The inputs of the model are features of bearing vibration signal and the output is the bearing running time-bearing failure time ratio. The model is built base on a few failed bearing data, and it can fuse information of the predicted bearing. So it is of advantage to bearing life prediction in practice. The model is applied to life prediction of a bearing, and the result shows the proposed model is of high precision.

  20. A Machine Learning Approach to Pedestrian Detection for Autonomous Vehicles Using High-Definition 3D Range Data.

    Science.gov (United States)

    Navarro, Pedro J; Fernández, Carlos; Borraz, Raúl; Alonso, Diego

    2016-12-23

    This article describes an automated sensor-based system to detect pedestrians in an autonomous vehicle application. Although the vehicle is equipped with a broad set of sensors, the article focuses on the processing of the information generated by a Velodyne HDL-64E LIDAR sensor. The cloud of points generated by the sensor (more than 1 million points per revolution) is processed to detect pedestrians, by selecting cubic shapes and applying machine vision and machine learning algorithms to the XY, XZ, and YZ projections of the points contained in the cube. The work relates an exhaustive analysis of the performance of three different machine learning algorithms: k-Nearest Neighbours (kNN), Naïve Bayes classifier (NBC), and Support Vector Machine (SVM). These algorithms have been trained with 1931 samples. The final performance of the method, measured a real traffic scenery, which contained 16 pedestrians and 469 samples of non-pedestrians, shows sensitivity (81.2%), accuracy (96.2%) and specificity (96.8%).

  1. A new optimization approach for nozzle selection and component allocation in multi-head beam-type SMD placement machines

    NARCIS (Netherlands)

    Torabi, S.A.; Hamedi, M.; Ashayeri, J.

    2013-01-01

    This paper addresses a highly challenging scheduling problem faced in multi-head beam-type surface mounting devices (SMD) machines. An integrated mathematical model is formulated aiming to balance workloads over multiple heads as well as improving the traveling speed of the robotic arm by

  2. A Machine Learning Approach to Pedestrian Detection for Autonomous Vehicles Using High-Definition 3D Range Data

    Directory of Open Access Journals (Sweden)

    Pedro J. Navarro

    2016-12-01

    Full Text Available This article describes an automated sensor-based system to detect pedestrians in an autonomous vehicle application. Although the vehicle is equipped with a broad set of sensors, the article focuses on the processing of the information generated by a Velodyne HDL-64E LIDAR sensor. The cloud of points generated by the sensor (more than 1 million points per revolution is processed to detect pedestrians, by selecting cubic shapes and applying machine vision and machine learning algorithms to the XY, XZ, and YZ projections of the points contained in the cube. The work relates an exhaustive analysis of the performance of three different machine learning algorithms: k-Nearest Neighbours (kNN, Naïve Bayes classifier (NBC, and Support Vector Machine (SVM. These algorithms have been trained with 1931 samples. The final performance of the method, measured a real traffic scenery, which contained 16 pedestrians and 469 samples of non-pedestrians, shows sensitivity (81.2%, accuracy (96.2% and specificity (96.8%.

  3. Flanking region sequence information to refine microRNA target ...

    Indian Academy of Sciences (India)

    Prakash

    (SVM)-based target prediction refinement approach has been introduced through .... are kernel-based statistical learning machines, where a discriminant ...... Cox T and Cuff J 2002 The Ensembl genome database project;. Nucleic Acids Res.

  4. Introduction to machine learning.

    Science.gov (United States)

    Baştanlar, Yalin; Ozuysal, Mustafa

    2014-01-01

    The machine learning field, which can be briefly defined as enabling computers make successful predictions using past experiences, has exhibited an impressive development recently with the help of the rapid increase in the storage capacity and processing power of computers. Together with many other disciplines, machine learning methods have been widely employed in bioinformatics. The difficulties and cost of biological analyses have led to the development of sophisticated machine learning approaches for this application area. In this chapter, we first review the fundamental concepts of machine learning such as feature assessment, unsupervised versus supervised learning and types of classification. Then, we point out the main issues of designing machine learning experiments and their performance evaluation. Finally, we introduce some supervised learning methods.

  5. Comparison of random forests and support vector machine for real-time radar-derived rainfall forecasting

    Science.gov (United States)

    Yu, Pao-Shan; Yang, Tao-Chang; Chen, Szu-Yin; Kuo, Chen-Min; Tseng, Hung-Wei

    2017-09-01

    This study aims to compare two machine learning techniques, random forests (RF) and support vector machine (SVM), for real-time radar-derived rainfall forecasting. The real-time radar-derived rainfall forecasting models use the present grid-based radar-derived rainfall as the output variable and use antecedent grid-based radar-derived rainfall, grid position (longitude and latitude) and elevation as the input variables to forecast 1- to 3-h ahead rainfalls for all grids in a catchment. Grid-based radar-derived rainfalls of six typhoon events during 2012-2015 in three reservoir catchments of Taiwan are collected for model training and verifying. Two kinds of forecasting models are constructed and compared, which are single-mode forecasting model (SMFM) and multiple-mode forecasting model (MMFM) based on RF and SVM. The SMFM uses the same model for 1- to 3-h ahead rainfall forecasting; the MMFM uses three different models for 1- to 3-h ahead forecasting. According to forecasting performances, it reveals that the SMFMs give better performances than MMFMs and both SVM-based and RF-based SMFMs show satisfactory performances for 1-h ahead forecasting. However, for 2- and 3-h ahead forecasting, it is found that the RF-based SMFM underestimates the observed radar-derived rainfalls in most cases and the SVM-based SMFM can give better performances than RF-based SMFM.

  6. Daily River Flow Forecasting with Hybrid Support Vector Machine – Particle Swarm Optimization

    Science.gov (United States)

    Zaini, N.; Malek, M. A.; Yusoff, M.; Mardi, N. H.; Norhisham, S.

    2018-04-01

    The application of artificial intelligence techniques for river flow forecasting can further improve the management of water resources and flood prevention. This study concerns the development of support vector machine (SVM) based model and its hybridization with particle swarm optimization (PSO) to forecast short term daily river flow at Upper Bertam Catchment located in Cameron Highland, Malaysia. Ten years duration of historical rainfall, antecedent river flow data and various meteorology parameters data from 2003 to 2012 are used in this study. Four SVM based models are proposed which are SVM1, SVM2, SVM-PSO1 and SVM-PSO2 to forecast 1 to 7 day ahead of river flow. SVM1 and SVM-PSO1 are the models with historical rainfall and antecedent river flow as its input, while SVM2 and SVM-PSO2 are the models with historical rainfall, antecedent river flow data and additional meteorological parameters as input. The performances of the proposed model are measured in term of RMSE and R2 . It is found that, SVM2 outperformed SVM1 and SVM-PSO2 outperformed SVM-PSO1 which meant the additional meteorology parameters used as input to the proposed models significantly affect the model performances. Hybrid models SVM-PSO1 and SVM-PSO2 yield higher performances as compared to SVM1 and SVM2. It is found that hybrid models are more effective in forecasting river flow at 1 to 7 day ahead at the study area.

  7. Improved Reliability-Based Optimization with Support Vector Machines and Its Application in Aircraft Wing Design

    Directory of Open Access Journals (Sweden)

    Yu Wang

    2015-01-01

    Full Text Available A new reliability-based design optimization (RBDO method based on support vector machines (SVM and the Most Probable Point (MPP is proposed in this work. SVM is used to create a surrogate model of the limit-state function at the MPP with the gradient information in the reliability analysis. This guarantees that the surrogate model not only passes through the MPP but also is tangent to the limit-state function at the MPP. Then, importance sampling (IS is used to calculate the probability of failure based on the surrogate model. This treatment significantly improves the accuracy of reliability analysis. For RBDO, the Sequential Optimization and Reliability Assessment (SORA is employed as well, which decouples deterministic optimization from the reliability analysis. The improved SVM-based reliability analysis is used to amend the error from linear approximation for limit-state function in SORA. A mathematical example and a simplified aircraft wing design demonstrate that the improved SVM-based reliability analysis is more accurate than FORM and needs less training points than the Monte Carlo simulation and that the proposed optimization strategy is efficient.

  8. Choosing between different AI approaches? The scientific benefits of the confrontation, and the new collaborative era between humans and machines

    Directory of Open Access Journals (Sweden)

    Jordi Vallverdú

    2008-07-01

    Full Text Available AI is a multidisciplinary activity that involves specialists from several fields, and we can say that the aim of science, and AI science, is solving problems. AI and computer sciences are been creating a new kind of making science, that we can call in silico science. Both models top-eown and bottomup are useful for e-scientific research. There is no a real controversy between them. Besides, the extended mind model of human cognition, involves human-machine interactions. Huge amount of data requires new ways to make and organize scientific practices: supercomputers, grids, distributed computing, specific software and middleware and, basically, more efficient and visual ways to interact with information. This is one of the key points to understand contemporary relationships between humans and machines: usability of scientific data.

  9. Agustin de Betancourt’s Wind Machine for Draining Marshy Ground: Approach to Its Geometric Modeling with Autodesk Inventor Professional

    Directory of Open Access Journals (Sweden)

    José Ignacio Rojas-Sola

    2016-12-01

    Full Text Available The present study shows the process followed in making the three-dimensional model and geometric documentation of a historical invention of the renowned Spanish engineer Agustin de Betancourt y Molina, which forms part of his rich legacy. Specifically, this was a wind machine for draining marshy ground, designed in 1789. The present research relies on the computer-aided design (CAD techniques using Autodesk Inventor Professional software, based on the scant information provided by the only two drawings of the machine, making it necessary to propose a number of dimensional and geometric hypotheses as well as a series of movement restrictions (degrees of freedom, to arrive at a consistent design. The results offer a functional design for this historic invention.

  10. OPTIMIZATION OF MACHINING PARAMETERS USING TAGUCHI APPROACH DURING HARD TURNING OF ALLOY STEEL WITH UNCOATED CARBIDE UNDER DRY CUTTING ENVIRONMENT

    Directory of Open Access Journals (Sweden)

    A. Das

    2015-12-01

    Full Text Available In today’s world of manufacturing by machining process two things are very important, one is productivity and the other one is quality. Quality of a product generally depends upon the surface finish and dimensional deviations. The productivity can be seen as a key economic indicator of innovation in terms of higher material removal rate with a less time and cost in machining industries. Taguchi method is a popular statistical technique for optimization of input parameters to get the best output results. Dry machining is a popular methodology for machining hard material and it has been accepted by many researchers to a great extent because of its low cost and safety. Many scientists have taken various input parameters and studied their effects on different output responses. In the present paper an attempt has been made to study the effect of input parameters such as cutting speed, feed rate and depth of cut on Surface roughness, Tool wear, Power consumption and Chip reduction co-efficient under dry condition using uncoated carbide insert. Signal to noise ratio has been used to select the optimal condition for various output responses. ANOVA table has been drawn for each output responses and finally mathematical model of multiple regression analysis has been prepared and authenticity of the statistical model have been checked by normal probability plot. It has been found from the experimental result that the power consumption and flank wear both were minimum at the cutting speed of 250 rpm and 400 rpm respectively. Chip reduction coefficient has been found minimum at a depth of cut of 0.3 mm and surface roughness was minimum at 0.1 mm/rev. feed rate.

  11. Quantitative diagnosis of breast tumors by morphometric classification of microenvironmental myoepithelial cells using a machine learning approach

    OpenAIRE

    Yamamoto, Yoichiro; Saito, Akira; Tateishi, Ayako; Shimojo, Hisashi; Kanno, Hiroyuki; Tsuchiya, Shinichi; Ito, Ken-ichi; Cosatto, Eric; Graf, Hans Peter; Moraleda, Rodrigo R.; Eils, Roland; Grabe, Niels

    2017-01-01

    Machine learning systems have recently received increased attention for their broad applications in several fields. In this study, we show for the first time that histological types of breast tumors can be classified using subtle morphological differences of microenvironmental myoepithelial cell nuclei without any direct information about neoplastic tumor cells. We quantitatively measured 11661 nuclei on the four histological types: normal cases, usual ductal hyperplasia and low/high grade du...

  12. A systematic approach to the application of Automation, Robotics, and Machine Intelligence Systems /ARAMIS/ to future space projects

    Science.gov (United States)

    Smith, D. B. S.

    1982-01-01

    The potential applications of Automation, Robotics, and Machine Intelligence Systems (ARAMIS) to space projects are investigated, through a systematic method. In this method selected space projects are broken down into space project tasks, and 69 of these tasks are selected for study. Candidate ARAMIS options are defined for each task. The relative merits of these options are evaluated according to seven indices of performance. Logical sequences of ARAMIS development are also defined. Based on this data, promising applications of ARAMIS are

  13. A machine-learning approach for damage detection in aircraft structures using self-powered sensor data

    Science.gov (United States)

    Salehi, Hadi; Das, Saptarshi; Chakrabartty, Shantanu; Biswas, Subir; Burgueño, Rigoberto

    2017-04-01

    This study proposes a novel strategy for damage identification in aircraft structures. The strategy was evaluated based on the simulation of the binary data generated from self-powered wireless sensors employing a pulse switching architecture. The energy-aware pulse switching communication protocol uses single pulses instead of multi-bit packets for information delivery resulting in discrete binary data. A system employing this energy-efficient technology requires dealing with time-delayed binary data due to the management of power budgets for sensing and communication. This paper presents an intelligent machine-learning framework based on combination of the low-rank matrix decomposition and pattern recognition (PR) methods. Further, data fusion is employed as part of the machine-learning framework to take into account the effect of data time delay on its interpretation. Simulated time-delayed binary data from self-powered sensors was used to determine damage indicator variables. Performance and accuracy of the damage detection strategy was examined and tested for the case of an aircraft horizontal stabilizer. Damage states were simulated on a finite element model by reducing stiffness in a region of the stabilizer's skin. The proposed strategy shows satisfactory performance to identify the presence and location of the damage, even with noisy and incomplete data. It is concluded that PR is a promising machine-learning algorithm for damage detection for time-delayed binary data from novel self-powered wireless sensors.

  14. Machine Learning-Assisted Network Inference Approach to Identify a New Class of Genes that Coordinate the Functionality of Cancer Networks.

    Science.gov (United States)

    Ghanat Bari, Mehrab; Ung, Choong Yong; Zhang, Cheng; Zhu, Shizhen; Li, Hu

    2017-08-01

    Emerging evidence indicates the existence of a new class of cancer genes that act as "signal linkers" coordinating oncogenic signals between mutated and differentially expressed genes. While frequently mutated oncogenes and differentially expressed genes, which we term Class I cancer genes, are readily detected by most analytical tools, the new class of cancer-related genes, i.e., Class II, escape detection because they are neither mutated nor differentially expressed. Given this hypothesis, we developed a Machine Learning-Assisted Network Inference (MALANI) algorithm, which assesses all genes regardless of expression or mutational status in the context of cancer etiology. We used 8807 expression arrays, corresponding to 9 cancer types, to build more than 2 × 10 8 Support Vector Machine (SVM) models for reconstructing a cancer network. We found that ~3% of ~19,000 not differentially expressed genes are Class II cancer gene candidates. Some Class II genes that we found, such as SLC19A1 and ATAD3B, have been recently reported to associate with cancer outcomes. To our knowledge, this is the first study that utilizes both machine learning and network biology approaches to uncover Class II cancer genes in coordinating functionality in cancer networks and will illuminate our understanding of how genes are modulated in a tissue-specific network contribute to tumorigenesis and therapy development.

  15. Simple machines

    CERN Document Server

    Graybill, George

    2007-01-01

    Just how simple are simple machines? With our ready-to-use resource, they are simple to teach and easy to learn! Chocked full of information and activities, we begin with a look at force, motion and work, and examples of simple machines in daily life are given. With this background, we move on to different kinds of simple machines including: Levers, Inclined Planes, Wedges, Screws, Pulleys, and Wheels and Axles. An exploration of some compound machines follows, such as the can opener. Our resource is a real time-saver as all the reading passages, student activities are provided. Presented in s

  16. Support vector machine learning model for the prediction of sentinel node status in patients with cutaneous melanoma.

    Science.gov (United States)

    Mocellin, Simone; Ambrosi, Alessandro; Montesco, Maria Cristina; Foletto, Mirto; Zavagno, Giorgio; Nitti, Donato; Lise, Mario; Rossi, Carlo Riccardo

    2006-08-01

    Currently, approximately 80% of melanoma patients undergoing sentinel node biopsy (SNB) have negative sentinel lymph nodes (SLNs), and no prediction system is reliable enough to be implemented in the clinical setting to reduce the number of SNB procedures. In this study, the predictive power of support vector machine (SVM)-based statistical analysis was tested. The clinical records of 246 patients who underwent SNB at our institution were used for this analysis. The following clinicopathologic variables were considered: the patient's age and sex and the tumor's histological subtype, Breslow thickness, Clark level, ulceration, mitotic index, lymphocyte infiltration, regression, angiolymphatic invasion, microsatellitosis, and growth phase. The results of SVM-based prediction of SLN status were compared with those achieved with logistic regression. The SLN positivity rate was 22% (52 of 234). When the accuracy was > or = 80%, the negative predictive value, positive predictive value, specificity, and sensitivity were 98%, 54%, 94%, and 77% and 82%, 41%, 69%, and 93% by using SVM and logistic regression, respectively. Moreover, SVM and logistic regression were associated with a diagnostic error and an SNB percentage reduction of (1) 1% and 60% and (2) 15% and 73%, respectively. The results from this pilot study suggest that SVM-based prediction of SLN status might be evaluated as a prognostic method to avoid the SNB procedure in 60% of patients currently eligible, with a very low error rate. If validated in larger series, this strategy would lead to obvious advantages in terms of both patient quality of life and costs for the health care system.

  17. Systematic approach to the application of automation, robotics, and machine intelligence systems (aramis) to future space projects

    Energy Technology Data Exchange (ETDEWEB)

    Smith, D B.S.

    1983-01-01

    The potential applications of automation, robotics and machine intelligence systems (ARAMIS) to space projects are investigated, through a systematic method. In this method selected space projects are broken down into space project tasks, and 69 of these tasks are selected for study. Candidate ARAMIS options are defined for each task. The relative merits of these options are evaluated according to seven indices of performance. Logical sequences of ARAMIS development are also defined. Based on this data, promising applications of ARAMIS are identified for space project tasks. General conclusions and recommendations for further study are also presented. 6 references.

  18. Face machines

    Energy Technology Data Exchange (ETDEWEB)

    Hindle, D.

    1999-06-01

    The article surveys latest equipment available from the world`s manufacturers of a range of machines for tunnelling. These are grouped under headings: excavators; impact hammers; road headers; and shields and tunnel boring machines. Products of thirty manufacturers are referred to. Addresses and fax numbers of companies are supplied. 5 tabs., 13 photos.

  19. Electric machine

    Science.gov (United States)

    El-Refaie, Ayman Mohamed Fawzi [Niskayuna, NY; Reddy, Patel Bhageerath [Madison, WI

    2012-07-17

    An interior permanent magnet electric machine is disclosed. The interior permanent magnet electric machine comprises a rotor comprising a plurality of radially placed magnets each having a proximal end and a distal end, wherein each magnet comprises a plurality of magnetic segments and at least one magnetic segment towards the distal end comprises a high resistivity magnetic material.

  20. Machine Learning.

    Science.gov (United States)

    Kirrane, Diane E.

    1990-01-01

    As scientists seek to develop machines that can "learn," that is, solve problems by imitating the human brain, a gold mine of information on the processes of human learning is being discovered, expert systems are being improved, and human-machine interactions are being enhanced. (SK)

  1. Nonplanar machines

    International Nuclear Information System (INIS)

    Ritson, D.

    1989-05-01

    This talk examines methods available to minimize, but never entirely eliminate, degradation of machine performance caused by terrain following. Breaking of planar machine symmetry for engineering convenience and/or monetary savings must be balanced against small performance degradation, and can only be decided on a case-by-case basis. 5 refs

  2. Addiction Machines

    Directory of Open Access Journals (Sweden)

    James Godley

    2011-10-01

    Full Text Available Entry into the crypt William Burroughs shared with his mother opened and shut around a failed re-enactment of William Tell’s shot through the prop placed upon a loved one’s head. The accidental killing of his wife Joan completed the installation of the addictation machine that spun melancholia as manic dissemination. An early encryptment to which was added the audio portion of abuse deposited an undeliverable message in WB. Wil- liam could never tell, although his corpus bears the in- scription of this impossibility as another form of pos- sibility. James Godley is currently a doctoral candidate in Eng- lish at SUNY Buffalo, where he studies psychoanalysis, Continental philosophy, and nineteenth-century litera- ture and poetry (British and American. His work on the concept of mourning and “the dead” in Freudian and Lacanian approaches to psychoanalytic thought and in Gothic literature has also spawned an essay on zombie porn. Since entering the Academy of Fine Arts Karlsruhe in 2007, Valentin Hennig has studied in the classes of Sil- via Bächli, Claudio Moser, and Corinne Wasmuht. In 2010 he spent a semester at the Dresden Academy of Fine Arts. His work has been shown in group exhibi- tions in Freiburg and Karlsruhe.

  3. Adapting machine learning techniques to censored time-to-event health record data: A general-purpose approach using inverse probability of censoring weighting.

    Science.gov (United States)

    Vock, David M; Wolfson, Julian; Bandyopadhyay, Sunayan; Adomavicius, Gediminas; Johnson, Paul E; Vazquez-Benitez, Gabriela; O'Connor, Patrick J

    2016-06-01

    Models for predicting the probability of experiencing various health outcomes or adverse events over a certain time frame (e.g., having a heart attack in the next 5years) based on individual patient characteristics are important tools for managing patient care. Electronic health data (EHD) are appealing sources of training data because they provide access to large amounts of rich individual-level data from present-day patient populations. However, because EHD are derived by extracting information from administrative and clinical databases, some fraction of subjects will not be under observation for the entire time frame over which one wants to make predictions; this loss to follow-up is often due to disenrollment from the health system. For subjects without complete follow-up, whether or not they experienced the adverse event is unknown, and in statistical terms the event time is said to be right-censored. Most machine learning approaches to the problem have been relatively ad hoc; for example, common approaches for handling observations in which the event status is unknown include (1) discarding those observations, (2) treating them as non-events, (3) splitting those observations into two observations: one where the event occurs and one where the event does not. In this paper, we present a general-purpose approach to account for right-censored outcomes using inverse probability of censoring weighting (IPCW). We illustrate how IPCW can easily be incorporated into a number of existing machine learning algorithms used to mine big health care data including Bayesian networks, k-nearest neighbors, decision trees, and generalized additive models. We then show that our approach leads to better calibrated predictions than the three ad hoc approaches when applied to predicting the 5-year risk of experiencing a cardiovascular adverse event, using EHD from a large U.S. Midwestern healthcare system. Copyright © 2016 Elsevier Inc. All rights reserved.

  4. Identifying a clinical signature of suicidality among patients with mood disorders: A pilot study using a machine learning approach.

    Science.gov (United States)

    Passos, Ives Cavalcante; Mwangi, Benson; Cao, Bo; Hamilton, Jane E; Wu, Mon-Ju; Zhang, Xiang Yang; Zunta-Soares, Giovana B; Quevedo, Joao; Kauer-Sant'Anna, Marcia; Kapczinski, Flávio; Soares, Jair C

    2016-03-15

    A growing body of evidence has put forward clinical risk factors associated with patients with mood disorders that attempt suicide. However, what is not known is how to integrate clinical variables into a clinically useful tool in order to estimate the probability of an individual patient attempting suicide. A total of 144 patients with mood disorders were included. Clinical variables associated with suicide attempts among patients with mood disorders and demographic variables were used to 'train' a machine learning algorithm. The resulting algorithm was utilized in identifying novel or 'unseen' individual subjects as either suicide attempters or non-attempters. Three machine learning algorithms were implemented and evaluated. All algorithms distinguished individual suicide attempters from non-attempters with prediction accuracy ranging between 65% and 72% (pdisorder (PTSD) comorbidity. Risk for suicide attempt among patients with mood disorders can be estimated at an individual subject level by incorporating both demographic and clinical variables. Future studies should examine the performance of this model in other populations and its subsequent utility in facilitating selection of interventions to prevent suicide. Copyright © 2015 Elsevier B.V. All rights reserved.

  5. Numerical simulation of machining distortions on a forged aerospace component following a one and a multi-step approaches

    Science.gov (United States)

    Prete, Antonio Del; Franchi, Rodolfo; Antermite, Fabrizio; Donatiello, Iolanda

    2018-05-01

    Residual stresses appear in a component as a consequence of thermo-mechanical processes (e.g. ring rolling process) casting and heat treatments. When machining these kinds of components, distortions arise due to the redistribution of residual stresses due to the foregoing process history inside the material. If distortions are excessive, they can lead to a large number of scrap parts. Since dimensional accuracy can affect directly the engines efficiency, the dimensional control for aerospace components is a non-trivial issue. In this paper, the problem related to the distortions of large thin walled aeroengines components in nickel superalloys has been addressed. In order to estimate distortions on inner diameters after internal turning operations, a 3D Finite Element Method (FEM) analysis has been developed on a real industrial test case. All the process history, has been taken into account by developing FEM models of ring rolling process and heat treatments. Three different strategies of ring rolling process have been studied and the combination of related parameters which allows to obtain the best dimensional accuracy has been found. Furthermore, grain size evolution and recrystallization phenomena during manufacturing process has been numerically investigated using a semi empirical Johnson-Mehl-Avrami-Kohnogorov (JMAK) model. The volume subtractions have been simulated by boolean trimming: a one step and a multi step analysis have been performed. The multi-step procedure has allowed to choose the best material removal sequence in order to reduce machining distortions.

  6. An Approach to Integrating Tactical Decision-Making in Industrial Maintenance Balance Scorecards Using Principal Components Analysis and Machine Learning

    Directory of Open Access Journals (Sweden)

    Néstor Rodríguez-Padial

    2017-01-01

    Full Text Available The uncertainty of demand has led production systems to become increasingly complex; this can affect the availability of the machines and thus their maintenance. Therefore, it is necessary to adequately manage the information that facilitates decision-making. This paper presents a system for making decisions related to the design of customized maintenance plans in a production plant. This paper addresses this tactical goal and aims to provide greater knowledge and better predictions by projecting reliable behavior in the medium-term, integrating this new functionality into classic Balance Scorecards, and making it possible to extend their current measuring function to a new aptitude: predicting evolution based on historical data. In the proposed Custom Balance Scorecard design, an exploratory data phase is integrated with another analysis and prediction phase using Principal Component Analysis algorithms and Machine Learning that uses Artificial Neural Network algorithms. This new extension allows better control over the maintenance function of an industrial plant in the medium-term with a yearly horizon taken over monthly intervals which allows the measurement of the indicators of strategic productive areas and the discovery of hidden behavior patterns in work orders. In addition, this extension enables the prediction of indicator outcomes such as overall equipment efficiency and mean time to failure.

  7. Accurate torque-sensorless control approach for interior permanent-magnet synchronous machine based on cascaded sliding mode observer

    Directory of Open Access Journals (Sweden)

    Kai-Hui Zhao

    2017-06-01

    Full Text Available To improve the accuracy of torque control for vector control of interior permanent-magnet synchronous machine (IPMSM, this study proposes a torque-sensorless control method based on cascaded sliding mode observer (SMO. First, the active flux model is discussed, which converts the model of IPMSM into the equivalent model of surface-mounted permanent-magnet synchronous machine. Second, to reduce chattering caused by system parameters variations and external disturbances, the cascaded observer is designed, which is composed of a variable gain adaptive SMO and an active flux SMO. The variable gain adaptive SMO is designed to estimate the speed, rotor position and stator resistance in the d–q reference frame. The active flux SMO is designed to estimate the active flux and torque in the α–β reference frame. Global asymptotic stability of the observers is guaranteed by the Lyapunov stability analysis. Finally, simulations and experiments are carried out to verify the effectiveness of the proposed control scheme.

  8. A framework for evaluating the performance of automated teller machine in banking industries: A queuing model-cum-TOPSIS approach

    Directory of Open Access Journals (Sweden)

    Christopher Osita Anyaeche

    2018-04-01

    Full Text Available The improvement in the provision of banking services to customers enhances bank’s performance (profitability and productivity and the amounts of dividend declared to shareholders as well as bank’s competitiveness. One means of fast tracking the service time for bank customers is through the use of self-servicing machines, such as automated teller machine (ATM. Total service cost, expected waiting time in queue, ATM utilization and percentage of customer loss are some of the performance indices that are used to evaluate the service rendered by a bank’s ATM. This study proposes a framework for evaluating the performance of ATM by integrating queuing model and Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS methodology. Applicability of the framework was tested using practical data obtained from four banks in Nigeria. It was observed that the average ATM usage in the study area was less than 50%. The TOPSIS results identified Bank A as the best ranked bank. In addition, the results obtained revealed that banks with two ATM were ranked higher than banks with more than two ATM

  9. Monitoring mangrove biomass change in Vietnam using SPOT images and an object-based approach combined with machine learning algorithms

    Science.gov (United States)

    Pham, Lien T. H.; Brabyn, Lars

    2017-06-01

    Mangrove forests are well-known for their provision of ecosystem services and capacity to reduce carbon dioxide concentrations in the atmosphere. Mapping and quantifying mangrove biomass is useful for the effective management of these forests and maximizing their ecosystem service performance. The objectives of this research were to model, map, and analyse the biomass change between 2000 and 2011 of mangrove forests in the Cangio region in Vietnam. SPOT 4 and 5 images were used in conjunction with object-based image analysis and machine learning algorithms. The study area included natural and planted mangroves of diverse species. After image preparation, three different mangrove associations were identified using two levels of image segmentation followed by a Support Vector Machine classifier and a range of spectral, texture and GIS information for classification. The overall classification accuracy for the 2000 and 2011 images were 77.1% and 82.9%, respectively. Random Forest regression algorithms were then used for modelling and mapping biomass. The model that integrated spectral, vegetation association type, texture, and vegetation indices obtained the highest accuracy (R2adj = 0.73). Among the different variables, vegetation association type was the most important variable identified by the Random Forest model. Based on the biomass maps generated from the Random Forest, total biomass in the Cangio mangrove forest increased by 820,136 tons over this period, although this change varied between the three different mangrove associations.

  10. Evaluation of Machine Learning and Rules-Based Approaches for Predicting Antimicrobial Resistance Profiles in Gram-negative Bacilli from Whole Genome Sequence Data.

    Science.gov (United States)

    Pesesky, Mitchell W; Hussain, Tahir; Wallace, Meghan; Patel, Sanket; Andleeb, Saadia; Burnham, Carey-Ann D; Dantas, Gautam

    2016-01-01

    The time-to-result for culture-based microorganism recovery and phenotypic antimicrobial susceptibility testing necessitates initial use of empiric (frequently broad-spectrum) antimicrobial therapy. If the empiric therapy is not optimal, this can lead to adverse patient outcomes and contribute to increasing antibiotic resistance in pathogens. New, more rapid technologies are emerging to meet this need. Many of these are based on identifying resistance genes, rather than directly assaying resistance phenotypes, and thus require interpretation to translate the genotype into treatment recommendations. These interpretations, like other parts of clinical diagnostic workflows, are likely to be increasingly automated in the future. We set out to evaluate the two major approaches that could be amenable to automation pipelines: rules-based methods and machine learning methods. The rules-based algorithm makes predictions based upon current, curated knowledge of Enterobacteriaceae resistance genes. The machine-learning algorithm predicts resistance and susceptibility based on a model built from a training set of variably resistant isolates. As our test set, we used whole genome sequence data from 78 clinical Enterobacteriaceae isolates, previously identified to represent a variety of phenotypes, from fully-susceptible to pan-resistant strains for the antibiotics tested. We tested three antibiotic resistance determinant databases for their utility in identifying the complete resistome for each isolate. The predictions of the rules-based and machine learning algorithms for these isolates were compared to results of phenotype-based diagnostics. The rules based and machine-learning predictions achieved agreement with standard-of-care phenotypic diagnostics of 89.0 and 90.3%, respectively, across twelve antibiotic agents from six major antibiotic classes. Several sources of disagreement between the algorithms were identified. Novel variants of known resistance factors and

  11. Evaluation of Machine Learning and Rules-Based Approaches for Predicting Antimicrobial Resistance Profiles in Gram-negative Bacilli from Whole Genome Sequence Data

    Directory of Open Access Journals (Sweden)

    Mitchell Pesesky

    2016-11-01

    Full Text Available The time-to-result for culture-based microorganism recovery and phenotypic antimicrobial susceptibility testing necessitate initial use of empiric (frequently broad-spectrum antimicrobial therapy. If the empiric therapy is not optimal, this can lead to adverse patient outcomes and contribute to increasing antibiotic resistance in pathogens. New, more rapid technologies are emerging to meet this need. Many of these are based on identifying resistance genes, rather than directly assaying resistance phenotypes, and thus require interpretation to translate the genotype into treatment recommendations. These interpretations, like other parts of clinical diagnostic workflows, are likely to be increasingly automated in the future. We set out to evaluate the two major approaches that could be amenable to automation pipelines: rules-based methods and machine learning methods. The rules-based algorithm makes predictions based upon current, curated knowledge of Enterobacteriaceae resistance genes. The machine-learning algorithm predicts resistance and susceptibility based on a model built from a training set of variably resistant isolates. As our test set, we used whole genome sequence data from 78 clinical Enterobacteriaceae isolates, previously identified to represent a variety of phenotypes, from fully-susceptible to pan-resistant strains for the antibiotics tested. We tested three antibiotic resistance determinant databases for their utility in identifying the complete resistome for each isolate. The predictions of the rules-based and machine learning algorithms for these isolates were compared to results of phenotype-based diagnostics. The rules based and machine-learning predictions achieved agreement with standard-of-care phenotypic diagnostics of 89.0% and 90.3%, respectively, across twelve antibiotic agents from six major antibiotic classes. Several sources of disagreement between the algorithms were identified. Novel variants of known resistance

  12. Design local exhaust ventilation on sieve machine at PT.Perkebunan Nusantara VIII Ciater using design for assembly (DFA) approach with Boothroyd and Dewhurst method

    Science.gov (United States)

    Khalqihi, K. I.; Rahayu, M.; Rendra, M.

    2017-12-01

    PT Perkebunan Nusantara VIII Ciater is a company produced black tea orthodox more or less 4 tons every day. At the production section, PT Perkebunan Nusantara VIII will use local exhaust ventilation specially at sortation area on sieve machine. To maintain the quality of the black tea orthodox, all machine must be scheduled for maintenance every once a month and takes time 2 hours in workhours, with additional local exhaust ventilation, it will increase time for maintenance process, if maintenance takes time more than 2 hours it will caused production process delayed. To support maintenance process in PT Perkebunan Nusantara VIII Ciater, designing local exhaust ventilation using design for assembly approach with Boothroyd and Dewhurst method, design for assembly approach is choosen to simplify maintenance process which required assembly process. There are 2 LEV designs for this research. Design 1 with 94 components, assembly time 647.88 seconds and assembly efficiency level 23.62%. Design 2 with 82 components, assembly time 567.84 seconds and assembly efficiency level 24.83%. Design 2 is choosen for this research based on DFA goals, minimum total part that use, optimization assembly time, and assembly efficiency level.

  13. The Machine within the Machine

    CERN Multimedia

    Katarina Anthony

    2014-01-01

    Although Virtual Machines are widespread across CERN, you probably won't have heard of them unless you work for an experiment. Virtual machines - known as VMs - allow you to create a separate machine within your own, allowing you to run Linux on your Mac, or Windows on your Linux - whatever combination you need.   Using a CERN Virtual Machine, a Linux analysis software runs on a Macbook. When it comes to LHC data, one of the primary issues collaborations face is the diversity of computing environments among collaborators spread across the world. What if an institute cannot run the analysis software because they use different operating systems? "That's where the CernVM project comes in," says Gerardo Ganis, PH-SFT staff member and leader of the CernVM project. "We were able to respond to experimentalists' concerns by providing a virtual machine package that could be used to run experiment software. This way, no matter what hardware they have ...

  14. Machine translation

    Energy Technology Data Exchange (ETDEWEB)

    Nagao, M

    1982-04-01

    Each language has its own structure. In translating one language into another one, language attributes and grammatical interpretation must be defined in an unambiguous form. In order to parse a sentence, it is necessary to recognize its structure. A so-called context-free grammar can help in this respect for machine translation and machine-aided translation. Problems to be solved in studying machine translation are taken up in the paper, which discusses subjects for semantics and for syntactic analysis and translation software. 14 references.

  15. A Hybrid Approach of Stepwise Regression, Logistic Regression, Support Vector Machine, and Decision Tree for Forecasting Fraudulent Financial Statements

    Directory of Open Access Journals (Sweden)

    Suduan Chen

    2014-01-01

    Full Text Available As the fraudulent financial statement of an enterprise is increasingly serious with each passing day, establishing a valid forecasting fraudulent financial statement model of an enterprise has become an important question for academic research and financial practice. After screening the important variables using the stepwise regression, the study also matches the logistic regression, support vector machine, and decision tree to construct the classification models to make a comparison. The study adopts financial and nonfinancial variables to assist in establishment of the forecasting fraudulent financial statement model. Research objects are the companies to which the fraudulent and nonfraudulent financial statement happened between years 1998 to 2012. The findings are that financial and nonfinancial information are effectively used to distinguish the fraudulent financial statement, and decision tree C5.0 has the best classification effect 85.71%.

  16. A hybrid approach of stepwise regression, logistic regression, support vector machine, and decision tree for forecasting fraudulent financial statements.

    Science.gov (United States)

    Chen, Suduan; Goo, Yeong-Jia James; Shen, Zone-De

    2014-01-01

    As the fraudulent financial statement of an enterprise is increasingly serious with each passing day, establishing a valid forecasting fraudulent financial statement model of an enterprise has become an important question for academic research and financial practice. After screening the important variables using the stepwise regression, the study also matches the logistic regression, support vector machine, and decision tree to construct the classification models to make a comparison. The study adopts financial and nonfinancial variables to assist in establishment of the forecasting fraudulent financial statement model. Research objects are the companies to which the fraudulent and nonfraudulent financial statement happened between years 1998 to 2012. The findings are that financial and nonfinancial information are effectively used to distinguish the fraudulent financial statement, and decision tree C5.0 has the best classification effect 85.71%.

  17. Micro-EDM process modeling and machining approaches for minimum tool electrode wear for fabrication of biocompatible micro-components

    DEFF Research Database (Denmark)

    Puthumana, Govindan

    2017-01-01

    Micro-electrical discharge machining (micro-EDM) is a potential non-contact method for fabrication of biocompatible micro devices. This paper presents an attempt to model the tool electrode wear in micro-EDM process using multiple linear regression analysis (MLRA) and artificial neural networks...... linear regression model was developed for prediction of TWR in ten steps at a significance level of 90%. The optimum architecture of the ANN was obtained with 7 hidden layers at an R-sq value of 0.98. The predicted values of TWR using ANN matched well with the practically measured and calculated values...... (ANN). The governing micro-EDM factors chosen for this investigation were: voltage (V), current (I), pulse on time (Ton) and pulse frequency (f). The proposed predictive models generate a functional correlation between the tool electrode wear rate (TWR) and the governing micro-EDM factors. A multiple...

  18. Comparison of Different Machine Learning Approaches for Monthly Satellite-Based Soil Moisture Downscaling over Northeast China

    Directory of Open Access Journals (Sweden)

    Yangxiaoyue Liu

    2017-12-01

    Full Text Available Although numerous satellite-based soil moisture (SM products can provide spatiotemporally continuous worldwide datasets, they can hardly be employed in characterizing fine-grained regional land surface processes, owing to their coarse spatial resolution. In this study, we proposed a machine-learning-based method to enhance SM spatial accuracy and improve the availability of SM data. Four machine learning algorithms, including classification and regression trees (CART, K-nearest neighbors (KNN, Bayesian (BAYE, and random forests (RF, were implemented to downscale the monthly European Space Agency Climate Change Initiative (ESA CCI SM product from 25-km to 1-km spatial resolution. During the regression, the land surface temperature (including daytime temperature, nighttime temperature, and diurnal fluctuation temperature, normalized difference vegetation index, surface reflections (red band, blue band, NIR band and MIR band, and digital elevation model were taken as explanatory variables to produce fine spatial resolution SM. We chose Northeast China as the study area and acquired corresponding SM data from 2003 to 2012 in unfrozen seasons. The reconstructed SM datasets were validated against in-situ measurements. The results showed that the RF-downscaled results had superior matching performance to both ESA CCI SM and in-situ measurements, and can positively respond to precipitation variation. Additionally, the RF was less affected by parameters, which revealed its robustness. Both CART and KNN ranked second. Compared to KNN, CART had a relatively close correlation with the validation data, but KNN showed preferable precision. Moreover, BAYE ranked last with significantly abnormal regression values.

  19. Relationship between neuronal network architecture and naming performance in temporal lobe epilepsy: A connectome based approach using machine learning.

    Science.gov (United States)

    Munsell, B C; Wu, G; Fridriksson, J; Thayer, K; Mofrad, N; Desisto, N; Shen, D; Bonilha, L

    2017-09-09

    Impaired confrontation naming is a common symptom of temporal lobe epilepsy (TLE). The neurobiological mechanisms underlying this impairment are poorly understood but may indicate a structural disorganization of broadly distributed neuronal networks that support naming ability. Importantly, naming is frequently impaired in other neurological disorders and by contrasting the neuronal structures supporting naming in TLE with other diseases, it will become possible to elucidate the common systems supporting naming. We aimed to evaluate the neuronal networks that support naming in TLE by using a machine learning algorithm intended to predict naming performance in subjects with medication refractory TLE using only the structural brain connectome reconstructed from diffusion tensor imaging. A connectome-based prediction framework was developed using network properties from anatomically defined brain regions across the entire brain, which were used in a multi-task machine learning algorithm followed by support vector regression. Nodal eigenvector centrality, a measure of regional network integration, predicted approximately 60% of the variance in naming. The nodes with the highest regression weight were bilaterally distributed among perilimbic sub-networks involving mainly the medial and lateral temporal lobe regions. In the context of emerging evidence regarding the role of large structural networks that support language processing, our results suggest intact naming relies on the integration of sub-networks, as opposed to being dependent on isolated brain areas. In the case of TLE, these sub-networks may be disproportionately indicative naming processes that are dependent semantic integration from memory and lexical retrieval, as opposed to multi-modal perception or motor speech production. Copyright © 2017. Published by Elsevier Inc.

  20. Impact of Clinical Parameters in the Intrahost Evolution of HIV-1 Subtype B in Pediatric Patients: A Machine Learning Approach

    Science.gov (United States)

    Rojas Sánchez, Patricia; Cobos, Alberto; Navaro, Marisa; Ramos, José Tomas; Pagán, Israel

    2017-01-01

    Abstract Determining the factors modulating the genetic diversity of HIV-1 populations is essential to understand viral evolution. This study analyzes the relative importance of clinical factors in the intrahost HIV-1 subtype B (HIV-1B) evolution and in the fixation of drug resistance mutations (DRM) during longitudinal pediatric HIV-1 infection. We recovered 162 partial HIV-1B pol sequences (from 3 to 24 per patient) from 24 perinatally infected patients from the Madrid Cohort of HIV-1 infected children and adolescents in a time interval ranging from 2.2 to 20.3 years. We applied machine learning classification methods to analyze the relative importance of 28 clinical/epidemiological/virological factors in the HIV-1B evolution to predict HIV-1B genetic diversity (d), nonsynonymous and synonymous mutations (dN, dS) and DRM presence. Most of the 24 HIV-1B infected pediatric patients were Spanish (91.7%), diagnosed before 2000 (83.3%), and all were antiretroviral therapy experienced. They had from 0.3 to 18.8 years of HIV-1 exposure at sampling time. Most sequences presented DRM. The best-predictor variables for HIV-1B evolutionary parameters were the age of HIV-1 diagnosis for d, the age at first antiretroviral treatment for dN and the year of HIV-1 diagnosis for ds. The year of infection (birth year) and year of sampling seemed to be relevant for fixation of both DRM at large and, considering drug families, to protease inhibitors (PI). This study identifies, for the first time using machine learning, the factors affecting more HIV-1B pol evolution and those affecting DRM fixation in HIV-1B infected pediatric patients. PMID:29044435

  1. Comparison of confirmed inactive and randomly selected compounds as negative training examples in support vector machine-based virtual screening.

    Science.gov (United States)

    Heikamp, Kathrin; Bajorath, Jürgen

    2013-07-22

    The choice of negative training data for machine learning is a little explored issue in chemoinformatics. In this study, the influence of alternative sets of negative training data and different background databases on support vector machine (SVM) modeling and virtual screening has been investigated. Target-directed SVM models have been derived on the basis of differently composed training sets containing confirmed inactive molecules or randomly selected database compounds as negative training instances. These models were then applied to search background databases consisting of biological screening data or randomly assembled compounds for available hits. Negative training data were found to systematically influence compound recall in virtual screening. In addition, different background databases had a strong influence on the search results. Our findings also indicated that typical benchmark settings lead to an overestimation of SVM-based virtual screening performance compared to search conditions that are more relevant for practical applications.